  1. Jun 2024
    1. Author response:

      eLife assessment

      This study is a detailed investigation of how chromatin structure influences replication origin function in yeast ribosomal DNA, with focus on the role of the histone deacetylase Sir2 and the chromatin remodeler Fun30. Convincing evidence shows that Sir2 does not affect origin licensing but rather affects local transcription and nucleosome positioning which correlates with increased origin firing. However, the evidence remains incomplete as the methods employed do not rigorously establish a key aspect of the mechanism, fully address some alternative models, or sufficiently relate to prior results. Overall, this is a valuable advance for the field that could be improved to establish a more robust paradigm.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper presents a mechanistic study of rDNA origin regulation in yeast by SIR2. Each of the ~180 tandemly repeated rDNA gene copies contains a potential replication origin. Early-efficient initiation of these origins is suppressed by Sir2, reducing competition with origins distributed throughout the genome for rate-limiting initiation factors. Previous studies by these authors showed that SIR2 deletion advances replication timing of rDNA origins by a complex mechanism: transcriptional de-repression of a local PolII promoter causes licensed origin proteins (MCM complexes) to re-localize (slide along the DNA) to a different (and altered) chromatin environment. In this study, they identify a chromatin remodeler, FUN30, that suppresses the sir2∆ effect and, remarkably, results in a contraction of the rDNA to about one-quarter of its normal length/number of repeats, implicating replication defects of the rDNA. Through examination of replication timing, MCM occupancy and nucleosome occupancy on the chromatin in sir2∆, fun30∆, and double mutants, they propose a model in which nucleosome position relative to the licensed origin (MCM complexes) intrinsically determines origin timing/efficiency. While their interpretations of the data are largely reasonable and can be interpreted to support their model, a key weakness is the connection between Mcm ChEC signal disappearance and origin firing. While the cyclical chromatin association-dissociation of MCM proteins with potential origin sequences may generally be interpreted as licensing followed by firing, dissociation may also result from passive replication and, as shown here, displacement by transcription and/or chromatin remodeling.

      While it is true that both transcription and passive replication can cause the MCM-ChEC signal to disappear, neither can cause selective disappearance of the displaced complex without affecting the non-displaced complex. Indeed, in the case of transcription, RNA polymerase transcribing C-pro would have to first dislodge the normally positioned MCM complex before even reaching the displaced complex. Furthermore, deletion of FUN30 leads to both more C-pro transcription and less disappearance of the displaced MCM complex. Importantly, this cannot reflect continuous replenishment of displaced MCMs with newly loaded MCMs, since the cells are in S phase and licensing is restricted to G1.

      Moreover, the link between MCM disappearance from chromatin in the ChEC assay and initiation at such precise resolution needs to be validated against an independent method for determining the initiation site(s). Differences in rDNA copy number and relative transcription levels are also not directly accounted for, obscuring a clearer interpretation of the results.

      Copy number reduction of the magnitude caused by deletion of SIR2 and FUN30 does not suppress the sir2∆ effect (i.e., early replication of the rDNA), but rather exacerbates it. In particular, deletion of SIR2 and FUN30 causes the rDNA to shrink to approximately 35 copies. Kwan et al., 2023 (PMID: 36842087) have shown that reducing rDNA copy number to 35 causes a dramatic acceleration of rDNA replication in a SIR2 strain. Thus, the effect of rDNA size on replication timing reinforces our conclusion that deletion of FUN30 suppresses rDNA replication.

      However, to address this concern directly, in the revision we will include 2D gels in fob1 strains with equal numbers of repeats, which allow us to conclude that the effect of FUN30 deletion in suppressing rDNA origin firing is independent of both rDNA size and FOB1. The critical 2D gels are shown below in our reply to Reviewer #2.

      Nevertheless, this paper makes a valuable advance with the finding of Fun30 involvement, whose deletion substantially reduces rDNA repeat number in the sir2∆ background. The model they develop is compelling and I am inclined to agree, but I think the evidence on this specific point is purely correlative and a better method is needed to address the initiation site question. The authors deserve credit for their efforts to clarify the still obscure intricacies of chromatin regulation. At a minimum, I suggest that their conclusions on these points of concern be softened and the caveats discussed. Statistical analysis is lacking for some claims.

      Strengths are the identification of FUN30 as a suppressor; the examination of specific FUN30 mutants to distinguish its likely functional involvement; the use of multiple methods to analyze replication and protein occupancy on chromatin; and the development of a coherent model.

      Weaknesses are the failure to address copy number as a variable; insufficient validation of the relationship between the ChEC method and the exact initiation locus; and the lack of statistical analysis in some cases.

      The two potential initiation sites that one would monitor (non-displaced and displaced) are separated by less than 150 base pairs, and other techniques simply do not have the resolution necessary to distinguish them. Furthermore, as we suggest in the manuscript, our results are consistent with a model in which only the displaced MCM complex is activated, whether in sir2 or WT. If no genotype-dependent difference in initiation sites is even expected, even the most precise replication-based assays would be hard to interpret. However, the reviewer is correct that this is a novel technique and that confirmation with a well-established technique is reassuring; we are therefore performing ChIP experiments to corroborate, to the extent possible, the conclusions that we reached with ChEC.

      We appreciate the reviewer pointing out that some statistical analyses were lacking, and we will correct this in a revised manuscript.

      Additional background and discussion for public review:

      This paper broadly addresses the mechanism(s) that regulate replication origin firing in different chromatin contexts. The rDNA origin is present in each of ~180 tandem repeats of the rDNA sequence, representing a high potential origin density per length of DNA (9.1 kb repeat unit). However, the average efficiency of rDNA origins is relatively low (~20% in wild-type cells), which reduces the replication load on the overall genome by reducing competition with origins throughout the genome for limiting replication initiation factors. Deletion of the histone deacetylase SIR2, which silences PolII transcription within the rDNA, results in increased early activation of the rDNA origins (and a reduced rate of overall genome replication). Previous work by the authors showed that MCM complexes loaded onto the rDNA origins (origin licensing) were laterally displaced (sliding) along the rDNA, away from a well-positioned nucleosome on one side. The authors' major hypothesis throughout this work is that the new MCM location(s) are intrinsically more efficient configurations for origin firing. The authors identify a chromatin remodeling enzyme, FUN30, whose deletion appears to suppress the earlier activation of rDNA origins in sir2∆ cells. Indeed, it appears that the reduction of rDNA origin activity in sir2∆ fun30∆ cells is severe enough to result in a substantial reduction in rDNA array length (number of repeats); the reduced rDNA length presumably facilitates its more stable replication and maintenance.

      Analysis of replication by 2D gels is marginally convincing; using 2D gels for this purpose is very challenging, and the results are tricky to quantify. The more quantitative analysis by EdU incorporation is more convincing of the suppression of the earlier replication caused by SIR2 deletion.

      To address the mechanism of suppression, they analyze MCM positioning using ChEC, which in G1 cells shows partial displacement of MCM from the normal position A to positions B and C in sir2∆ cells, and a similar but more complete displacement away from A to positions B and C in sir2∆ fun30∆ cells. During S phase in the presence of hydroxyurea, which slows replication progression considerably (and blocks later origin firing), MCM signals redistribute; this is interpreted to represent origin firing and bidirectional movement of MCMs (only one direction is shown), some of which accumulate near the replication fork barrier, consistent with their interpretation. They observe that MCMs displaced (in G1) to sites B or C in sir2∆ cells disappear more rapidly during S phase, whereas a similar dynamic is not observed in sir2∆ fun30∆ cells. This is the main basis for their conclusion that the B and C sites are more permissive than A. While this may be the simplest interpretation, there are limitations with this assay that undermine a rigorous conclusion (additional points below). The main problem is that we know the MCM complexes are mobile, so disappearance may reflect displacement by other means, including transcription, which is high in the sir2∆ background. Indeed, the double mutant has a greater level of transcription per repeat unit, which might explain its greater displacement of MCMs from A in G1. Thus, displacement might not always represent origin firing. Because the sir2∆ background profoundly changes transcription, and the double mutant has a much smaller array length associated with higher transcription, how can we rule out greater accessibility at site A, for example in sir2∆, leading to more firing, which is suppressed in sir2∆ fun30∆ due to greater MCM displacement away from A?

      I think the critical missing data needed to solidly support their conclusions is a definitive determination of the site(s) of initiation using a more direct method, such as strand-specific sequencing of EdU or nascent strand analysis. More direct comparisons of strains with lower copy number are also needed to rule out this factor. As discussed in detail below, copy number reduction is known to suppress at least part of the sir2∆ effect, so this looms over the interpretations. I think they are probably correct in their overall model based on the simplest interpretation of the data, but it remains to be rigorously established, and they should soften their conclusions in this respect.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors follow up on their previous work showing that in the absence of the Sir2 deacetylase the MCM replicative helicase at the rDNA spacer region is repositioned to a region of low nucleosome occupancy. Here they show that the repositioned displaced MCMs have increased firing propensity relative to non-displaced MCMs. In addition, they show that activation of the repositioned MCMs and low nucleosome occupancy in the adjacent region depend on the chromatin remodeling activity of Fun30.

      Strengths:

      The paper provides new information on the role of a conserved chromatin remodeling protein in the regulation of origin firing and in addition provides evidence that not all loaded MCMs fire and that origin firing is regulated at a step downstream of MCM loading.

      Weaknesses:

      The relationship between the authors' results and prior work on the role of Sir2 (and Fob1) in the regulation of rDNA recombination and copy number maintenance is not explored, making it difficult to place the results in a broader context. Sir2 has previously been shown to be recruited by Fob1, which is also required for DSB formation and recombination-mediated changes in rDNA copy number. Are the changes that the authors observe specifically in fun30 sir2 cells related to this pathway? Is Fob1 required for the reduced rDNA copy number in fun30 sir2 double mutant cells?

      Strains lacking SIR2 have unstable rDNA size, and FOB1 deletion stabilizes rDNA size in the sir2 background. Likewise, FOB1 deletion influences the kinetics of rDNA size reduction in sir2 fun30 cells. However, the main effect of Fun30 in sir2 cells that we were interested in, suppression of rDNA replication, is preserved in the fob1 background, arguing that the observed effect is independent of Fob1 (see figure below). Given that the main focus of the paper is the regulation of rDNA origin activity and that these changes were independent of Fob1, we had elected not to include these results in the original manuscript, but we will gladly include them in the revision.

      Besides refuting a possible role for Fob1 in the FUN30-mediated activation of rDNA origin firing in sir2 cells, the fob1 background enabled us to compare the activation of rDNA origins in sir2 and sir2 fun30 strains with equally short rDNA. The 2D gels demonstrate a dramatic suppression of rDNA origin activity upon deletion of FUN30 in sir2 fob1 strains with 35 rDNA copies.

      Author response image 1.

      The deletion of FUN30 diminishes the replication bubble signal in a fob1 sir2 strain with 35 rDNA copies by more than tenfold. The single rARS signal, marked with the arrow, originates from the rightmost rDNA repeat. This specific rightmost rDNA NheI fragment is approximately 25 kb in size, distinctly larger than the 4.7 kb NheI 1N rARS-containing fragments that originate from the internal rDNA repeats.

      Reviewer #3 (Public Review):

      Summary:

      Heterochromatin is characterized by low transcription activity and late replication timing, both dependent on the NAD-dependent protein deacetylase Sir2, the founding member of the sirtuins. This manuscript addresses the mechanism by which Sir2 delays replication timing at the rDNA in budding yeast. Previous work from the same laboratory (Foss et al. PLoS Genetics 15, e1008138) showed that Sir2 represses transcription-dependent displacement of the Mcm helicase in the rDNA. In this manuscript, the authors show convincingly that the repositioned Mcms fire earlier and that this early firing partly depends on the ATPase activity of the nucleosome remodeler Fun30. In read-depth analysis of sorted G1/S cells, fun30 was the only chromatin remodeler mutant that somewhat delayed replication timing in sir2 mutants, while nhp10, chd1, isw1, htl1, swr1, isw2, and irc5 had no effect. The conclusion was corroborated with orthogonal assays, including two-dimensional gel electrophoresis and analysis of EdU incorporation at early origins. Using an insightful analysis with an Mcm-MNase fusion (Mcm-ChEC), the authors show that the repositioned Mcms in sir2 mutants fire earlier than the Mcm at the normal position in wild type. This early firing at the repositioned Mcms is partially suppressed by Fun30 deletion. In addition, the authors show that Fun30 affects nucleosome occupancy at the sites of the repositioned Mcm, providing a plausible mechanism for the effect of Fun30 on Mcm firing at that position. However, the results from the MNase-seq and ChEC-seq assays are not fully congruent for the fun30 single mutant. Overall, the results support the conclusions, providing a much better mechanistic understanding of how Sir2 affects replication timing at the rDNA.

      The results for the fun30 single mutant appear incongruent, with a larger signal of the +2 nucleosome in the MNase-seq plot but a negligible signal in the ChEC-seq plot, because of the paucity of displaced Mcm in the fun30 single mutant. Given the relative absence of displaced MCMs, the MCM-MNase fusion protein cannot "light up" the +2 nucleosome. We will clarify this point in the revision.

      Strengths

      (1) The data clearly show that the repositioned Mcm helicase fires earlier than the Mcm in the wild type position.

      (2) The study identifies a specific role for Fun30 in replication timing and an effect on nucleosome occupancy around the newly positioned Mcm helicase in sir2 cells.

      Weaknesses

      (1) It is unclear which strains were used in each experiment.

      (2) The relevance of the fun30 phospho-site mutant (S20AS28A) is unclear.

      (3) For some experiments (Figs. 3, 4, 6) it is unclear whether the data are reproducible and the differences significant. Information about the number of independent experiments and quantitation is lacking. This affects the interpretation, as fun30 seems to affect the +3 nucleosome much more than is acknowledged in the description.

      We appreciate the reviewer pointing out places in which our manuscript omitted key pieces of information (items 1 and 3), and we will fix these oversights in our revision. 

      With regard to point 2, we had written: 

      “Fun30 is also known to play a role in the DNA damage response; specifically, phosphorylation of Fun30 on S20 and S28 by CDK1 targets Fun30 to sites of DNA damage, where it promotes DNA resection (Chen et al. 2016; Bantele et al. 2017). To determine whether the replication phenotype that we observed might be a consequence of Fun30's role in the DNA damage response, we tested non-phosphorylatable mutants for the ability to suppress early replication of the rDNA in sir2; these mutations had no effect on the replication phenotype (Figure 2B), arguing against a primary role for Fun30 in DNA damage repair that somehow manifests itself in replication.”

      We will expand on this to clarify our point in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors report that optogenetic inhibition of hippocampal axon terminals in retrosplenial cortex impairs the performance of a delayed non-match to place task. The significance of findings elucidating the role of hippocampal projections to the retrosplenial cortex in memory and decision-making behaviors is important. However, the strength of evidence for the paper's claims is currently incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a study on the role of the retrosplenial cortex (RSC) and the hippocampus in working memory. Working memory is a critical cognitive function that allows temporary retention of information for task execution. The RSC, which is functionally and anatomically connected to both primary sensory (especially visual) and higher cognitive areas, plays a key role in integrating spatial-temporal context and in goal-directed behaviors. However, the specific contributions of the RSC and the hippocampus in working memory-guided behaviors are not fully understood due to a lack of studies that experimentally disrupt the connection between these two regions during such behaviors.

      In this study, researchers employed eArch3.0 to silence hippocampal axon terminals in the RSC, aiming to explore the roles of these brain regions in working memory. Experiments were conducted where animals with silenced hippocampal axon terminals in the RSC performed a delayed non-match to place (DNMP) task. The results indicated that this manipulation impaired memory retrieval, leading to decreased performance and quicker decision-making in the animals. Notably, the authors observed that the effects of this impairment persisted beyond the light-activation period of the opsin, affecting up to three subsequent trials. They suggest that disrupting the hippocampal-RSC connection has a significant and lasting impact on working memory performance.

      Strengths:

      They conducted a study exploring the impact of direct hippocampal inputs into the RSC, a region involved in encoding spatial-temporal context and transferring contextual information, on spatial working memory tasks. Utilizing eArch3.0 expressed in hippocampal neurons via the viral vector AAV5-hSyn1-eArch3.0, they aimed to bilaterally silence hippocampal terminals located at the RSC in rats pre-trained in a DNMP task. They discovered that silencing hippocampal terminals in the RSC significantly decreased working memory performance in eArch+ animals, especially during task interleaving sessions (TI) that alternated between trials with and without light delivery. This effect persisted even in non-illuminated trials, indicating a lasting impact beyond the periods of direct manipulation. Additionally, they observed a decreased likelihood of correct responses following TI trials and an increased error rate in eArch+ animals, even after incorrect responses, suggesting an impairment in error-corrective behavior. This contrasted with baseline sessions where no light was delivered, and both eArch+ and control animals showed low error rates.

      Weaknesses:

      While I agree with the authors that the role of hippocampal inputs to the RSC in spatial working memory is understudied and merits further investigation, I find that the optogenetic experiment, a core part of this manuscript that includes viral injections, could be improved. The effects were rather subtle, rendering some of the results barely significant and possibly too weak to support major conclusions.

      We thank Reviewer#1 for carefully and critically reading our manuscript, and for the valuable comments provided. The judged “subtlety” of the effects stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition. We disagree with this perspective and find it rather reductive for several reasons.

      Once seen in the context of the animal’s ecology, subtle impairments can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity in situations such as remembering the habitual location of a predator or a food source.

      Also, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of the perturbed factor’s role. If a caricatured analogy may be allowed, it would be as if we were to study the role of an animal’s legs by chopping them both off and observing the resulting behaviour.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, thus allowing behaviour to proceed and allowing us to observe graded, decreased (not absent) proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant and biologically realistic.

      Additionally, no mechanistic investigation was conducted beyond referencing previous reports to interpret the core behavioral phenotypes.

      We fully agree that this is a weakness. We wish we could have done more mechanistic studies to find out exactly what Arch activation is doing to HIPP-RSC transmission and which neurons are being affected, and perhaps in the future dissect its circuit determinants. We keep all these goals firmly in mind and hope to address them soon.

      Reviewer #2 (Public Review):

      The authors examine the impact of optogenetic inhibition of hippocampal axon terminals in the retrosplenial cortex (RSP) during the performance of a working memory T-maze task. Performance on a delayed non-match-to-place task was impaired by such inhibition. The authors also report that inhibition is associated with faster decision-making and that the effects of inhibition can be observed over several subsequent trials. The work seems reasonably well done and the role of hippocampal projections to retrosplenial cortex in memory and decision-making is very relevant to multiple fields. However, the work should be expanded in several ways before one can make firm conclusions on the role of this projection in memory and behavior.

      We thank Reviewer#2 for carefully and critically reading our manuscript, and for the valuable comments provided.

      (1) The work is very singular in its message and the experimentation. Further, the impact of the inhibition on behaviour is very moderate. In this sense, the results do not support the conclusion that the hippocampal projection to retrosplenial cortex is key to working memory in a navigational setting.

      As we have mentioned in response to Reviewer#1, the judged “very moderate” effect stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition, precluding its consideration as “key” for behaviour. We disagree with this perspective and find it rather reductive for several reasons. Once seen in the context of the animal’s ecology, quantitatively lower impairments in working memory are no less key for this cognitive capacity, and can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity in situations such as remembering the habitual location of a predator or a food source. Furthermore, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of its role.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, thus allowing behaviour to proceed and allowing us to observe graded, decreased (not absent) proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant and biologically realistic.

      (2) There are no experiments examining other types of behavior or working memory. Given that the animals used in the studies could be put through a large number of different tasks, this is surprising. There is no control navigational task. There is no working memory test that is non-spatial. Such results should be presented in order to put the main finding in context.

      It is hard to gainsay this point. The more thorough and complete a behavioural characterization is, the more informative the study is, from every angle. While we agree that other forms of WM would be quite interesting in this context, we also cannot ignore the fact that DNMP is widely used as a WM task, one that is biologically plausible, sensitive to perturbations of the neural circuitry known to be at play therein, and fully accepted in the field. Unable to run further studies for lack of additional funding and human resources, we chose to run this task.

      A control navigational task would, in our understanding, be used to assess whether silencing HIPP projections to RSC affects (spatial?) navigation rather than WM, thus explaining the observed impairment. To this we reply as follows: spatial navigation is a very basic cognitive function, one that relies on body orientation relative to spatial context, on keeping an updated representation of such spatial context (i.e., memory), and on guiding behaviour according to acquired knowledge about spatial context. Some of these functions are integral to spatial working memory; as such, they might indeed be affected.

      Dissecting the determinants of spatial WM is indeed an ongoing effort, one that was not the intention of the current study, but one that we keep firmly in mind and hope to address in the future.

      A non-spatial WM task would indeed vastly solidify our claims beyond spatial WM, onto WM. We have, for this reason, changed the title of the manuscript which now reads “spatial working memory”.

      (3) The actual impact of the inhibition on activity in RSP is not provided. While this may not be strictly necessary, it is relevant that the hippocampal projection to RSP includes, and is perhaps dominated by inhibitory inputs. I wonder why the authors chose to manipulate hippocampal inputs to RSP when the subiculum stands as a much stronger source of afferents to RSP and has been shown to exhibit spatial and directional tuning of activity. The points here are that we cannot be sure what the manipulation is really accomplishing in terms of inhibiting RSP activity (perhaps this explains the moderate impact on behavior) and that the effect of inhibiting hippocampal inputs is not an effective means by which to study how RSP is responsive to inputs that reflect environmental locations.

      We fully agree that neural recordings addressing the effect of silencing on RSC neural activity are relevant. We wish we could have provided more mechanistic studies to find out exactly what Arch activation is doing to HIPP-RSC transmission and which neurons are being affected, thus dissecting its circuit determinants. We keep all these goals firmly in mind and hope to address them soon. The subiculum, which we mention in the Introduction, is indeed a key player in this complex circuitry, and its hypothetical influence is the subject of experimental studies that will certainly reveal many other key elements.

      (4) The impact of inhibition on trials subsequent to the trial during which optical stimulation was actually supplied seems trivial. The authors themselves point to evidence that activation of the hyperpolarizing proton pump is rather long-lasting in its action. Further, each sample-test trial pairing is independent of the prior or subsequent trials. This finding is presented as a major finding of the work, but would normally be relegated to supplemental data as an expected outcome given the dynamics of the pump when activated.

      We disagree that this finding is “trivial”, and object to the considerations of “normalcy”, which we are left wondering about.

      In the absence of neurophysiological experiments (for the reasons stated above) to address this interesting finding, we chose to interpret it in light of the (few) published observations, this being the logical course of action in scientific reporting under the present circumstances.

      Evidence for such a prolonged effect in the context of behaviour is scarce (to our knowledge only the one we cite in the manuscript). As such, it is highly relevant to report it, and give it the relevance we do in our manuscript, rather than “relegating it to supplementary data”, as the reviewer considers being “normal”.

      In the DNMP task the consecutive sample-test pairs are explicitly not independent, as they are part of the same behavioural session. This is illustrated by the simple phenomenon of learning, namely the intra-session learning curves, and the well-known behavioral trial-history effects. The brain does not simply erase such information during the ITI.

      (5) In the middle of the first paragraph of the discussion, the authors make reference to work showing RSP responses to "contextual information in egocentric and allocentric reference frames". The citations here are clearly deficient. How is the Nitzan 2020 paper at all relevant here?

      Nitzan 2020 reports the propagation of information from HIPP to CTX via SUB and RSC, a conduit for mnemonic information between the two structures that is alternative to the one we target; it thus provides thorough information on the HIPP-RSC circuitry at play during behaviour.

      Alexander and Nitz 2015 specifically describe the encoding, and conjunction, of two types of contextual information: internal (egocentric) and external (allocentric).

      The subsequent reference is indeed superfluous here.

      We thank Reviewer #2 for calling our attention to the fact that the references for this information were inadequate and incomplete. We have now cited (Gill et al., 2011; Miller et al., 2019; Vedder et al., 2017) and refer readers to the review by Alexander et al. (2023) to illustrate the encoding of information in the two reference frames. In addition, we have substantially edited the Introduction and Discussion sections and removed unnecessary passages.

      (6) The manuscript is deficient in referencing and discussing data from the Smith laboratory that is similar. The discussion reads mainly like a repeat of the results section.

      Please see above. We thank Reviewer #2 for this comment; we have now rewritten the Discussion so that it is less a summary of the Results and more focused on their implications and future directions.

      Response to recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major

      Line 101: Even with the tapered lambda fibre optic stub, if the fibre optics were longitudinally staggered by 2 millimetres, they would deliver light to diagonal regions in the horizontal plane rather than covering the full length of the RSC. Is this staggering pattern randomized or fixed? Additionally, Figure 1C is a bit misleading, as the light distribution pattern from the tapered fibre optic is likely to be more concentrated near the surface of the fibre, rather than spreading widely in a large spherical pattern.

      The staggering is fixed. The elliptical (not spherical) contour in Fig 1C is not meant to convey any quantitative information, but rather to visually orient the reader towards the directions into which light will likely propagate, the effects of which we do not attempt to estimate here. We have made this contour smaller.

      Line 119: The authors demonstrate the viral expression pattern of a representative animal and the overall expression patterns of all other animals in Figure 1 and the Supplementary Figures. However, numerous cases in the Supplementary Figures exhibit viral leakages and strong expressions in adjacent cortical and thalamic areas. Although there is a magnified view of the RSC's expression pattern in Figure 1, authors should show the same way in the supplemental data as well. Additionally, the degree of viral expression in the hippocampal subregions varies substantially across animals. This variation is concerning and impacts the interpretation of the results.

      The viral construct was injected into the HIPP at coordinates based on our previous work (Ferreira-Fernandes et al., 2019), in which injections of a similar vector into mid-dorsal HIPP resulted in widespread expression throughout the anteroposterior extent of the medial mesocortex, from RSC through CG, as well as in other areas where HIPP establishes synapses. These projections were studied in detail in that work by estimating the density of axon terminals. In the present work we did not acquire high-magnification images of all slices, since this was too costly and we already had this information from the study above. Still, we have now added further examples of high-magnification images from eArch and CTRL animals.

      It is important to mention here that the virus we use, AAV5, travels only anterogradely and is static (i.e., it does not travel transsynaptically).

      Variations in viral expression are to be expected even when injections are performed in exactly the same way. It is therefore crucial that fibre positioning is constant across animals, both to guarantee a consistent relationship between fibre placement and viral expression and to render any off-target expression of the viral construct irrelevant. We verified this condition post-mortem in all our animals.

      Line 124: Another point regarding the viral expressions and optical fibre implants used to inhibit the HIPP-RSC pathway is that the RSC and HIPP extend substantially along the anterior-posterior axis. The authors should demonstrate how the viral expression is distributed along this axis and indicate where the tip of the tapered optical fibre ended by marking it in the histological images. This information is crucial to confirm the authors' claim that the hippocampal projection terminals were indeed modulated by optical light. Also, the manuscript would benefit from details about the power/duration and/or modulation of the light used.

      The tracks formed by the fibres are clearly visible in the panels of both Figure 1 and Figure S1. They provide examples of the dual-angle placement relative to the expression of the construct, demonstrating that the fibres are fully targeted at the expressing region. We have added markers to highlight these tracks, and an example of a “full” track in Figure S1. No animal deviated from this relative positioning to any significant extent. The Methods section gives the illumination power as 240 mA, and we have now also added the estimated illumination time.

      Line 141: The authors should include data on task performance during learning and baseline sessions for each animal, to demonstrate that they fully grasped the task rules and that achieving a 75% performance ratio was sufficient.

      DNMP is a standard WM task that has been used for many decades, in which animals reach performance above 75% within 4-8 sessions. We have used it extensively and have never seen any deviation from this learning rate and curve. We ran daily sessions until animals reached 75%, and thereafter until they maintained this performance, or higher, for three consecutive sessions (the data points we show). We saw no deviation from published results or from our own extensive experience, and are therefore fully confident that all animals included in this manuscript grasped the task rules.

      Line 146: While the study focused on inhibiting inputs during the test run (retrieval phase), it would be beneficial to also inhibit inputs during the sample run (encoding phase) and the delay period. This would help confirm whether the silencing affects only working memory retrieval, or if it also impacts encoding and maintenance.

      We agree that it would be very interesting to determine whether silencing HIPP-RSC terminals during Sample has any effect. However, since there is a limit to the number of trials per session, and to the total number of sessions, we could not run all three manipulations within each session of our experimental design, as that would lower the number of trials per condition to an extent that would affect statistical power. Silencing HIPP-RSC terminals during Sample would best be addressed in a separate experiment, asking a different question, perhaps within an experimental design distinct from the one envisioned here.

      A very important point here is that the effects of the optogenetic manipulation are not confined to the illumination epoch; in fact, they extend as far as the third trial after illumination. Interleaving Sample-illuminated trials in the same session would fundamentally compromise the interpretation of the results, as we could not attribute lower performance to effects in one or both manipulated epochs.

      Line 225: Figure 5 illustrates that silencing the inputs results in an extended impairment of working memory performance. However, it's unclear if there are any behavioural changes during the sample run. The inhibition could potentially affect encoding in the subsequent sample run, considering the inter-trial interval (ITI) is only 20 seconds.

      From the observation of behaviour and the analysis of our data, we saw no overt “behavioural changes during the sample run”, as latencies and speeds were essentially unchanged.

      If the comment refers to the effect of the optogenetic manipulation extending from the Test into the Sample epoch, we find this unlikely. Conservatively, we estimate that the peak of our optogenetic manipulation occurs around the time light is delivered, during the Test phase, rather than 20-30 s later.

      In theory, optogenetic silencing of HIPP terminals in RSC could disturb encoding (during Sample), the ITI itself, and the epoch in which mnemonic information retrieved from the Sample epoch is confronted with the contextual information present during Test, leading to a decision. This holds regardless of the illumination epoch, and even if the effect of the optogenetic manipulation is not prolonged in time.

      Our experiments specifically target the Test epoch, and the neurophysiological effects in all likelihood decay in magnitude, as reflected both in the reported decay of the manipulation mechanism and in the reduced behavioural proficiency we observe across subsequent trials 1 to 4. We are therefore convinced that the most conservative interpretation is that our main effect is concentrated in the epoch in which we deliver light, the Test epoch, with consequences (possibly related to short-term plasticity events within the HIPP-RSC neural circuit) that extend further in time.

      Line 410: The methods section on the surgical procedure could be clearer, particularly regarding the coordinates for microinjection and fibre implantation. A more precise description would aid reader comprehension.

      The injection and implantation coordinates now reported give the distances, in mm, from Bregma to the targets in the three stereotaxic dimensions considered (anteroposterior, mediolateral left and right, and dorsoventral), as well as the angle at which the fibres were positioned. We have added labels to the figures to highlight the fibre-optic track locations. We will be happy to provide further details as needed.

      Line 461: It would be helpful to know if each animal displayed a preference for the left or right side. Including a description or figure showing that the performance ratio exceeded 75% in both left and right trials would provide a more comprehensive understanding of the animals' behaviour.

      In the DNMP, an extensively used and documented WM task, it is an absolute precondition that no animal is biased towards either side. We did not observe such a bias in any of our candidate animals, and we would not have used any animal exhibiting one.

      Minor

      Line 25: In the INTRODUCTION section, the authors introduce ego-centric and allocentric variables in the RSC. However, if they intend to discuss this feature, there is no supporting data for ego-centric or allocentric variables in the Results section.

      We agree. Our discussion of egocentric vs allocentric variables may stray somewhat from the main subject. It was included to provide wider context for our data, considering that spatial working memory is one instance in which egocentrically and allocentrically referenced cognitive mechanisms confront each other, and one in which silencing the HIPP input to a cortical region involved in this process would likely disturb the ensuing computations. We have now substantially edited the Introduction and Discussion sections of the manuscript, in particular toning down this aspect.

      Line 125: In the section title, DNMT -> DNMP obviously.

      We have corrected this passage.

      Figures: The quality of the figure panels does not meet the expected standards. For example, scale bars are missing in many panels (e.g., Figure 1A bottom, 1B, 1C, S1), figure labels are misaligned (as seen in Figure 3A-B compared to 3C, same with Figure 5), and there is inconsistency in color schemes (e.g., Figure 3C versus Figure 6, where 'Error' versus 'Correct' is depicted using green versus blue, respectively).

      We have now corrected these inconsistencies and mistakes.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment

      This study presents an important finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants, thereby linking perception and action during implicit adaptation. The evidence supporting the claims of the authors is convincing. The normative approach of the proposed PEA model, which combines ideas from separate lines of research, including vision research and motor learning, opens avenues for future developments. This work will be of interest to researchers in sensory cue integration and motor learning.

      Thank you for the updated assessment. We are also grateful for the insightful and constructive comments from the reviewers, which have helped us improve the manuscript again. We made the necessary changes following their comments (trimmed tests, new analysis results, etc.) and respond to the comments point by point below. We hope to publish these responses alongside the public review. Thank you again for fostering this fruitful discussion.

      Public Reviews:

      Reviewer #1 (Public Review):

      I appreciate the normative approach of the PEA model and am eager to examine this model in the future. However, two minor issues remain:

      (1) Clarification on the PReMo Model:

      The authors state, "The PReMo model proposes that this drift comprises two phases: initial proprioceptive recalibration and subsequent visual recalibration." This description could misrepresent the intent of PReMo. According to PReMo, the time course of the reported hand position is merely a read-out of the *perceived hand position* (x_hat in your paper). Early in adaptation, the perceived hand position is biased by the visual cursor (x_hat in the direction of the cursor); towards the end, due to implicit adaptation, x_hat reduces to zero. This is the same as PEA. I recommend that the authors clarify PReMo's intent to avoid confusion.

      Note, however, the observed overshoot of 1 degree in the reported hand position. In the PReMo paper, we hypothesized that this effect is due to the recalibration of the perceived visual target location (inspired by studies showing that vision is also recalibrated by proprioception, but in the opposite direction). If the goal of implicit adaptation is to align the perceived hand position (x_hat) with the perceived target position (t_hat), then there would be an overshoot of x_hat over the actual target position.

      PEA posits a different account for the overshoot. It currently suggests that the reported hand position combines x_hat (which takes x_p as input) with x_p itself. What is the reasoning underlying the *double occurrence* of x_p?

      There are three alternatives that seem more plausible (and could lead to the same overshooting): 1) increasing x_p's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase), 2) decreasing sigma_p (assuming that participants pay more attention to the hand during the report phase), 3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration. All these options, at least to me, seem equally plausible and testable in the future.

      To clarify the PReMo model’s account of Fig. 4A, we now write:

      “The PReMo model proposes that the initial negative drift reflects a misperceived hand location, which gradually reduces to zero, and the late positive drift reflects the influence of visual calibration of the target (Tsay, Kim, Saxena, et al., 2022).”

      However, we would like to point out that the PEA model does not predict that the perceived hand location reaches zero, even in the late phase of adaptation: it remains negative, though not as large as during initial adaptation (see Figure 4A, red line). Furthermore, we have not found any plausible way to use a visually biased target to explain the overshoot of the judged hand location (see below, where we address the three alternative hypotheses raised by the reviewer).

      We do not think the “double” use of x_p is a problem, simply because there are TWO tasks under investigation when proprioceptive changes are measured along with adaptation. The first is the reaching adaptation task itself: moving under the influence of the clamped cursor. This task is accompanied by a covert estimation of hand location after the movement (x̂). Given the robustness of implicit adaptation, this estimation appears mandatory and automatic. The second task is the hand localization task, during which the subject is explicitly asked to judge where the hand is. Here, the perceived hand location is based on the two available cues: the actual hand location x_p, and the influence of the just-finished reaching movement (i.e., x̂). From a normative Bayesian perspective, sensory integration is based on the cues available to fulfil the task. For the second task, reporting the hand location, the two cues are x_p and x̂ (with a possible effect of the visual target, which is unbiased since it is defined as 0 in the model simulation; thus, its presence does not induce any shift). In this sense x_p is used sequentially, once in each task, and its dual use is well justified.
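      To make the two-stage account concrete, here is a minimal Python sketch under stated assumptions: each cue is treated as an independent Gaussian, and cues are fused by standard inverse-variance (reliability) weighting. The function name `fuse` and every numeric value are hypothetical illustrations, not fitted parameters of the PEA model. The sketch only shows that a report combining a still-negative x̂ with a positively drifted x_p can come out positive.

```python
def fuse(means, variances):
    """Inverse-variance (reliability-weighted) fusion of independent Gaussian cues.

    Returns (fused_mean, fused_variance). Generic textbook cue combination,
    used here as an assumption about how the hand report is formed.
    """
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total
    return mean, 1.0 / total


# Hypothetical values in degrees, chosen for illustration only:
x_hat, var_hat = -0.5, 1.0   # covert post-movement estimate, still negative
x_p, var_p = 15.0, 9.0       # actual hand direction after implicit adaptation

report, report_var = fuse([x_hat, x_p], [var_hat, var_p])
print(f"reported hand = {report:.2f} deg, variance = {report_var:.2f}")
```

      Because the fused mean is a convex combination of its inputs, it can only be positive if at least one input is; in this sketch that input is x_p. Lowering var_p (or raising var_hat) shifts the report further towards x_p.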

      Our hypothesis is that the reported hand position results from a combination of x̂ from the previous movement and the current hand position x_p. However, specifically for the overshoot of the judged hand location in the late part of adaptation (Fig. 4A), the reviewer raised three alternative explanations that assume the PReMo model is correct. Under the PReMo model, the estimated hand location is determined only by x̂, and x_p is not used in the hand-report phase. In addition, x̂ (with x_p used once) together with a visual recalibration of the target is proposed to explain the gradual shift from negative to positive (overshoot).

      We do not think any of them can parsimoniously explain our findings, and we address the three hypotheses one by one:

      (1) increasing x_p's contribution (assuming visual uncertainty increases when the visual cursor is absent during the hand report phase)

      (2) decreasing σ_p (assuming that participants pay more attention to the hand during the report phase)

      The first two alternative explanations essentially assume that x_p has a larger contribution (a larger weight, in Bayesian terms) in the hand-report phase than in the adaptation-movement phase, whether due to an increase in visual uncertainty (alternative 1) or a reduction in proprioceptive uncertainty (alternative 2). We therefore take the reviewer to suggest that a larger weight for x_p can explain why the perceived hand location changes gradually from negative to positive. However, under the PReMo model, a larger weight for x_p would only affect x̂, which is already assumed to change from negative to zero. Giving x_p more weight in x̂ during the hand-report phase (compared with the adaptation-movement phase) would not move the reported hand location from negative to positive, because, however much weight x_p has, the PReMo model assumes that the influence of x_p on x̂ saturates. Thus x̂ would not exceed zero in late adaptation, and the PReMo model would have to rely on the so-called visual shift of the target to explain the overshoot. This leads us to the third alternative the reviewer raised:

      (3) it could be that the perceived target position undergoes recalibration in the opposite direction to proprioceptive recalibration.

      The PReMo model originally assumed that the perceived target location was biased in order to account for the positive overshoot of the reported hand location. We take the reviewer to suggest that the perceived target position, which is shifted in the positive direction, also “biases” the perceived hand position. We also take the reviewer to suggest that the perceived hand location after a clamp trial (x̂) is zero, and that the shifted perceived target position somehow “biases” the reported hand location after a clamp trial. Unfortunately, we could not find any mathematical formulation of this biasing effect in the original paper (Tsay, Kim, Haith, et al., 2022), and we are not able to derive a formulation of this hypothesized effect from Bayesian cue integration principles. Target and hand are two separately perceived items; how one relates to the other requires justification from a normative perspective when discussing Bayesian models. Note that this is not a problem for our PEA model, in which both cues concern hand localization: one is x̂ and the other is x_p.

      We believe that mathematically formulating the biasing effect (Figure 4A) is non-trivial since the reported hand location changes continuously from negative to positive. Thus, quantitative model predictions, like the ones our PEA model presents here, are needed.

      To rigorously test the possible effect of visual recalibration of the target, two steps are needed: (1) use a psychometric method to measure the biased perception of the target, and (2) repeat the Tsay et al. (2020) experiment without the target. For (2), compared with the condition with the target, the PEA model would predict a larger overshoot, whereas the PReMo model would predict a smaller or even zero overshoot. This can be left for future studies.
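      One way to see why PEA would predict a larger overshoot without the target is the following Python sketch. It assumes, hypothetically, that the visual target enters the hand report as an extra Gaussian cue centred at 0 (the target is defined as 0 in the model simulation), fused with x̂ and x_p by inverse-variance weighting. All values, including the target variance, are illustrative rather than fitted; removing the target cue removes its pull towards zero, so the positive report grows.

```python
def fuse(means, variances):
    """Inverse-variance fusion of independent Gaussian cues; returns the fused mean.

    Generic cue combination, assumed here for illustration only.
    """
    precisions = [1.0 / v for v in variances]
    return sum(m * p for m, p in zip(means, precisions)) / sum(precisions)


# Hypothetical cue values (degrees) and variances, for illustration only.
x_hat, var_hat = -0.5, 1.0   # covert post-movement hand estimate
x_p, var_p = 15.0, 9.0       # actual (adapted) hand direction
target, var_t = 0.0, 4.0     # visual target, assumed to act as a cue at 0

with_target = fuse([x_hat, x_p, target], [var_hat, var_p, var_t])
without_target = fuse([x_hat, x_p], [var_hat, var_p])
print(f"overshoot with target: {with_target:.2f} deg, without: {without_target:.2f} deg")
```

      Under these assumptions the overshoot shrinks whenever a zero-centred target cue is present, regardless of its exact variance; the size of the difference depends entirely on the hypothetical variances chosen.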

      (2) Effect of Visual Uncertainty on Error Size:

      I appreciate the authors' response about methodological differences between the cursor cloud used in previous studies and the Gaussian blob used in the current study. However, it is still not clear to me how the authors reconcile previous studies showing that visual uncertainty reduced implicit adaptation for small but not large errors (Tsay et al, 2021; Makino, et al 2023) with the current findings, where visual uncertainty reduced implicit adaptation for large but not small errors.

      Could the authors connect the dots here: I could see that the cursor cloud increases potential overlap with the visual target when the visual error is small, resulting in intrinsic reward-like mechanisms (Kim et al, 2019), which could potentially explain attenuated implicit adaptation for small visual errors. However, why would implicit adaptation in response to large visual errors remain unaffected by the cursor cloud? Note that we did verify that sigma_v is increased in (Tsay et al. 2021), so it is unlikely due to the cloud simply failing as a manipulation of visual uncertainty.

      In addition, we also reasoned that testing individuals with low vision could offer a different test of visual uncertainty (Tsay et al, 2023). The advantage here is that both controls and patients with low vision are provided with the same visual input: a single cursor. Our findings suggest that uncertainty due to low vision also shows reduced implicit adaptation in response to small but not large errors, contrary to the findings in the current paper. Missing in the manuscript is a discussion related to why the authors' current findings contradict those of previous results.

      To connect the dots between the two previous studies (Tsay et al., 2021, 2023) (note that Makino et al., 2023 is not included in this discussion, since it investigated the weights of multiple cursors rather than the visual uncertainty associated with a cursor cloud):

      First, we want to re-emphasize that using the cursor cloud to manipulate visual uncertainty introduces confounds that make it less than ideal for studying visuomotor adaptation. For example, in the error-clamp paradigm, the error is defined as an angular deviation. The cursor cloud consists of multiple cursors spanning a range of angles, which affects both the sensory uncertainty (the intended outcome) and the sensory estimate of the angle (the error estimate, an undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (σ_v in our model), but it also affects the mean of the distribution (µ). This confound is neatly avoided by cursor blurring, since the blurred cursor is still a single cursor with its center (µ) unchanged. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2020, the cursor cloud often overlaps with the visual target; this "target hit" could affect adaptation, possibly via a reward-learning mechanism (Kim et al., 2019). This is a second confound that accompanies the cursor cloud. It is true that the cursor cloud was verified to be associated with high visual uncertainty (Tsay et al., 2021); however, this verification was done with a psychophysical method against a clean background, not in the context required here, with a hand reaching to a target. Thus, although the cursor cloud does carry sizeable visual uncertainty, our criticisms still hold when it is used in error-clamp adaptation.

      Second, bearing these confounds of the cursor cloud in mind, we postulate one important factor, not considered in any model thus far, that might underlie the lack of difference between the single-cursor clamp and the cloud-cursor clamp when the clamp size is large: the cursor cloud might be harder to ignore than a single cursor. In Bayesian sensory integration, the naive model considers only the relative reliability of cues. The cloud is indeed a more uncertain indicator of movement direction than a single cursor. However, given its large spread, it is probably harder to ignore during error-clamp movements. Note that ignoring the clamped cursor is the task instruction, but the large scatter of the cursor cloud is more salient and thus plausibly harder to ignore. This might increase the weighting of the visual cue despite its higher visual uncertainty. This extra confound is arguably minimized by using the blurred cursor, as in our Exp4, since the blurred cursor did not increase the visual angle much (Figure 5D; blurred vs single cursor: 3.4 mm vs 2.5 mm in radius, 3.90° vs 2.87° in spread). In contrast, the visual angle of the dot cloud is at least an order of magnitude larger (cursor cloud vs single cursor: at least 25° vs 2.15° in spread, given a 10° standard deviation of random sampling).

      Third, in the low-vision study (Tsay et al., 2023), the patients indeed show reduced implicit adaptation for a 3° clamp (consistent with our PEA model) but intact adaptation for a 30° clamp (not consistent). Although this pattern appears similar to that of people with normal vision whose visual uncertainty is increased by the cursor cloud (Tsay et al., 2021), we are not completely convinced that the same underlying mechanism governs these two datasets. Low-vision patients indeed have higher visual uncertainty about color, brightness, and object location, but their visual uncertainty about visual motion is still unknown. Given the differences in impairment among people with low vision (e.g., peripheral or central vision affected) and the different roles of peripheral and central vision in movement planning and control (Sivak & Mackenzie, 1992), the overall effect of visual uncertainty in people with low vision is unclear. The direction of cursor movement that matters for visuomotor rotation here is likely related to visual motion perception. Unfortunately, the original study did not measure this uncertainty in low-vision patients. We believe our Exp1 offers a valid method for this purpose in future studies. More importantly, we should not expect low-vision patients to integrate visual cues in the same way as people with normal vision, given their long-term adaptation to their visual difficulties. Thus, we are cautious about interpreting the seemingly similar findings across the two studies (Tsay et al., 2021, 2023) as revealing the same mechanism.

      A side note: these two previous studies proposed a so-called mislocalization hypothesis, i.e., the cursor cloud was mislocalized for small clamp sizes (given its overlap with the target) but not for large clamp sizes. They suggested that the lack of uncertainty effect at small clamp sizes is due to mislocalization, while the lack of uncertainty effect at large clamp sizes arises because implicit adaptation is not sensitive to uncertainty at large angles. Thus, these two studies acknowledge that the cursor cloud not only increases uncertainty but also generates an unwanted "mislocalization" effect (overlap with the target). Interestingly, their hypothesis of reduced sensitivity to visual uncertainty for large clamps is not supported by a model or theory; it is merely a rewording of the experimental results.

      In sum, our current study cannot offer an easy answer to "connect the dots" between the two aforementioned studies, owing to methodological issues and the specific population tested. However, to resolve conflicting findings, our study suggests several solutions: using a psychometric test to quantify visual uncertainty for cursor motion (Exp1), a better uncertainty-manipulation method that avoids several confounds (Exp4, blurred cursor), and a falsifiable model. Future work can resolve the differences between studies based on the new insights from the current one.

      Reviewer #2 (Public Review):

      Summary:

      The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.

      Strengths:

      In the past decade, numerous perplexing results in visuomotor rotation tasks have questioned their underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation's influence by visual eccentricity, a well-established vision science concept, and sensory estimation through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.

      In the revised version, the authors argue that the proposed model can capture general implicit motor learning processes beyond the visuomotor rotation task. In the Discussion, they emphasize two main principles: the automatic tracking of effector position and the combination of movement cues through Bayesian integration. These principles are suggested as key to understanding and modeling various forms of motor adaptation and skill learning. The proposed model could potentially serve as a basis for new computational models of skill acquisition, especially where current models fall short.

      Weaknesses:

      The proposed model is described as elegant. In this paper, the authors test the model under a limited set of conditions, demonstrating its relevance to the sensorimotor adaptation mechanisms of the human brain. However, the scope of the model's applicability remains unclear. It has shown the capacity to explain prior data, thereby surpassing previous models that rely on elementary mathematics. To solidify its credibility in the field, the authors will need to gather more supporting evidence.

      Indeed, our model here is based on one particular experimental paradigm, i.e., error-clamp adaptation. We used it simply because 1) this paradigm is one of the rare cases in which implicit motor learning can be cleanly isolated, and 2) the literature contains several conflicting findings that a unified model can reconcile.

      Regarding our model's broader impact, we believe that the general principle laid out here will apply whenever people need to locate their effectors during motor learning. In other words, repetitive movements combined with Bayesian integration of movement-related cues may underlie the implicit processes of various forms of motor learning. To demonstrate this broader applicability, in upcoming studies we will extend this model to other motor learning paradigms, starting with motor adaptation paradigms that involve both explicit and implicit processes.

      Reviewer #3 (Public Review):

      (2.1) Summary

      In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.

      In the introduction, the authors note that implicit adaptation (as measured in error-clamp based paradigms) does not saturate at larger perturbations, but decreases again (e.g., Morehead et al., 2017 show no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations, since the cursor is further from the fixated target. This could decrease the weight assigned to visual feedback, which could explain the lower asymptotes.
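
      The hypothesis can be written as a short derivation. Assuming, for illustration only (this is not necessarily the authors' exact formulation), that the visual cue sits at the clamp angle θ, that the proprioceptive and predictive cues are centred on the hand direction x_n, that the target is at 0°, and that A and B are the retention and learning rates of a state-space update, the visual weight, the perceived hand position, and the steady-state hand direction are:

$$
w_v(\theta) = \frac{\sigma_v(\theta)^{-2}}{\sigma_v(\theta)^{-2} + \sigma_p^{-2} + \sigma_u^{-2}}, \qquad
\hat{x}_n = w_v(\theta)\,\theta + \bigl(1 - w_v(\theta)\bigr)\,x_n
$$

$$
x_{n+1} = A\,x_n - B\,\hat{x}_n
\quad\Longrightarrow\quad
x_\infty = -\frac{B\,w_v(\theta)\,\theta}{1 - A + B\bigl(1 - w_v(\theta)\bigr)}
$$

      When σ_v(θ) is small, w_v ≈ 1 and |x_∞| ≈ Bθ/(1 − A) grows with θ. If σ_v(θ) instead grows roughly linearly with θ, then for large θ the weight w_v(θ) falls roughly as 1/θ², so the product w_v(θ)θ, and with it |x_∞|, eventually decreases, producing a concave rather than saturating asymptote function.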

      The authors characterize visual uncertainty for 3 rotation sizes in a first experiment, and while this experiment could be improved, it is probably sufficient for the current purposes. The authors then present a second experiment in which adaptation to 7 clamped errors is tested in different groups of participants. The model's visual uncertainty is set using a linear fit to the results from experiment 1, and the remaining 4 parameters are then fit to this second data set. The 4 parameters are 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than both the PReMo model (Tsay, Kim et al., 2022), which predicts saturated asymptotes, and a causal inference model (Wei & Körding, 2009), which predicts no adaptation for larger rotations. In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately compare their data with an unsuitable data set (Tsay et al. 2020, instead of Tsay et al. 2021). Finally, the authors conduct a fourth experiment that puts their model to the test. They measure implicit adaptation with increased visual uncertainty, by adding blur to the cursor, and the results are again more in line with their model (predicting overall lower adaptation) than with the PReMo model (predicting equal saturation but at larger perturbations) or a causal inference model (predicting equal peak adaptation, but shifted to larger rotations). In particular, the model fits for experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.

      (2.2) Strengths

      In this study the authors propose a Perceptual Error Adaptation model ("PEA") and the work combines various ideas from the field of cue combination, Bayesian methods and new data sets, collected in four experiments using various techniques that test very different components of the model. The central component of visual uncertainty is assessed in a first experiment. The model uses 4 other parameters to explain implicit adaptation. These parameters are: 1) a learning and 2) a retention rate, as used in popular state space models and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large and that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (overall decrease of adaptation). By incorporating visual uncertainty depending on retinal eccentricity, the predictions of the PEA model for very large perturbations are notably different from, and better than, the predictions of the two other models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.

      (2.3) Weaknesses

      Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.

      Regarding proper citation of the “concave” adaptation function: we assume the reviewer is referring to the study by Morehead et al. (2017), which tested large clamp sizes of up to 135° and 175°. Unsurprisingly, the 135° and 175° conditions lead to nearly zero adaptation, possibly for the trivial reason that people cannot even see the moving cursor. We have cited this seminal study from the very beginning. All other error-clamp studies with a block design emphasized an invariant or saturated implicit adaptation with large rotations (e.g., Kim et al., 2019).

      The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between movement endpoint and disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.

      For the issues related to visual uncertainty measurement in Exp1:

      First, our visual uncertainty concerns cursor motion direction in the display plane, and the measurement in Exp1 has never been done before. Thus, we do not think our data are comparable to findings in vision science about foveal/peripheral comparisons. We cited the work of Klein and colleagues (Klein & Levi, 1987; Levi et al., 1987) in vision science because their studies showed that deviation from the fixation point is associated with an increase in visual uncertainty. Their studies thus inspired us to conduct Exp1 to probe how the visual uncertainty of interest here (specifically, for visual motion direction) changes with increasing deviation from fixation. Any model and its parameters should be tailored to the task or context it tries to emulate. In our case, the modeled context is motion direction in a center-out reaching setting, and all relevant model parameters should be specified in movement angles. This is particularly important since we need to estimate parameters from one experiment to predict behavior in another.

      Second, based on previous vision studies, the 1-s delay of the reference cursor has minimal impact on the estimate of visual uncertainty. Our Exp1 used a visual paradigm similar to that of White et al. (1992), who showed that delay does not increase visual uncertainty over a broad range of values (from 0.2 s to >1 s; see their Figures 5 and 6).

      These two problems have been addressed in the revised manuscript, with proper citations listed.

      The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper, where the reported hand positions - as a measure of proprioceptive sense of hand position - are given and plotted relative to an ever present visual target, rather than relative to the actual hand position. That is, 1) since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to, and 2) if the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), those would be on the order of ~20° which is roughly two times larger than any previously reported proprioceptive recalibration, and an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.

      Reviewer 3 considers the Tsay et al. (2020) dataset inappropriate for our theorization, but we respectfully disagree. Regarding the three points raised here, we would like to elaborate:

      (1) As we addressed in the previous response, the reported hand location in Figure 4A (Tsay et al., 2020) is not from a test of proprioceptive recalibration as conventionally defined. In the revision, we explicitly state that this dataset is not about proprioceptive recalibration and also delete texts that might mislead people to think so (see Results section). Instead, proprioceptive recalibration is measured by passive movement, as in our Exp3 (Figure 4E). For error-clamp adaptation here, "the remembered position of the target" is the target. Clearly, the participants did not report the target position, which is ever-present. Instead, their reported hand location shows an interestingly continuous change with ongoing adaptation.

      (2) Since the Tsay 2020 dataset does not measure proprioceptive recalibration, we need not take the difference between the reported location and the actual hand location. Indeed, the difference would be ~20 degrees, but comparing it to previously reported proprioceptive recalibration would be comparing apples to oranges. In fact, throughout the paper, we refer to the results in Fig 4A as “reported hand location”, not proprioceptive recalibration. The target direction is defined as zero degrees; thus, its presence does not bias the reported hand location in the Bayesian cue combination (as this visual cue has a mean value of 0). Using the target as the reference also simplifies our modeling.

      (3) Exp3 is crucial for our study since it shows that our model and its simple Bayesian cue combination principle apply not only to implicit adaptation but also to proprioceptive measures during adaptation. Furthermore, it reproduced the so-called proprioceptive recalibration and accounted for it with the same Bayesian cue combination as the adaptation. This field has accumulated an array of findings on proprioceptive changes induced by visuomotor adaptation, but there is currently no computational model that explains them quantitatively. Our study makes an initial attempt to model these changes.

      Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real life situations.

      The largest caveat raised by the reviewer appears to apply to the error-clamp paradigm in general, not only to our particular study. In essence, this paradigm does require participants to ignore the clamped error, so that the induced adaptive response can be attributed to implicit adaptation. The original paper that proposed this paradigm (Morehead et al., 2017) has been cited 220 times (according to Google Scholar at the time of writing, 06/2024), indicating that the field views this paradigm favorably.

      Furthermore, we agree that this kind of instruction and feedback (invariant clamp) differs from daily life experience, but this does not prevent us from gaining theoretical insights by studying human behavior in this kind of "artificial" task setting. Consider saccadic adaptation (Deubel, 1987; Kojima et al., 2004): the target jumps while the eye moves towards it, and this somewhat artificial manipulation again makes people adapt implicitly, even though the adaptation itself would be a "disastrous" strategy in real-life situations. Nevertheless, scientists have gained an enormous understanding of motor adaptation from this adaptation, which would be counterproductive in real life. Also consider perceptual learning of task-irrelevant stimuli (Seitz & Watanabe, 2005, 2009): while participants learn to discriminate one type of visual stimulus, another type of stimulus is shown in the background, and people gradually learn it even though they do not notice its presence. This "implicit" learning could also be detrimental in real life, yet the paradigm has advanced our understanding of the inner workings of the cognitive system.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      L101: There is a typo: (Tsay et al., 2020), 2020) should be corrected to (Tsay et al., 2020).

      Thanks for pointing it out, we corrected this typo.

      L224-228: It would be beneficial to evaluate the validity of the estimated sigma_u and sigma_p based on previous reports.

      We can roughly estimate σu from the variability of reaching angles during the baseline phase, when no perturbation is applied. The standard deviation of the reaching angle in Exp 2 is 5.128° ± 0.190°, which is close to the σu estimated by the model (5.048°). We also used a separate perceptual experiment to measure proprioceptive uncertainty (n = 13, see Figure S6); σp from this experiment is 9.737° ± 5.598°, also close to the σp extracted by the model (11.119°). We added these new analysis results to the final version of the paper.
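
      The baseline check can be sketched in a few lines of Python. This is a hedged illustration, not the authors' analysis code: it assumes one SD is computed per participant from baseline reach angles (deg, relative to the target) and that the reported "±" is the standard error across participants, which the response does not state. The example data are hypothetical.

```python
import statistics


def baseline_sd_summary(reach_angles_by_participant):
    """Mean and SEM (deg) across participants of each participant's baseline SD.

    reach_angles_by_participant: one list of baseline reach angles
    (deg, relative to the target) per participant; needs >= 2 participants.
    """
    per_participant_sd = [statistics.stdev(a) for a in reach_angles_by_participant]
    mean_sd = statistics.mean(per_participant_sd)
    sem = statistics.stdev(per_participant_sd) / len(per_participant_sd) ** 0.5
    return mean_sd, sem


# Hypothetical baseline data for three participants (not from the study).
baseline = [
    [-4.0, 1.5, 6.2, -2.8, 0.3],
    [3.1, -6.0, 2.4, 5.5, -1.9],
    [0.8, -3.3, 4.7, -5.1, 2.2],
]
mean_sd, sem = baseline_sd_summary(baseline)
print(f"baseline SD = {mean_sd:.3f} ± {sem:.3f} deg (compare with fitted sigma_u)")
```

      Comparing this empirical spread with the fitted σu is only a rough check, since baseline reach variability may also include noise sources the model does not attribute to prediction.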

      L289-298: I found it difficult to understand the update equations of the proprioceptive calibration based on the PEA model. Providing references to the equations or better explanations would be helpful.

      We expanded the process of proprioceptive calibration in Supplementary Text 1 with step-by-step equations and more explanations. 

      Reviewer #3 (Recommendations For The Authors):

      Suggestions (or clarification of previous suggestions) for revisions

      The authors persist in using the Tsay et al. 2020 paper despite its many drawbacks, which the authors attempt to address in their reply. But the main drawback is that the results in the 2020 paper are NOT relative to the unseen hand but to the visual target the participants were supposed to move their hand to. If the results were converted to be relative to the unseen hand, the localization biases would be over 20 deg in magnitude.

      The PEA simulations are plotted relative to the unseen hand which makes sense. If the authors want to persist using the Tsay 2020 dataset despite any issues, they at least need to make sure that the simulations are mimicking the same change. That is, the data from Tsay 2020 needs to be converted to the same variable used in the current paper.

      If the main objection to using the Tsay 2021 dataset is that the design would lead to forgetting: we found that although active localization (or any intervening active movement, such as a no-cursor reach) does lead to some interference or forgetting (a small reduction in the overall magnitude of adaptation), this is not the case for passive localization; see Ruttle et al., 2021 (data on OSF). This was also just a suggestion; there may of course be other, more suitable data sets.

      As stated above, changing the reference system is not necessary, nor does it affect our results. The Tsay et al. 2020 dataset is unique in showing the gradual change of reported hand location over the course of error-clamp adaptation. Forgetting (or a reduction in proprioceptive bias), even if present, would not affect how well our model fits the Tsay 2020 dataset: if forgetting is invariant over the adaptation process, it would only reduce the proprioceptive bias uniformly across trials. This can be accounted for by a smaller weight on . The critical fact is that the model can explain the gradual drift of the proprioceptive judgment of hand location.

      By the way, Ruttle et al.'s 2021 dataset is not for error-clamp adaptation, and thus we will leave it to test our model extension in the future (after incorporating an explicit process in the model).

      References

      Deubel, H. (1987). Adaptivity of gain and direction in oblique saccades. Eye Movements from Physiology to Cognition. https://www.sciencedirect.com/science/article/pii/B9780444701138500308

      Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. eLife, 8. https://doi.org/10.7554/eLife.39882

      Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.

      Kojima, Y., Iwamoto, Y., & Yoshida, K. (2004). Memory of learning facilitates saccadic adaptation in the monkey. The Journal of Neuroscience, 24(34), 7531–7539.

      Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.

      Morehead, J. R., Taylor, J. A., Parvin, D. E., & Ivry, R. B. (2017). Characteristics of implicit sensorimotor adaptation revealed by task-irrelevant clamped feedback. Journal of Cognitive Neuroscience, 29(6), 1061–1074.

      Seitz, A. R., & Watanabe, T. (2005). A unified model for perceptual learning. Trends in Cognitive Sciences, 9(7), 329–334.

      Seitz, A. R., & Watanabe, T. (2009). The phenomenon of task-irrelevant perceptual learning. Vision Research, 49(21), 2604–2610.

      Sivak, B., & Mackenzie, C. L. (1992). Chapter 10 The Contributions of Peripheral Vision and Central Vision to Prehension. In L. Proteau & D. Elliott (Eds.), Advances in Psychology (Vol. 85, pp. 233–259). North-Holland.

      Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.

      Tsay, J. S., Kim, H. E., Saxena, A., Parvin, D. E., Verstynen, T., & Ivry, R. B. (2022). Dissociable use-dependent processes for volitional goal-directed reaching. Proceedings of the Royal Society B: Biological Sciences, 289(1973), 20220415.

      Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. eLife, 11, e76639.

      Tsay, J. S., Parvin, D. E., & Ivry, R. B. (2020). Continuous reports of sensed hand position during sensorimotor adaptation. Journal of Neurophysiology, 124(4), 1122–1130.

      Tsay, J. S., Tan, S., Chu, M. A., Ivry, R. B., & Cooper, E. A. (2023). Low Vision Impairs Implicit Sensorimotor Adaptation in Response to Small Errors, But Not Large Errors. Journal of Cognitive Neuroscience, 35(4), 736–748.

      White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the influence of visual uncertainty and Bayesian cue combination on implicit motor adaptation in young healthy participants. The evidence supporting the claims of the authors is solid, although a better discussion of the link between the model variables and the outcomes of related behavioral experiments would strengthen the conclusions. The work will be of interest to researchers in sensory cue integration and motor learning.

      Public Reviews:

      Reviewer #1 (Public Review):

      This valuable study demonstrates a novel mechanism by which implicit motor adaptation saturates for large visual errors in a principled, normative Bayesian manner. Additionally, the study revealed two notable empirical findings: visual uncertainty increases for larger visual errors in the periphery, and proprioceptive shifts/implicit motor adaptation are non-monotonic, rather than ramp-like. This study is highly relevant for researchers in sensory cue integration and motor learning. However, I find statistical quantification incomplete in some areas, and I find the contextualization of previous studies puzzling.

      Thank you for your feedback and the positive highlights of our study. We appreciate your insights and will address the concerns in our revisions.

      Issue #1: Contextualization of past studies.

      While I agree that previous studies have focused on how sensory errors drive motor adaptation (e.g., Burge et al., 2008; Wei and Kording, 2009), I don't think the PReMo model was contextualized properly. Indeed, while PReMo should have adopted clearer language - given that proprioception (sensory) and kinaesthesia (perception) have been used interchangeably, something we now make clear in our new study (Tsay, Chandy, et al. 2023) - PReMo's central contribution is that a perceptual error drives implicit adaptation (see Abstract): the mismatch between the felt (perceived) and desired hand position. The current paper overlooks this contribution. I encourage the authors to contextualize PReMo's contribution more clearly throughout. Not mentioned in the current study, for example, PReMo accounts for the continuous changes in perceived hand position in Figure 4 (Figure 7 in the PReMo study).

      There is no doubt that the current study provides important additional constraints on what determines perceived hand position: Firstly, it offers a normative Bayesian perspective in determining perceived hand position. PReMo suggests that perceived hand position is determined by integrating motor predictions with proprioception, then adding a proprioceptive shift; PEA formulates this as the optimal integration of these three inputs. Secondly, PReMo assumed visual uncertainty to remain constant for different visual errors; PEA suggests that visual uncertainty ought to increase (but see Issue #2).

      Thank you for the comments and suggestions. We have now cited Tsay et al. (2024) to acknowledge their clarification of the terms related to perceptual error. We also agree that our model differs in two fundamental ways. The first is that it discards the concept of proprioceptive shift and its contribution to the perceived hand location; instead, we use a “one-shot” integration of three types of cues according to Bayesian rules. Per Occam's razor, this is a more elegant and probably more ecological way of processing hand location. The second essential change is to incorporate the dependency of visual uncertainty on perturbation size into the model, rather than relying on a ramp function of proprioceptive changes relative to perturbation size. The ramp function is not well grounded in perception studies. We acknowledge that PReMo was the first to recognize the importance of perceptual error, and we highlight the differences between the models in our Discussion.

      We also think the PReMo model has the potential to explain Fig 4A. However, Tsay et al. (2022) assume that “a generic shift in visual space” explains the gradual proprioceptive changes from negative to positive (see page 17 in Tsay et al., 2022). We do not think that invoking this visual mechanism is necessary to explain Fig 4A; instead, the proprioceptive change is a natural result of hand deviations during implicit adaptation. As the hand moves away from the target (in the positive direction) during adaptation, the estimated hand location moves along with it. We believe this is the correct way to explain the Fig 4A results. When we experimented with the PReMo model, we found it hard to use a visual shift to explain this part of the data without additional assumptions (at least not with those published in Tsay et al., 2022). Furthermore, our PEA model also parsimoniously explains the proprioceptive shift observed in a completely different setting, i.e., the proprioceptive changes measured by the passive method as a function of perturbation size in Exp 3.

      We expanded the discussion comparing the two models, especially their different explanations of Fig 4A.

      Issue #2: Failed replication of previous results on the effect of visual uncertainty.

      (2a) A key finding of this paper is that visual uncertainty increases linearly in the periphery, a constraint crucial for explaining the non-monotonicity in implicit adaptation. One notable methodological deviation from previous studies is the requirement to fixate on the target: in the current experiments, participants were asked to fixate on the target, a constraint not imposed in previous studies. In a free-viewing environment, visual uncertainty may not increase as steeply, and hence implicit adaptation may not attenuate with larger visual errors as quickly as revealed in the current design. It seems that this fixation design, while important, needs to be properly contextualized, given that it may not represent most implicit adaptation experiments.

      First, we do not think any previous study has examined visual uncertainty as a function of perturbation size, so there is no replication problem here. Second, our data indicate that even without being asked to fixate on the target, people still predominantly fixate on it during error-clamp adaptation (when they are “free” viewing). In our Exp 1, the proportion of fixations on the straight line between the starting position and the target is 86%-95% (now shown in Figure S1; see also below). We also collected eye-tracking data in Exp 4, which is a typical error-clamp experiment: more than 95% of fixations fall within ±50 pixels of the center of the screen, slightly higher than in Exp 1. This is easy to understand: typical error-clamp adaptation requires people to ignore the cursor and move the hand towards the target. To minimize interference from the concurrently moving cursor, people rely on fixating the target, the sole task-relevant visual marker in the workspace, to achieve the task goal.

      In sum, we did not require participants to fixate on the target in order to artificially produce the linear dependency of visual uncertainty; we required them to do so to mimic the eye-movement pattern of typical error-clamp learning, as revealed in our pilot experiment. The visual uncertainty effect is sound, and our study is the first to clearly demonstrate it.
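
      The fixation percentage above is, in essence, the fraction of gaze samples falling inside a window around the screen center. A minimal sketch follows; it assumes a square window of ±50 pixels on each axis and a hypothetical 1920 x 1080 display, since the response does not say whether the window is square or circular or which display resolution was used.

```python
def fraction_near_center(gaze_xy, center_xy, half_width_px=50):
    """Fraction of gaze samples within ±half_width_px of center_xy on both axes."""
    if not gaze_xy:
        raise ValueError("no gaze samples")
    cx, cy = center_xy
    inside = sum(
        1
        for x, y in gaze_xy
        if abs(x - cx) <= half_width_px and abs(y - cy) <= half_width_px
    )
    return inside / len(gaze_xy)


# Hypothetical samples on a 1920 x 1080 display (screen center at 960, 540).
samples = [(960, 540), (1000, 540), (1100, 540), (960, 600)]
print(fraction_near_center(samples, (960, 540)))  # prints 0.5 (2 of 4 inside)
```

      A circular window (Euclidean distance) would be an equally reasonable choice; the two differ only for samples near the window's corners.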

      Author response image 1.

      On a side note (but an important one), the high percentage of fixation on the aiming target also holds for conventional visuomotor rotation, which involves strategic re-aiming (as shown by Bromberg et al., 2019, and de Brouwer et al., 2018; an upcoming paper of ours will also show this). This is one reason our new theory would also be applicable to other types of motor adaptation.

      (2b) Moreover, the current results - visual uncertainty attenuates implicit adaptation in response to large, but not small, visual errors - deviates from several past studies that have shown that visual uncertainty attenuates implicit adaptation to small, but not large, visual errors (Tsay, Avraham, et al. 2021; Makino, Hayashi, and Nozaki, n.d.; Shyr and Joshi 2023). What do the authors attribute this empirical difference to? Would this free-viewing environment also result in the opposite pattern in the effect of visual uncertainty on implicit adaptation for small and large visual errors?

      Not all of the previous studies mentioned manipulated visual uncertainty parametrically, and none of them provided quantitative measures of visual uncertainty. As we detailed in our Exp4 and in our Discussion, we do not think the manipulation of visual uncertainty in Tsay et al. (2021) is appropriate (see 2d below). The Makino et al. (2023) study used multiple clamped cursors to perturb people, and its effect is not easily interpretable, since additional processes might be invoked by this kind of complex visual feedback. More importantly, we do not think this is a direct way of modulating visual uncertainty, and they did not provide any evidence that it is.

      (2c) In the current study, the measure of visual uncertainty might be inflated by brief presentation times of comparison and referent visual stimuli (only 150 ms; our previous study allowed for a 500 ms viewing time to make sure participants see the comparison stimuli). Relatedly, there are some individuals whose visual uncertainty is greater than 20 degrees standard deviation. This seems very large, and less likely in a free-viewing environment.

      For our 2AFC task, the reference stimulus is the actual clamped cursor, which lasts for 800 ms. The comparison stimulus is a 150-ms dot appearing near the reference. For measuring perception of visual motion, this duration is sufficient, as previous studies used similar durations (Egly & Homa, 1984; Owsley et al., 1995). We think the 20-degree standard deviation is reasonable given that people fixate on the target and have only peripheral vision to process the fast-moving cursor. The steep linear increase in visual uncertainty about visual motion is well documented. The last author of this paper has shown that the uncertainty of visual motion speed (though not of angles) follows the same steep trend (Wei et al., 2010). Notably, if we do not use the visual uncertainty measured in Exp1 but instead fit the adaptation data in Exp2 to “estimate” the visual uncertainty, the two estimates are in fact well aligned (see Figure S7 and Supplementary Text 2). This strongly supports the validity and accuracy of our estimate. We think this high visual uncertainty is an important message to the field, so we now highlight its magnitude in our Discussion.
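
      For readers less familiar with 2AFC measures, a standard way to turn such judgments into an uncertainty estimate is to fit a cumulative Gaussian psychometric function, whose SD parameter indexes discrimination noise. The sketch below is an assumption-laden illustration, not the authors' fitting procedure: the response coding ("comparison judged clockwise of reference"), the offsets, the grid ranges, and the noise-free counts are all hypothetical, and how the fitted sigma maps onto σv depends on how noise is split between reference and comparison, which the text does not specify.

```python
import math


def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian: P(comparison judged clockwise of reference)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def neg_log_lik(data, mu, sigma, eps=1e-9):
    """Binomial negative log-likelihood for data = [(offset_deg, n_cw, n_total)]."""
    nll = 0.0
    for x, k, n in data:
        p = min(max(cum_gauss(x, mu, sigma), eps), 1.0 - eps)  # avoid log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll


def fit_psychometric(data, mu_grid=None, sigma_grid=None):
    """Coarse grid-search maximum-likelihood estimate of (mu, sigma) in degrees."""
    if mu_grid is None:
        mu_grid = [m / 4 for m in range(-40, 41)]      # -10 to 10 deg, 0.25 steps
    if sigma_grid is None:
        sigma_grid = [s / 4 for s in range(2, 161)]    # 0.5 to 40 deg, 0.25 steps
    _, mu, sigma = min(
        (neg_log_lik(data, m, s), m, s) for m in mu_grid for s in sigma_grid
    )
    return mu, sigma


# Hypothetical noise-free counts generated from mu = 0, sigma = 8 deg,
# with 100 trials per comparison offset (not data from the study).
offsets = [-20, -15, -10, -5, 0, 5, 10, 15, 20]
data = [(x, round(100 * cum_gauss(x, 0.0, 8.0)), 100) for x in offsets]
print(fit_psychometric(data))
```

      A grid search keeps the sketch dependency-free; a real analysis would more likely use a numerical optimiser and include a lapse-rate parameter.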

      (2d) One important confound between clear and uncertain (blurred) visual conditions is the number of cursors on the screen. The number of cursors may have an attenuating effect on implicit adaptation simply due to task-irrelevant attentional demands (Parvin et al. 2022), rather than that of visual uncertainty. Could the authors provide a figure showing these blurred stimuli (gaussian clouds) in the context of the experimental paradigm? Note that we addressed this confound in the past by comparing participants with and without low vision, where only one visual cursor is provided for both groups (Tsay, Tan, et al. 2023).

      Thank you for raising this important point about the types of visual stimuli used to manipulate uncertainty. We used a Gaussian blur of a single cursor (similar to Burge et al., 2008) rather than a cloud of dots, and we have now added a figure inset showing what this blur looks like.

      Using a cursor cloud (Makino et al., 2023; Tsay et al., 2021) to modulate visual uncertainty has inherent drawbacks that make it unsuitable for visuomotor adaptation. In the error-clamp paradigm, the error is defined as an angular deviation. A cursor cloud consists of multiple cursors spanning a range of angles, so it affects both the sensory uncertainty (the intended outcome) and the sensory estimate of the angle (the error estimate, an undesired outcome). In Bayesian terms, the cursor cloud aims to modulate the sigma of a distribution (sigma_v in our model), but it also affects the mean of the distribution (mu). Cursor blurring avoids this confound: the blurred cursor is still a single cursor whose center (mu) is unchanged. Furthermore, as correctly pointed out in the original paper by Tsay et al., 2021, the cursor cloud often overlaps with the visual target, and such "target hits" would affect adaptation, possibly via a reward learning mechanism (see Kim et al., 2019). This is a second confound that accompanies the cursor cloud.

      Issue #3: More methodological details are needed.

      (3a) It's unclear why, in Figure 4, PEA predicts an overshoot in terms of perceived hand position from the target. In PReMo, we specified a visual shift in the perceived target position, shifted towards the adapted hand position, which may result in overshooting of the perceived hand position with this target position. This visual shift phenomenon has been discovered in previous studies (e.g., (Simani, McGuire, and Sabes 2007)).

      Visual shift, as it is called in Simani et al., 2007, is irrelevant to our task. The data we model are motor adaptation (changes in hand position) and so-called proprioceptive changes (changes in hand localization); both are measured and referenced in extrinsic coordinates, not relative to a visual target. For instance, the proprioceptive changes are measured either relative to the actual hand location (Exp 3) or relative to the goal (Fig 4A). We also do not think a visual shift is needed to explain the perceptual judgment of an unseen hand (the target shown during the judgment does reduce the biasing effect of PE; see our responses to Reviewer 3 below).

      In the PEA model, the reported hand angle results from integrating cues from the actual hand position and the estimated hand position (x_hand_hat) from previous movements. This integration can make the reported hand position overshoot or undershoot, depending on the degree of adaptation. The overshoot of the perceived hand position arises from the changed proprioceptive cue, because the actively moved hand has slowly adapted to the error clamp.
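      The integration described above follows standard reliability-weighted (inverse-variance) cue combination. The following is a minimal Python sketch, assuming two independent Gaussian cues; the function name `combine_cues` and all numbers are hypothetical illustrations, not fitted PEA parameters or the authors' released code.

```python
def combine_cues(means, sigmas):
    """Combine independent Gaussian cues by inverse-variance weighting.

    Returns the combined mean, the combined standard deviation, and the
    weight given to each cue (the weights sum to 1).
    """
    precisions = [1.0 / s ** 2 for s in sigmas]
    total_precision = sum(precisions)
    weights = [p / total_precision for p in precisions]
    combined_mean = sum(w * m for w, m in zip(weights, means))
    combined_sigma = (1.0 / total_precision) ** 0.5
    return combined_mean, combined_sigma, weights


# Hypothetical localization trial: the (adapted) actual hand at -10 deg and the
# hand estimate carried over from the preceding movement at +2 deg, equally reliable.
mean, sigma, weights = combine_cues([-10.0, 2.0], [4.0, 4.0])
print(mean, round(sigma, 3), weights)  # -4.0 2.828 [0.5, 0.5]
```

      Making one cue noisier shifts weight toward the other, so the report moves toward whichever cue is more reliable.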

      In the Results, we now explain these value changes in parentheses. Details of the cue-combination mechanisms and the model predictions are given in Supplementary Text 1. We believe these explanations make the point clear.

      (3b) The extent of implicit adaptation in Experiment 2, especially with smaller errors, is unclear. The implicit adaptation function seems to be still increasing, at least by visual inspection. Can the authors comment on this trend, and relatedly, show individual data points that help the reader appreciate the variability inherent to these data?

      Indeed, adaptation to small errors does not appear to be fully saturated within the number of trials we used. However, this does not affect our model analysis. We fit PEA and the competing models to the time series of adaptation, not to the saturated extent of adaptation (see Fig 3A). Thus, even if some conditions do not reach full adaptation, the data are sufficient to constrain the models. We now mention this concern in the Results, and we emphasize that the model explains not only the adaptation magnitude (operationally defined as the adaptation extent measured at the same time point, i.e., the end of the adaptation phase) but also the full learning process.
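      Fitting the whole trial-by-trial series, rather than an end-of-block asymptote, can be sketched as follows. This is a minimal Python sketch, assuming a PEA-style state-space update in which the clamped cursor, proprioception, and prediction are combined by reliability weighting. The function names, parameter values, and the synthetic series (generated by the model itself, not experimental data) are all hypothetical, and this is not the authors' released fitting code.

```python
def simulate_pea_like(clamp, n_trials, A, B, sigma_v, sigma_p, sigma_u):
    """Hand angle on each trial under a PEA-style state-space update.

    Hypothetical assumptions: the clamped cursor is the visual cue, the
    proprioceptive and predicted cues both sit at the current hand angle x,
    and the target is at 0 deg.
    """
    prec_v, prec_p, prec_u = (1.0 / s ** 2 for s in (sigma_v, sigma_p, sigma_u))
    w_v = prec_v / (prec_v + prec_p + prec_u)  # reliability weight of the cursor
    x, hand_angles = 0.0, []
    for _ in range(n_trials):
        hand_angles.append(x)
        x_hat = w_v * clamp + (1.0 - w_v) * x  # perceived hand position
        x = A * x - B * x_hat  # retention plus learning from the perceived error
    return hand_angles


def sse(observed, predicted):
    """Sum of squared errors over every trial of the series."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))


# Synthetic series generated by the model itself (not experimental data).
CLAMP, N_TRIALS = 15.0, 80
observed = simulate_pea_like(CLAMP, N_TRIALS, A=0.98, B=0.05,
                             sigma_v=5.0, sigma_p=10.0, sigma_u=10.0)

# Grid search over the learning rate B (other parameters fixed), scoring the
# whole time series rather than only the end-of-block extent.
grid = [round(0.01 * k, 2) for k in range(1, 16)]
best_B = min(grid, key=lambda b: sse(
    observed, simulate_pea_like(CLAMP, N_TRIALS, 0.98, b, 5.0, 10.0, 10.0)))
print(best_B)  # 0.05
```

      Because every trial contributes to the loss, the early, rising part of the curve constrains the learning and retention rates even when a condition has not yet reached its asymptote.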

      In response, we have included individual data points in the revised Figure 3B-D to provide a clear illustration of the extent of implicit adaptation, particularly for small perturbations.

      (3c) The same participants were asked to return for multiple days/experiments. Given that the authors acknowledge potential session effects, with attenuation upon re-exposure to the same rotation (Avraham et al. 2021), how does re-exposure affect the current results? Could the authors provide clarity, perhaps a table, to show shared participants between experiments and provide evidence showing how session order may not be impacting results?

      Thank you for raising the issue of session and re-exposure effects. First, we do not think Exp1 affects Exp4: Exp1 is a perceptual task and Exp4 is a motor adaptation task. Furthermore, Exp1 presented random visual stimuli on both sides, so it did not produce any adaptation on its own. Second, Exp4 did involve three sessions on three days, but the session effect does not change our main conclusion about visual uncertainty. A 3-way repeated-measures ANOVA (3 days × 3 perturbations × 2 visual uncertainties) revealed a significant main effect of day (F(2,36) = 17.693, p < 0.001), indicating changes in performance across sessions (see figure below). Importantly, the effects of perturbation and visual uncertainty (including their interaction) remain the same, and day did not interact with them. The main effect of day reflects an overall reduction in adaptation across days: post-hoc pairwise comparisons showed that single-trial learning (STL) on Day 1 was significantly higher than on Day 2 (p = 0.004) and Day 3 (p < 0.001), with no significant difference between Day 2 and Day 3 (p = 0.106). Other ANOVA details: there were significant main effects of perturbation (F(1,36) = 8.872, p < 0.001) and visual uncertainty (F(1,18) = 49.164, p < 0.001), as well as a significant interaction between perturbation size and visual uncertainty (F(2,36) = 5.160, p = 0.013). No interaction involving day was significant (all p > 0.182). Thus, overall adaptation decreases over days, but day does not affect the interaction of interest between visual uncertainty and perturbation. That this interaction is preserved across sessions strengthens our conclusion that visual uncertainty systematically affects implicit adaptation.

      Author response image 2.

      (3d) The number of trials per experiment should be detailed more clearly in the Methods section (e.g., Exp 4). Moreover, could the authors please provide the code they used to implement their computational models? This would aid future implementations of these models. I, for one, am enthusiastic to build on PEA.

      We have clarified the number of trials conducted in each experiment, with detailed information now readily available in the Methods section of the main text. In addition, we have made the code for data analysis and modeling publicly accessible. These resources can be found in the updated "Data Availability" section of our paper.

      (3f) In addition to predicting a correlation between proprioceptive shift and implicit adaptation on a group level, both PReMo and PEA (but not causal inference) predict a correlation between individual differences in proprioceptive shift and proprioceptive uncertainty with the extent of implicit adaptation (Tsay, Kim, et al. 2021). Interestingly, shift and uncertainty are independent (see Figures 4F and 6C in Tsay et al, 2021). Does PEA also predict independence between shift and uncertainty? It seems like PEA does predict a correlation.

      Thank you for raising this insightful question. Our PEA model indeed predicts a positive (although not linear) correlation between proprioceptive uncertainty and the amplitude of the estimated hand position (x_hand_hat). This prediction is consistent with simulations using the same parameters that generated the results in Figure 4B of our manuscript (with a sign flip, as x_hand_hat is negative).

      Author response image 3.
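      The predicted monotonic but nonlinear relation can be illustrated in closed form. This is a minimal Python sketch, assuming a PEA-style update x <- A*x - B*x_hat with a reliability-weighted perceived hand position x_hat = w_v*clamp + (1 - w_v)*x, which treats the proprioceptive and predicted cues as both centered on the current hand angle. It is an illustrative simplification, not the authors' released model; the function name and all parameter values are hypothetical. Under this sign convention the perceived position is positive, so only magnitudes should be compared with x_hand_hat.

```python
def steady_state_perceived_hand(clamp, A, B, sigma_v, sigma_p, sigma_u):
    """Perceived hand position at the asymptote of an assumed PEA-style update.

    Update: x <- A*x - B*x_hat, with x_hat = w_v*clamp + (1 - w_v)*x.
    Setting x equal on both sides gives
        x_ss = -B*w_v*clamp / (1 - A + B*(1 - w_v)),
    and the perceived position is x_hat_ss = w_v*clamp + (1 - w_v)*x_ss.
    """
    prec_v, prec_p, prec_u = (1.0 / s ** 2 for s in (sigma_v, sigma_p, sigma_u))
    w_v = prec_v / (prec_v + prec_p + prec_u)  # visual weight rises as sigma_p grows
    x_ss = -B * w_v * clamp / (1.0 - A + B * (1.0 - w_v))
    return w_v * clamp + (1.0 - w_v) * x_ss


# Hypothetical parameters; only the proprioceptive uncertainty varies.
for sigma_p in (2.0, 4.0, 8.0, 16.0):
    x_hat = steady_state_perceived_hand(15.0, A=0.98, B=0.05,
                                        sigma_v=5.0, sigma_p=sigma_p, sigma_u=10.0)
    print(sigma_p, round(x_hat, 2))
```

      The perceived magnitude grows with sigma_p, but not proportionally, because sigma_p enters only through the weight w_v.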

      Regarding the absence of a correlation in Tsay et al., 2021, we offer several potential explanations for the discrepancy. First, the variability of passive hand localization during motor adaptation (as in Tsay et al., 2021) does not directly equal proprioceptive uncertainty, which typically requires psychophysical testing to assess accurately. Second, our study showed that the proprioceptive bias attenuates over repeated measurements; in our Exp3, it decreased within a block of three trials. Tsay et al., 2021 used 36 measurements in a row without interleaved adaptation trials, so the "averaged" proprioceptive bias in that study might not reflect the actual bias during adaptation. Third, that study showed large individual differences in both proprioceptive bias and proprioceptive variability (not uncertainty), so detecting a positive correlation, if one exists, would require a large number of participants, probably more than their sample of about 30. We have not included these putative explanations in the revision, whose Discussion is already long, so as not to devote space to a null result.

      Reviewer #2 (Public Review):

      Summary:

      The authors present the Perceptual Error Adaptation (PEA) model, a computational approach offering a unified explanation for behavioral results that are inconsistent with standard state-space models. Beginning with the conventional state-space framework, the paper introduces two innovative concepts. Firstly, errors are calculated based on the perceived hand position, determined through Bayesian integration of visual, proprioceptive, and predictive cues. Secondly, the model accounts for the eccentricity of vision, proposing that the uncertainty of cursor position increases with distance from the fixation point. This elegantly simple model, with minimal free parameters, effectively explains the observed plateau in motor adaptation under the implicit motor adaptation paradigm using the error-clamp method. Furthermore, the authors experimentally manipulate visual cursor uncertainty, a method established in visuomotor studies, to provide causal evidence. Their results show that the adaptation rate correlates with perturbation sizes and visual noise, a pattern uniquely explained by the PEA model and not by previous models. Therefore, the study convincingly demonstrates that implicit motor adaptation is a process of Bayesian cue integration.

      Strengths:

      In the past decade, numerous perplexing results in visuomotor rotation tasks have raised questions about the underlying mechanisms. Prior models have individually addressed aspects like aiming strategies, motor adaptation plateaus, and sensory recalibration effects. However, a unified model encapsulating these phenomena with a simple computational principle was lacking. This paper addresses this gap with a robust Bayesian integration-based model. Its strength lies in two fundamental assumptions: motor adaptation is influenced by visual eccentricity, a well-established vision science concept, and sensory estimation occurs through Bayesian integration. By merging these well-founded principles, the authors elucidate previously incongruent and diverse results with an error-based update model. The incorporation of cursor feedback noise manipulation provides causal evidence for their model. The use of eye-tracking in their experimental design, and the analysis of adaptation studies based on estimated eccentricity, are particularly elegant. This paper makes a significant contribution to visuomotor learning research.

      Weaknesses:

      The paper provides a comprehensive account of visuomotor rotation paradigms, addressing incongruent behavioral results with a solid Bayesian integration model. However, its focus is narrowly confined to visuomotor rotation, leaving its applicability to broader motor learning paradigms, such as force field adaptation, saccadic adaptation, and de novo learning paradigms, uncertain. The paper's impact on the broader fields of neuroscience and cognitive science may be limited due to this specificity. While the paper excellently demonstrates that specific behavioral results in visuomotor rotation can be explained by Bayesian integration, a general computational principle, its contributions to other motor learning paradigms remain to be explored. The paper would benefit from a discussion on the model's generality and its limitations, particularly in relation to the undercompensating effects in other motor learning paradigms.

      Thank you for your thoughtful review and recognition of the contributions our work makes towards understanding implicit motor adaptation through the Perceptual Error Adaptation (PEA) model. We appreciate your suggestion to broaden the discussion about the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.

      Our model is not limited to error-clamp adaptation, in which participants are explicitly told to ignore the rotated cursor. The error-clamp paradigm is one of the rare cases in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one's effector is processed implicitly and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (the motor prediction cue in our model) when forming procedural knowledge for action control.

      We propose that the same two principles apply to various kinds of motor adaptation and motor skill learning, which together constitute motor learning in general. Most of our knowledge about motor adaptation comes from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three all involve localizing one's effector under perturbed sensory feedback, and all involve implicit learning. We believe they can be modeled by variants of our model, or at least that their computational nature should be considered in light of the two principles laid out above. For skill learning, especially de novo learning, the field still lacks a fundamental computational model that accounts for the skill acquisition process at the level of the relevant movement cues. Our model suggests a promising route: repetitive movements with Bayesian combination of movement-related cues might underlie the implicit component of motor skill learning.

      We have added more discussion of the possible broad implications of our model in the revision.

      Reviewer #3 (Public Review):

      Summary

      In this paper, the authors model motor adaptation as a Bayesian process that combines visual uncertainty about the error feedback, uncertainty about proprioceptive sense of hand position, and uncertainty of predicted (=planned) hand movement with a learning and retention rate as used in state space models. The model is built with results from several experiments presented in the paper and is compared with the PReMo model (Tsay, Kim, et al., 2022) as well as a cue combination model (Wei & Körding, 2009). The model and experiments demonstrate the role of visual uncertainty about error feedback in implicit adaptation.

      In the introduction, the authors note that implicit adaptation (as measured in error-clamp-based paradigms) does not saturate at larger perturbations, but decreases again (e.g., Morehead et al., 2017 shows no adaptation at 135° and 175° perturbations). They hypothesized that visual uncertainty about cursor position increases with larger perturbations since the cursor is further from the fixated target. This could decrease the weight assigned to visual feedback, which could explain the lower asymptotes.

      The authors characterize visual uncertainty for 3 rotation sizes in the first experiment, and while this experiment could be improved, it is probably sufficient for the current purposes. Then the authors present a second experiment where adaptation to 7 clamped errors is tested in different groups of participants. The model's visual uncertainty is set using a linear fit to the results from experiment 1, and the remaining 4 parameters are then fit to this second data set. The 4 parameters are 1) proprioceptive uncertainty, 2) uncertainty about the predicted hand position, 3) a learning rate, and 4) a retention rate. The authors' Perceptual Error Adaptation model ("PEA") predicts asymptotic levels of implicit adaptation much better than both the PReMo model (Tsay, Kim et al., 2022), which predicts saturated asymptotes, and a causal inference model (Wei & Körding, 2009), which predicts no adaptation for larger rotations. In a third experiment, the authors test their model's predictions about proprioceptive recalibration, but unfortunately compare their data with an unsuitable data set. Finally, the authors conduct a fourth experiment where they put their model to the test. They measure implicit adaptation with increased visual uncertainty, by adding blur to the cursor, and the results are again better in line with their model (predicting overall lower adaptation) than with the PReMo model (predicting equal saturation but at larger perturbations) or a causal inference model (predicting equal peak adaptation, but shifted to larger rotations). In particular, the model fits to experiment 2 and the results from experiment 4 show that the core idea of the model has merit: increased visual uncertainty about errors dampens implicit adaptation.

      Strengths

      In this study, the authors propose a Perceptual Error Adaptation model ("PEA"), and the work combines various ideas from cue combination, Bayesian methods, and new data sets, collected in four experiments using various techniques that test very different components of the model. The central component of visual uncertainty is assessed in the first experiment. The model uses 4 other parameters to explain implicit adaptation. These parameters are 1) learning and 2) retention rates, as used in popular state space models, and the uncertainty (variance) of 3) predicted and 4) proprioceptive hand position. In particular, the authors observe that asymptotes for implicit learning do not saturate, as claimed before, but decrease again when rotations are very large, and that this may have to do with visual uncertainty (e.g. Tsay et al., 2021, J Neurophysiol 125, 12-22). The final experiment confirms predictions of the fitted model about what happens when visual uncertainty is increased (overall decrease of adaptation). By incorporating visual uncertainty depending on retinal eccentricity, the predictions of the PEA model for very large perturbations are notably different from, and better than, the predictions of the two other models it is compared to. That is, the paper provides strong support for the idea that visual uncertainty of errors matters for implicit adaptation.

      Weaknesses

      Although the authors don't say this, the "concave" function that shows that adaptation does not saturate for larger rotations has been shown before, including in papers cited in this manuscript.

      The first experiment, measuring visual uncertainty for several rotation sizes in error-clamped paradigms has several shortcomings, but these might not be so large as to invalidate the model or the findings in the rest of the manuscript. There are two main issues we highlight here. First, the data is not presented in units that allow comparison with vision science literature. Second, the 1 second delay between the movement endpoint and the disappearance of the cursor, and the presentation of the reference marker, may have led to substantial degradation of the visual memory of the cursor endpoint. That is, the experiment could be overestimating the visual uncertainty during implicit adaptation.

      The paper's third experiment relies to a large degree on reproducing patterns found in one particular paper, where the reported hand positions, as a measure of proprioceptive sense of hand position, are given and plotted relative to an ever-present visual target, rather than relative to the actual hand position. That is, 1) since participants actively move to a visual target, the reported hand positions do not reflect proprioception, but mostly the remembered position of the target participants were trying to move to, and 2) if the reports are converted to a difference between the real and reported hand position (rather than the difference between the target and the report), those would be on the order of ~20°, which is roughly two times larger than any previously reported proprioceptive recalibration, and an order of magnitude larger than what the authors themselves find (1-2°) and what their model predicts. Experiment 3 is perhaps not crucial to the paper, but it nicely provides support for the idea that proprioceptive recalibration can occur with error-clamped feedback.

      Perhaps the largest caveat to the study is that it assumes that people do not look at the only error feedback available to them (and can explicitly suppress learning from it). This was probably true in the experiments used in the manuscript, but unlikely to be the case in most of the cited literature. Ignoring errors and suppressing adaptation would also be a disastrous strategy to use in the real world, such that our brains may not be very good at this. So the question remains to what degree - if any - the ideas behind the model generalize to experiments without fixation control, and more importantly, to real-life situations.

      Specific comments:

      A small part of the manuscript relies on replicating or modeling the proprioceptive recalibration in a study we think does NOT measure proprioceptive recalibration (Tsay, Parvin & Ivry, JNP, 2020). In this study, participants reached for a visual target with a clamped cursor, and at the end of the reach were asked to indicate where they thought their hand was. The responses fell very close to the visual target both before and after the perturbation was introduced. This means that the difference between the actual hand position, and the reported/felt hand position gets very large as soon as the perturbation is introduced. That is, proprioceptive recalibration would necessarily have roughly the same magnitude as the adaptation displayed by participants. That would be several times larger than those found in studies where proprioceptive recalibration is measured without a visual anchor. The data is plotted in a way that makes it seem like the proprioceptive recalibration is very small, as they plot the responses relative to the visual target, and not the discrepancy between the actual and reported hand position. It seems to us that this study mostly measures short-term visual memory (of the target location). What is astounding about this study is that the responses change over time to begin with, even if only by a tiny amount. Perhaps this indicates some malleability of the visual system, but it is hard to say for sure.

      Regardless, the results of that study do not form a solid basis for the current work and they should be removed. We would recommend making use of the dataset from the same authors, who improved their methods for measuring proprioception shifts just a year later (Tsay, Kim, Parvin, Stover, and Ivry, JNP, 2021). Although here the proprioceptive shifts during error-clamp adaptation (Exp 2) were tiny, and not quite significant (p<0.08), the reports are relative to the actual location of the passively placed unseen hand, measured in trials separate from those with reach adaptation and therefore there is no visual target to anchor their estimates to.

      Experiment 1 measures visual uncertainty with increased rotation size. The authors cite relevant work on this topic (Levi & Klein etc) which has found a linear increase in uncertainty of the position of more and more eccentrically displayed stimuli.

      First, this is a question where the reported stimuli and effects could greatly benefit from comparisons with the literature in vision science, and the results might even inform it. In order for that to happen, the units for the reported stimuli and effects should (also) be degrees of visual angle (dva).

      As far as we know, all previous work has investigated static stimuli, whereas with moving stimuli, position information from several parts of the visual field is likely integrated over time into a final estimate of position at the end of the trajectory (perhaps a Kalman-filter-type process). We are not aware of any studies in vision science on the uncertainty of the endpoint of moving stimuli. So we think that the experiment is necessary for this study, but there are some areas where it could be improved.

      Next, the linear fit is done in the space of rotation size rather than in the space of eccentricity relative to fixation, and these do not necessarily map onto each other linearly. If we assume that the eye-tracker and the screen were at the closest distance at which the manufacturer reports it to work accurately (45 cm), we would get the largest distances of the endpoints from fixation in dva. Based on that assumed distance between the participant and monitor, we converted the rotation angles to distances between fixation and the cursor endpoint in degrees of visual angle: 0.88, 3.5, and 13.25 dva (ignoring screen curvature, or the absence of it). The ratio of the retinal distance to the endpoint to the perturbation angle is roughly 0.221, 0.221, and 0.207 if the minimum distance is indeed used, which is probably fine in this case. But still, it would be better to do the fit in the relevant perceptual coordinate system.
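      The conversion from rotation angle to degrees of visual angle can be reproduced with a little trigonometry. This is a minimal Python sketch, assuming a flat screen, a line of sight to the fixated target perpendicular to the screen, and a cursor endpoint at the same distance from the start as the target. The 45 cm viewing distance is the reviewers' assumption; the 10 cm target distance and the 16° and 64° rotations are assumptions made for this sketch (they are not stated in this section), chosen because they reproduce the values of roughly 0.88, 3.5, and 13.25 dva. The function name `rotation_to_dva` is hypothetical.

```python
import math


def rotation_to_dva(rotation_deg, target_distance_cm, viewing_distance_cm):
    """Eccentricity, in degrees of visual angle, of a cursor endpoint rotated by
    rotation_deg about the start position while gaze fixates the target.

    Assumes a flat screen, a line of sight to the target perpendicular to the
    screen, and a cursor endpoint at the same distance from the start as the target.
    """
    # On-screen (chord) distance between the fixated target and the cursor endpoint
    chord_cm = 2.0 * target_distance_cm * math.sin(math.radians(rotation_deg) / 2.0)
    # Visual angle of that on-screen offset, seen from the viewing distance
    return math.degrees(math.atan(chord_cm / viewing_distance_cm))


# Hypothetical geometry: 10 cm target distance, 45 cm viewing distance.
for rotation in (4.0, 16.0, 64.0):
    dva = rotation_to_dva(rotation, target_distance_cm=10.0, viewing_distance_cm=45.0)
    print(rotation, round(dva, 2), round(dva / rotation, 3))
```

      Because both the chord and the arctangent are compressive, the dva-to-rotation ratio falls slightly at the largest rotation, which is why a fit in rotation space and a fit in eccentricity space need not agree.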

      The first distance (4° rotation; 0.88 dva offset between fixation and stimulus) is so close to fixation (even at the assumed shortest distance between eye and screen) that it can be considered foveal, and it falls within the combined noise of the eye-tracker and of fixation itself. There should be no uncertainty on or that close to the fovea. The variability in the data is likely just measurement noise. This also means that a linear fit will almost always go through this point, somewhat skewing the results toward linearity. The advantage is that the estimate of the intercept (measurement noise) is going to be very good. Unfortunately, there are only 2 other points measured, which (if used without the closest point) will always support a linear fit. Therefore, the experiment does not seem suitable to test linearity, only to characterize it, which might be sufficient for the current purposes. We would understand if testing linearity with many more rotations requires too much effort. But then it should be made much clearer that the experiment assumes linearity and only serves to characterize the assumed linearity.

      Final comment after the consultation session:

      There were a lot of discussions about the actual interpretation of the behavioral data from this paper with regards to past papers (Tsay et al. 2020 or 2021), and how it matches the different variables of the model. The data from Tsay 2020 combined both proprioceptive information (Xp) and prediction about hand position (Xu) because it involves active movements. On the other hand, Tsay et al. 2021 is based on passive movements and could provide a better measure of Xp alone. We would encourage you to clarify how each of the variables used in the model is mapped onto the outcomes of the cited behavioral experiments.

      The reviewers discussed this point extensively during the consultation process. The results reported in the Tsay 2020 study reflect both proprioception and prediction. However, a visual target contributes more than just prediction: it likely acts as an anchor in the workspace that draws responses toward it, so that the report is dominated by short-term visual memory of the target (which is not part of the model). In contrast, in the current Exp 3, as in most other work investigating proprioception, the estimate is calculated relative to the actual hand direction.

      The solution is fairly simple. In Experiment 3 of the current study, Xp is measured relative to the hand without any visual anchors drawing responses; this is also the reference used in the Tsay et al 2021 study and in many studies from the lab of D. Henriques (none of which have a visual reach target present when proprioceptive estimates are measured). So we suggest using a different data set that measures Xp without other influences, such as the data from Tsay et al 2021.

      These issues with the data are not superficial and cannot be solved within the model. Data with correctly measured biases (relative to the hand) that are not dominated by irrelevant visual attractors would actually be informative about the validity of the PEA model. Dr. Tsay has so much other data that we recommend using a more to-the-point data set that could actually validate the PEA model.

      As the comments are repetitive in places, we summarize them into three questions and address them one by one below:

      (1) Methodological concerns about visual uncertainty estimation in Experiment 1: a) visual uncertainty is measured in movement angles (degrees), whereas the unit in vision science is degrees of visual angle (dva); this mismatch of units hinders direct comparison between the measured visual uncertainty and values reported in the literature; b) a 1-second delay between the movement endpoint and the presentation of the reference marker causes an overestimate of visual uncertainty due to potential degradation of visual memory; c) the linear function of visual uncertainty is a result of having only three perturbation sizes.

      a) As the reviewer notes, our visual uncertainty concerns the direction of cursor motion in the display plane, which has never been measured before. We do not think our data are comparable to findings in vision science about foveal/peripheral comparisons. We cited work in vision science by Klein, Levi, and colleagues (Klein & Levi, 1987; Levi et al., 1987) because their studies showed that visual uncertainty increases with deviation from fixation. Their studies thus inspired our Exp1, which probes how the visual uncertainty of interest (specifically, for visual motion direction) changes with increasing deviation from fixation. We believe that any model and its parameters should be specifically tailored to the task or context it tries to emulate. In our case, the modeled context is motion direction in a center-out reaching setting, so all relevant model parameters should be specified in movement angles.

      b) Based on previous vision studies, the 1-s delay of the reference cursor appears to have minimal impact on the estimate of visual uncertainty. Our Exp1 used a visual paradigm similar to that of White et al. (1992), who showed that delay does not increase visual uncertainty over a broad range of values (from 0.2 s to >1 s; see their Figures 5-6). We will add further methodological justification in our revision.

      c) We agree that testing more angles would make us more confident about the linearity of visual uncertainty. However, the linear function is a good approximation of visual uncertainty (as shown in Figure 2C). More importantly, our model's performance does not hinge on a strictly linear function. For example, if the relationship were a power function with an increasing slope, our model would still predict the major findings presented in the paper, as the reviewer correctly pointed out. It is the increasing trend of visual uncertainty, completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation. Lastly, without assuming a linear function, we fitted the large motor adaptation dataset from Exp2 to estimate visual uncertainty numerically. This estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p < 0.001). In fact, the model-fitted visual uncertainty is very close to the values we obtained in Exp1. We have now included this analysis in the revision; see Supplementary text 2 and Figure S7 for details.
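      A minimal sketch of the kind of trend check described in c), assuming per-condition estimates of sigma_v are already available: fit a straight line of sigma_v against perturbation size and report Pearson's R. The function name and the example numbers are hypothetical and are not the Exp1 or Exp2 values; the sketch also omits the p-value.

```python
import numpy as np


def linear_trend(perturbation_deg, sigma_v_deg):
    """Least-squares fit sigma_v = a + b * perturbation; returns (a, b, pearson_r)."""
    x = np.asarray(perturbation_deg, dtype=float)
    y = np.asarray(sigma_v_deg, dtype=float)
    b, a = np.polyfit(x, y, deg=1)      # np.polyfit returns the highest power first
    r = np.corrcoef(x, y)[0, 1]         # Pearson correlation between size and sigma_v
    return float(a), float(b), float(r)


# Illustrative (made-up) estimates only.
sizes = [4, 8, 16, 32, 64]
sigmas = [2.0, 3.1, 5.2, 9.8, 18.5]
print(linear_trend(sizes, sigmas))
```

      A high R by itself does not rule out a convex (e.g., power-law) relationship over a few points, which fits the point above that the model relies on the increasing trend rather than on strict linearity.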

      (2) Concerns about Experiment 3: the reviewer argues that the Tsay et al. (2020) data do not accurately measure proprioceptive recalibration and are thus unsuitable for demonstrating our model’s capacity to explain proprioceptive changes during adaptation.

      Response: We agree that the data from Tsay et al. (2020) do not come from passive localization, which is widely accepted as the method for measuring proprioceptive recalibration, a recalibration effect in the sensory domain. Active localization, as used in Tsay et al. (2020), is hypothesized to be closely related to people’s forward prediction (where people want to go, as the reviewer put it). However, we emphasize that we never equated Tsay’s findings with proprioceptive recalibration: throughout the paper we call them “reported hand location”. We reserved the term “proprioceptive recalibration” for our own Exp3, which used a passive localization method. Thus, we did not misapply this term. Second, as far as we know, localization biases or changes, whether measured by passive or active methods, have not previously been modeled quantitatively. We believe our model can explain both, at least in the error-clamp adaptation setting studied here. In Exp3, which used passive localization, the proprioceptive bias is caused by the biasing effect of the just-perceived hand location (X_hand_hat) from the adaptation trial. The Tsay et al. (2020) data come from active localization, and their bias shows a characteristic change from negative to positive. This can be explained by the just-perceived hand location (again X_hand_hat) together with a gradually adapting hand (X_p). We think this is a significant advance in the study of proprioceptive changes during adaptation. Of course, our idea can be further tested in other task conditions, e.g., conventional visuomotor rotation or even gain adaptation, which we leave for future studies.

      Regarding technical concerns, the Tsay et al. (2020) dataset is not ideal: when reporting hand location, participants viewed the reporting wheel as well as the original target. As the reviewer correctly pointed out, the presence of the target might provide an anchoring cue for perceptual judgment, acting as an attractor for localization. If so, our cue-combination model would predict that this extra attractor leads to a smaller proprioceptive effect than that currently reported in their paper: the initial negative bias would be closer to the target (zero), and the later positive bias would also be closer to the target. However, the main trend would remain; that is, the reported hand location would still show the characteristic negative-to-positive change. The attractor effect of the target can be readily modeled by giving less weight to the just-perceived hand location (X_hand_hat). Thus, we would like to keep the Tsay et al. (2020) data in our paper, but we will add explanations of the limitations of this dataset and of how the model would fare given these limitations.

      That being said, our model can explain both passive and active localization during implicit adaptation elicited by error clamp. The dataset from Tsay et al. (2021) is not a good substitute for their 2020 dataset in terms of modeling, since that study interleaved blocks of passive localization trials with adaptation trials. Such a block design would lead to forgetting of both the adaptation (X_p in our model) and the perceived hand (X_hand_hat in our model), and the latter is not yet considered in our model. As our Exp3 (which also used passive localization) shows, the influence of the perceived hand on proprioceptive bias is short-lived, lasting up to three trials in the absence of adaptation trials. Of course, it would be of great interest to design future studies of how the proprioceptive bias changes over time and how these temporal changes relate to the perceptual error. Our model provides a testbed for moving forward in this direction.

      (3) The reviewer raises concerns about the study's assumption that participants ignore error feedback, questioning the model's applicability to broader contexts and real-world scenarios where ignoring errors might not be viable or common.

      Reviewer 2 raised the same question above, so we have moved our response here: “We appreciate your suggestion to broaden the discussion of the model's applicability beyond the visuomotor rotation paradigm, a point we acknowledge was not sufficiently explored in our initial discussion.

      Our model is not limited to error-clamp adaptation, in which participants are explicitly told to ignore the rotated cursor. The error-clamp paradigm is a rare example in which implicit motor learning can be isolated in a nearly ideal way. Our findings thus imply two key aspects of implicit adaptation: 1) localizing one’s effector is processed implicitly and continuously used to update the motor plan; 2) Bayesian cue combination is at the core of integrating movement feedback and motor-related cues (the motor prediction cue in our model) when forming procedural knowledge for action control.

      We propose that the same two principles apply to various kinds of motor adaptation and motor skill learning, which together constitute motor learning in general. Most of our knowledge about motor adaptation comes from visuomotor rotation, prism adaptation, force field adaptation, and saccadic adaptation. The first three all involve localizing one’s effector under the influence of perturbed sensory feedback, and all three involve implicit learning. We believe they can be modeled by variants of our model, or that their computational nature should at least be considered in terms of the two principles laid out above. For skill learning, especially de novo learning, the field still lacks a fundamental computational model that accounts for the skill acquisition process at the level of relevant movement cues. Our model suggests a promising route: repetitive movements with Bayesian cue combination of movement-related cues might underlie the implicit acquisition of motor skills.”

      We also add one more important implication of our model: as stated above, our model explains the proprioceptive changes revealed by active or passive localization methods as the result of (mis)perceived hand location via Bayesian cue combination. This new insight, though tested here only with the error-clamp paradigm, could be extended to other domains, e.g., conventional visuomotor rotation or force field adaptation. We hope this serves as an initial step toward computational models for proprioception studies. Please see the extended discussion of this matter in the revision.

      Recommendations for the authors:

      Revisions:

      All three reviewers were positive about the work and have provided a set of concrete and well-aligned suggestions, which the authors should address in a revised version of the article. These are listed below.

      A few points of particular note:

      (1) There are a lot of discussions about the actual interpretation of behavioral data from this paper or past papers (Tsay et al. 2020 or 2021) and how it matches the different variables of the model.

      (2) There is some discussion of the results of the first experiment, both in terms of how they are reported (providing degrees of visual angle) and how they differ from previous results (importance of the point of fixation). We suggest also discussing a few recent papers on eye movements during motor adaptation (work of Anouk de Brouwer and Opher Donchin). Could the authors also discuss why their results are opposite to those of previous visual uncertainty studies (i.e., here visual uncertainty attenuates learning with large, but not small, visual errors), rather than the other way around as in Burge et al. 2008, Tsay et al. 2021, and Makino et al. 2023 (where visual uncertainty attenuates learning with small, but not large, visual errors)?

      (3) It is recommended by several reviewers to discuss the applicability of the model to other areas/perturbations.

      (4) Several reviewers and I believe that the impact of the paper would be much greater if the code to reproduce all the model simulations were made available to readers. In addition, while I am very positive about the authors sharing the data from their experiments, the metadata seem to be missing; these are essential, because without them the data are unusable.

      Thank you for the concise summary of the reviewers’ comments. We have addressed their concerns point by point.

      Reviewer #2 (Recommendations For The Authors):

      L142: The linear increase in visual uncertainty should be substantiated by previous research in vision science. Please cite relevant papers and discuss why the linear model is considered reasonable.

      We have cited relevant studies in vision science. These studies focus mainly on how eccentricity inflates visual uncertainty, which parallels our finding that deviation from the fixation direction inflates visual uncertainty about motion direction.

      We also want to add that our model's performance does not hinge on a strictly linear function of visual uncertainty. For example, if the relationship were a power function with an increasing slope, our model would still predict the major findings presented in the paper. It is the increasing trend of visual uncertainty, completely overlooked by previous studies, that leads to various seemingly puzzling findings in implicit adaptation. Furthermore, without assuming a linear function, we fitted the large motor adaptation dataset from Exp2 to estimate visual uncertainty numerically. This estimated visual uncertainty has a strong linear relationship with perturbation size (R = 0.991, p < 0.001). In fact, the model-fitted visual uncertainty is very close to the values we obtained in Exp1. We have now included this new analysis in the revision; see Supplementary text 2 and Figure S7 for details.

      L300: I found it challenging to understand the basis for this conclusion. Additional explanatory support is required.

      We unpacked this concluding sentence as follows:

      “The observed proprioceptive bias is formally modeled as a result of the biasing effect of the perceived hand estimate x_hand_hat. In our mini-block of passive localization, the participants neither actively moved nor received any cursor perturbations for three trials in a row. Thus, the fact that the measured proprioceptive bias is reduced to nearly zero at the third trial suggests that the effect of perceived hand estimate x_hand_hat decays rather rapidly.”

      L331: For the general reader, a visual representation of what the blurring mask looks like would be beneficial.

      Thanks for the nice suggestion. We added pictures of a clear and a blurred cursor in Figure 5D.

      L390: This speculation is intriguing. It would be helpful if the authors explained why they consider causal inference to operate at an explicit process level, as the reasoning is not clear here, although the idea seems plausible.

      Indeed, our tentative conclusion here is based only on the model comparison results. It remains possible that causal inference also operates in implicit adaptation, not only in explicit adaptation. We draw a more modest conclusion in the revision:

      “The causal inference model is also based on Bayesian principles, so why does it fail to account for implicit adaptation? We postulate that the causal inference model fails because it neglects the dependence of visual uncertainty on perturbation size, as revealed in Experiment 1. In fact, previous studies advocating the Bayesian principle in motor adaptation have largely focused on experimentally manipulating sensory cue uncertainty to observe its effects on adaptation (Burge et al., 2008; He et al., 2016; Körding & Wolpert, 2004; Wei & Körding, 2010), similar to our Experiment 4. Our findings suggest that causal inference about the perturbation alone, without incorporating visual uncertainty, cannot fully account for the diverse findings in implicit adaptation. The increase in visual uncertainty with perturbation size is substantial: our Experiment 1 yielded an approximately seven-fold increase from a 4° perturbation to a 64° perturbation. We attribute this to the fact that people fixate in the desired movement direction during movements. Interestingly, even in the conventional visuomotor rotation paradigm, where people are required to “control” the perturbed cursor, their fixation is also on the desired direction, not on the cursor itself (de Brouwer, Albaghdadi, et al., 2018; de Brouwer, Gallivan, et al., 2018). Thus, we postulate that a similar increase in visual uncertainty occurs in other “free-viewing” perturbation paradigms. Future studies are warranted to extend our PEA model to account for implicit adaptation in other perturbation paradigms.”

      L789: The method of estimating Sigma_hand in the brain was unclear. Since Bayesian computation relies on the magnitude of noise, the cognitive system must have estimates of this noise. While vision and proprioception noise might be directly inferred from signals, the noise of the hand could be deduced from the integration of these observations or from an internal model estimate. This process of estimating noise magnitude is theorized in recursive Bayesian integration models (or Kalman filtering), where the size estimate of the state noise (sigma_hand) is updated concurrently with the state estimate (x_hand hat). The equation in L789 and the subsequent explanation appear to assume a static model of noise estimation. However, in practice, the noise parameters, including Sigma_hand, are likely dynamic and updated with each new observation. A more detailed explanation of how Sigma_hand is estimated and of its role in the cognitive process is needed.

      This is a great comment. Indeed, if a Kalman filter were used, the learning rate and the state noise should both be updated dynamically on each trial, under the influence of the observation (x_v). Most adaptation models, including ours, assume a constant learning rate, but a dynamic learning rate (B in our model) is worth exploring. However, in our error-clamp setting x_v is constant, so this observation variable cannot dynamically update a Kalman filter; this is why we opted for a “static” Bayesian model to explain our datasets. Instead, sigma_hand can be estimated using Bayesian principles as a function of the three available cues: the proprioceptive cue, the visual cue, and the motor prediction cue. We have added a detailed derivation of sigma_hand to the revision in Supplementary text 1.
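      For readers less familiar with the static formulation, the textbook result for combining independent Gaussian cues is sketched below: each cue is weighted by its precision (1/sigma^2), and the combined variance is the inverse of the summed precisions. This is a generic illustration, not the derivation in Supplementary text 1; the function name and the numbers are hypothetical, and the way the PEA model maps the visual cue onto hand position is not reproduced here.

```python
import numpy as np


def combine_cues(means, sigmas):
    """Precision-weighted combination of independent Gaussian cues.

    Returns (combined_mean, combined_sigma)."""
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(sigmas, dtype=float) ** 2
    weights = precisions / precisions.sum()          # weights sum to 1
    combined_mean = float(weights @ means)
    combined_sigma = float(np.sqrt(1.0 / precisions.sum()))
    return combined_mean, combined_sigma


# Hypothetical cues (deg): proprioceptive, visual, motor prediction.
x_hat, sigma_hat = combine_cues([2.0, 16.0, 0.0], [10.0, 12.0, 5.0])
print(x_hat, sigma_hat)
```

      In this simplified setting the combined sigma is smaller than the smallest single-cue sigma and depends only on the three cue uncertainties, not on the cue values.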

      Reviewer #3 (Recommendations For The Authors):

      We observed values in Fig 2C for the 64-degree perturbation that seem to be outliers, i.e., greater than 50 degrees. It is unclear how a psychometric curve could have a "slope" or JND of over 60, especially considering that the tested range was only 60. Since the data plotted in panel C are a collapse of the signed data in panel B, it is perplexing how such large data points were derived, particularly when the signed uncertainty values do not appear to exceed 30.

      Related to the previous point, we would also recommend connecting individual data points: if the uncertainty increases (linearly or otherwise), then people with low uncertainty at the middle distance should also have low uncertainty at the high distance, and people with high uncertainty at one point, should also have that at other distances. Or perhaps the best way to go about this is to use the uncertainty at the two smaller perturbations to predict uncertainty at the largest perturbation for each participant individually?

      Thank you for your suggestion to examine the consistency of individual levels of visual uncertainty across perturbation sizes. First, a sigma_v of 60 degrees is plausible and falls naturally out of the experimental data; it shows that some individuals indeed have large visual uncertainty. Given these potential outliers (which should not be removed, as we have no principled reason to do so), we estimated the linear function of sigma_v with a robust method, namely a generalized linear model (GLM) with a gamma distribution, which accommodates right-skewed data and can therefore capture positive outliers well. Furthermore, we added to our revision a verification test of our estimates of sigma_v: we used the adaptation data from Exp2 to estimate sigma_v without assuming a linear dependency. The model-fitted sigma_v closely matched the estimates from Exp1 (see Supplementary text 2 and Figure S7).
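      The gamma GLM mentioned above can be sketched without extra dependencies. Assuming an identity link (so that sigma_v stays a linear function of perturbation size; the link actually used is not stated here), iteratively reweighted least squares reduces to repeated weighted least-squares fits with weights 1/mu^2, because the gamma variance grows with the square of the mean. Large values therefore receive less weight than in ordinary least squares. The function name and data are hypothetical; established implementations such as the statsmodels GLM class provide standard errors and diagnostics that this sketch omits.

```python
import numpy as np


def gamma_glm_identity(x, y, n_iter=100, tol=1e-10):
    """Fit y ~ Gamma(mean = b0 + b1 * x) by iteratively reweighted least squares.

    With an identity link the working response equals y and the IRLS weights
    are 1 / mu**2, so each step is a weighted least-squares fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS starting values
    for _ in range(n_iter):
        mu = X @ beta
        if np.any(mu <= 0):
            raise ValueError("fitted means must stay positive for a gamma model")
        w = 1.0 / mu**2                               # gamma variance is proportional to mu**2
        XtW = X.T * w                                 # same as X.T @ diag(w)
        new_beta = np.linalg.solve(XtW @ X, XtW @ y)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Made-up data: three perturbation sizes, one large value at 64 deg.
sizes = np.array([4, 4, 4, 16, 16, 16, 64, 64, 64], dtype=float)
sigma_v = np.array([2, 3, 2.5, 5, 6, 4.5, 15, 20, 55], dtype=float)
print(gamma_glm_identity(sizes, sigma_v))
```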

      We re-plotted sigma_v with individual data points connected, and the data clearly indicate that individuals exhibit consistent levels of visual uncertainty across perturbation sizes: those with relatively low uncertainty at middle distances (in fact, angles) tend to exhibit relatively low uncertainty at larger distances too, and, similarly, those with high uncertainty at one distance maintain that level at other distances. A Spearman correlation analysis of the consistency of uncertainties across perturbation sizes among individuals confirmed this: we observed significant correlations between perturbation angles, indicating good individual consistency (4 and 16 degrees, rho = 0.759, p < 0.001; 16 and 64 degrees, rho = 0.527, p = 0.026).
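      For completeness, the rank correlation used in this consistency check can be sketched in a few lines: Spearman's rho is the Pearson correlation of the ranks. This sketch assumes no tied values and returns only rho; scipy.stats.spearmanr handles ties and also returns a p-value. The arrays in the example are hypothetical per-participant values, not the Exp1 data.

```python
import numpy as np


def spearman_rho(a, b):
    """Spearman rank correlation for two equal-length samples without ties."""
    rank_a = np.argsort(np.argsort(a))   # 0-based ranks of a
    rank_b = np.argsort(np.argsort(b))   # 0-based ranks of b
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


# Hypothetical sigma_v (deg) for the same five participants at two perturbation sizes.
sigma_at_16 = [4.1, 6.3, 3.2, 8.0, 5.5]
sigma_at_64 = [14.0, 22.5, 10.1, 35.2, 30.8]
print(spearman_rho(sigma_at_16, sigma_at_64))
```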

      Author response image 4.

      The illustration in Fig 2A does not seem to show a stimulus that is actually used in the experiment (it looks like about a -30° perturbation). It would be good to show all possible endpoints with all other visual elements to scale, including the start points of the PEST procedure.

      Thanks for the suggestion. We updated Fig 2A to show a stimulus of +16° and added a panel showing all the possible endpoints.

      Finally (related to the previous point), in lines 589-591 it says the target is a blue cross. Then in lines 614-616, it says participants are to fixate the blue cross or the start position. The start position was supposed to have disappeared, so perhaps the blue plus moved to the start position (which could be the case, when looking at the bottom panel in Fig 2A, although in the illustration the plus did not move fully to the start position, just toward it to some degree). Perhaps the descriptions need to be clarified, or it should be explained why people had to make an eye movement before giving their judgments. And if people could have made either 1) no eye movement, but stayed at fixation, 2) moved to the blue plus as shown in the last panel in Fig 2A, or 3) fixated on the home position, we'd be curious to know if this affected participants' judgments.

      Thanks for pointing that out. The blue cross serves as the target in the movement task and then disappears along with the cursor after 800 ms of frozen time. The blue cross then reappears in the discrimination task at the center of the screen, i.e., the start location. Subjects were asked to fixate on the blue cross during the visual discrimination task. Note that this return of fixation to the home position is exactly what is seen in typical error-clamp adaptation: once the movement is over, people guide their hand back to the home position. We performed a pilot study to record the typical fixation pattern during error-clamp adaptation, and Exp1 was intentionally designed to mimic its fixation sequence. We have now updated the description of Figure 2A, emphasizing the stimulus sequence.

      In Figure 4A, the label "bias" is confusing as that is used for recalibrated proprioceptive sense of hand position as well as other kinds of biases elsewhere in the paper. What seems to be meant is the integrated hand position (x-hat_hand?) where all three signals are apparently combined. The label should be changed and/or it should be clarified in the caption.

      Thanks for pointing that out, it should be x_hand_hat, and we have corrected this in the revised version of Figure 4.

      In the introduction, it is claimed that larger perturbations have not been tested with "implicit adaptation" paradigms, but in the same sentence, a paper is cited (Moorehead et al., 2017) that tests a rotation on the same order of magnitude as the largest one tested here (95°), as well as much larger rotations (135° and 175°), with error clamps. Interestingly, there is no adaptation in those conditions, which seems more in line with the sensory cue integration model. Can the PEA model explain these results as well? If so, this should be included in the paper, and if not, it should be discussed as a limitation.

      First, we double checked our manuscript and found that we never claimed that larger perturbations had not been tested.

      We agree that it is always good to have as many conditions as possible. However, the 135° and 175° conditions lead to minimal adaptation, which would not help much in terms of model testing. We postulate that this lack of adaptation is simply because people cannot see the moving cursor, or is due to other unknown reasons. Our simple model is not designed to cover such extreme cases.

      Specify the size of the arc used for the proprioceptive tests in Exp 3 and describe the starting location of the indicator (controlled by the left hand). Ideally, the starting location should have varied across trials to avoid systematic bias.

      Thank you for the comments. As detailed in the Methods section of our paper, the arc used in these tests is a ring with a 10 cm radius centered at the start position. This setup is shown as a red arc in Figure 7B.

      After completing each proprioceptive test trial, participants were instructed to position the indicator at approximately -180° on the arc and then relax their left arm. Thus, although the starting location for the subsequent trial was nominally -180°, it was not identical across trials, which introduced slight variability.

      Please confirm that the proprioceptive biases plotted in Fig 4E are relative to the baseline.

      Thank you for bringing this to our attention. Yes, the proprioceptive biases illustrated in Figure 4E are calculated relative to the baseline measurements. We have added this to the Methods section.

      Data availability: the data are available online, but there are some ways this can be improved. First, it would be better to use an open data format, instead of the closed, proprietary format currently used. Second, there is no explanation for what's in the data, other than the labels. (What are the units? What preprocessing was done?) Third, no code is made available, which would be useful for a computational model. Although rewriting the analyses in a non-proprietary language (to increase accessibility) is not a reasonable request at this point in the project, I'd encourage it for future projects. But perhaps Python, R, or Julia code that implements the model could be made available as a notebook of sorts so that other labs could look at (build on) the model starting with correct code - increasing the potential impact of this work.

      Great suggestions. We are also fully supportive of open data and open science. We now:

      (1) updated our data and code repository to include the experimental data in an open data format (.csv) for broader accessibility;

      (2) added detailed descriptions to the data to clarify their contents;

      (3) made the original MATLAB (.m) code for data analysis, model fitting, and simulation available online;

      (4) provided the code in Jupyter Notebook (.ipynb) format as well.

      These updates can be found in the revised “Data Availability” section of our manuscript.

      References

      Bromberg, Z., Donchin, O., & Haar, S. (2019). Eye Movements during Visuomotor Adaptation Represent Only Part of the Explicit Learning. eNeuro, 6(6). https://doi.org/10.1523/ENEURO.0308-19.2019

      Burge, J., Ernst, M. O., & Banks, M. S. (2008). The statistical determinants of adaptation rate in human reaching. Journal of Vision, 8(4), 1–19.

      de Brouwer, A. J., Gallivan, J. P., & Flanagan, J. R. (2018). Visuomotor feedback gains are modulated by gaze position. Journal of Neurophysiology, 120(5), 2522–2531.

      Egly, R., & Homa, D. (1984). Sensitization of the visual field. Journal of Experimental Psychology. Human Perception and Performance, 10(6), 778–793.

      Kim, H. E., Parvin, D. E., & Ivry, R. B. (2019). The influence of task outcome on implicit motor learning. eLife, 8. https://doi.org/10.7554/eLife.39882

      Klein, S. A., & Levi, D. M. (1987). Position sense of the peripheral retina. JOSA A, 4(8), 1543–1553.

      Levi, D. M., Klein, S. A., & Yap, Y. L. (1987). Positional uncertainty in peripheral and amblyopic vision. Vision Research, 27(4), 581–597.

      Makino, Y., Hayashi, T., & Nozaki, D. (2023). Divisively normalized neuronal processing of uncertain visual feedback for visuomotor learning. Communications Biology, 6(1), 1286.

      Owsley, C., Ball, K., & Keeton, D. M. (1995). Relationship between visual sensitivity and target localization in older adults. Vision Research, 35(4), 579–587.

      Simani, M. C., McGuire, L. M. M., & Sabes, P. N. (2007). Visual-shift adaptation is composed of separable sensory and task-dependent effects. Journal of Neurophysiology, 98(5), 2827–2841.

      Tsay, J. S., Avraham, G., Kim, H. E., Parvin, D. E., Wang, Z., & Ivry, R. B. (2021). The effect of visual uncertainty on implicit motor adaptation. Journal of Neurophysiology, 125(1), 12–22.

      Tsay, J. S., Chandy, A. M., Chua, R., Miall, R. C., Cole, J., Farnè, A., Ivry, R. B., & Sarlegna, F. R. (2024). Minimal impact of proprioceptive loss on implicit sensorimotor adaptation and perceived movement outcome. bioRxiv : The Preprint Server for Biology. https://doi.org/10.1101/2023.01.19.524726

      Tsay, J. S., Kim, H., Haith, A. M., & Ivry, R. B. (2022). Understanding implicit sensorimotor adaptation as a process of proprioceptive re-alignment. eLife, 11, e76639.

      Wei, K., Stevenson, I. H., & Körding, K. P. (2010). The uncertainty associated with visual flow fields and their influence on postural sway: Weber’s law suffices to explain the nonlinearity of vection. Journal of Vision, 10(14), 4.

      White, J. M., Levi, D. M., & Aitsebaomo, A. P. (1992). Spatial localization without visual references. Vision Research, 32(3), 513–526.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      …I find the concept and execution of the study very interesting and elegant. The paper is also commendably clear and readable. The differences between primary and higher cortex are compelling and I am largely convinced by the authors' claim that they have found evidence that broadly supports a mixed selectivity model of neural disentanglement along the lines of Rigotti et al (2013). I think that the increasing body of evidence for these kinds of representations is a significant development in our understanding of higher sensory representations. I also think that the dDR method is likely to be useful to researchers in a variety of fields who are looking to perform similar types of neural decoding analysis.

      Thanks! We agree that questions around population coding and high-level representations are critical in the field of sensory systems.

      Reviewer #2 (Public Review):

      ... This is a well-executed study with thoughtful analyses which in large part achieves its aims to evaluate how task engagement changes neural activity across multiple auditory regions. As with all work, there are several caveats or areas for future study/analysis. First, the sounds used here (tones and narrow-band noise) are relatively simple sounds; previous work suggests that exactly what activity is observed within each region (e.g., sensory only, decision-related, etc.) may depend in part upon what stimuli are used. Therefore, while the current study adds importantly to the literature, future work may consider the use of more varied stimuli. Second, the animals here were engaged in a behavioral task; but apart from an initial calculation of behavioral d', the task performance (and its effect on neural activity) is largely unaddressed.

      The reviewer makes several important points that we hope we addressed in the specific changes detailed below. Indeed, it is important to recognize the possibility that the specific stimuli involved in a task may interact with the effects of behavioral state and that variability in task performance should be considered as an important aspect of behavioral state.

      Reviewer #1 (Recommendations For The Authors):

      I have a few minor comments and criticisms:

      (1) Figure 1c. The choice of low-contrast grey text (e.g., "Target vs. target") is unfortunate, especially when printed, and should be replaced (e.g., with dark grey).

      We have edited the figure to use a higher contrast (dark grey). Thanks for catching this.

      (2) Figure 2 and Supplementary Figure 3. I think some indication of error or significance is required in all panels. Without this, it's hard to interpret any of these panels.

      Thank you for this feedback. Including significance here was clarifying and helps to strengthen our claim that state-dependent changes in neural activity were smaller and more diverse for single neurons than at the population level. We modified Figure 2b-c to indicate whether each neuron’s response to the target stimulus was significantly different than its response to the catch stimulus. The same test was performed in Supplementary Figure 3. Additionally, we added a statistical test in Figure 2d-e to indicate, for each pair of target/catch stimuli, whether discrimination (d-prime) changed significantly between active and passive conditions. Furthermore, we modified the text of the second paragraph under the results heading: “Diverse effects of task engagement on single neurons in primary and non-primary auditory cortex” to reference and interpret the results of these significance tests. The new text reads as follows (L. 121):

      “Sound-evoked spiking activity was compared between active and passive states to study the impact of task engagement on sound representation. In both A1 and dPEG, responses to target and catch stimuli were significantly discriminable for a subset of single neurons (about 25% in both areas, Figure 2A-C, Supplemental Figures 3-5, bootstrap test). This supports the idea that stimulus identity can be decoded in both brain regions, regardless of task performance. However, the fact that the responses of most neurons in both brain areas could not significantly discriminate target vs. catch stimuli also highlights the diversity of sound encoding observed at the level of single neurons. The accuracy of catch vs. target discrimination for each neuron was quantified using neural d-prime, the z-scored difference in target minus catch spiking response for each neuron (Methods: Single neuron PSTHs and d-prime (Niwa et al., 2012a)). Task engagement was associated with significant changes in catch vs. target d-prime for roughly 10% of neurons in both A1 (40 / 481 neurons, bootstrap test) and dPEG (33 / 377 neurons, bootstrap test). This included both neurons that increased and neurons that decreased their discriminability (Figure 2D-E). Thus, the effects of task engagement at the level of single neurons were relatively mild and inconsistent across the population; many neurons showed no significant change and, of those that did, effects were bidirectional (Figure 2D-E).”

      We also included an additional methods paragraph in the “Statistical tests” section to describe the bootstrapping procedure used for these significance tests (L. 644):

      “The one exception to this general approach is in Figure 2, where we analyzed the sound discrimination abilities of single neurons. In this case, we computed p-values for each neuron and stimulus independently. First, for each neuron and catch vs. target stimulus pair, we measured d-prime (see Methods: Single neuron evoked activity and d-prime). We generated a null distribution of d-prime values for each neuron-stimulus pair under each experimental condition by shuffling stimulus identity across trials before computing d-prime (100 resamples). A neuron was determined to have a significant d-prime for a given target vs. catch pair if its measured d-prime was greater than the 95th percentile of the null d-prime distribution. Second, for each neuron and catch vs. target stimulus pair, we tested whether d-prime differed significantly between active and passive conditions. To test this, we followed a similar procedure, but rather than shuffling stimulus identity, we shuffled active vs. passive trial labels. This allowed us to generate a null distribution of the active vs. passive d-prime difference for each neuron and stimulus pair. A neuron was determined to have a significant change in d-prime between conditions if the measured Δ d-prime lay outside the 95% confidence interval of the null Δ d-prime distribution.”
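      The two shuffle tests described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes d-prime is the difference of mean target and catch responses divided by their pooled standard deviation, it compares absolute d-prime values in the stimulus-shuffle test (the text does not say whether signed or absolute values were used), and the function names are hypothetical.

```python
import numpy as np


def dprime(target, catch):
    """Neural d-prime: mean target minus mean catch response, divided by the pooled SD.

    The pooled-SD normalization is an assumption for this sketch.
    """
    target = np.asarray(target, dtype=float)
    catch = np.asarray(catch, dtype=float)
    pooled_sd = np.sqrt(0.5 * (target.var() + catch.var()))
    if pooled_sd == 0:
        return 0.0  # undefined; treated here as "no discrimination"
    return (target.mean() - catch.mean()) / pooled_sd


def stimulus_shuffle_test(target, catch, n_resamples=100, rng=None):
    """Significant if |d'| exceeds the 95th percentile of a stimulus-label-shuffled null."""
    rng = np.random.default_rng(rng)
    target = np.asarray(target, dtype=float)
    catch = np.asarray(catch, dtype=float)
    observed = abs(dprime(target, catch))
    pooled = np.concatenate([target, catch])
    n_t = len(target)
    null = np.empty(n_resamples)
    for i in range(n_resamples):
        shuffled = rng.permutation(pooled)  # break the stimulus/response pairing
        null[i] = abs(dprime(shuffled[:n_t], shuffled[n_t:]))
    return observed, bool(observed > np.percentile(null, 95))


def condition_shuffle_test(active_t, active_c, passive_t, passive_c,
                           n_resamples=100, rng=None):
    """Significant if the active-minus-passive d' change lies outside the central 95%
    of a null built by shuffling active/passive labels within each stimulus."""
    rng = np.random.default_rng(rng)
    observed = dprime(active_t, active_c) - dprime(passive_t, passive_c)
    targets = np.concatenate([active_t, passive_t]).astype(float)
    catches = np.concatenate([active_c, passive_c]).astype(float)
    n_at, n_ac = len(active_t), len(active_c)
    null = np.empty(n_resamples)
    for i in range(n_resamples):
        t = rng.permutation(targets)  # reassign target trials to active/passive at random
        c = rng.permutation(catches)  # same for catch trials
        null[i] = dprime(t[:n_at], c[:n_ac]) - dprime(t[n_at:], c[n_ac:])
    lo, hi = np.percentile(null, [2.5, 97.5])
    return observed, bool(observed < lo or observed > hi)
```

Shuffling within each stimulus (rather than across all trials) keeps the target and catch trial pools separate, so the condition test only asks whether the active/passive split matters.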

      For Figure 2a, we chose not to indicate significance on the figure to avoid clutter, since the significance for all neurons in the population is shown in panels b-c. Additionally, the difference plot shown in panel a is in units of z-scores, which we believe already gives a rough sense of the significance of the target vs. catch response change per neuron in this example dataset.

      (3) Figure 2 and Supplementary Figure 3. I would consider including some more examples as a Supplementary Figure (and perhaps combining Supp Fig 3 with Fig 2 as a main figure).

      We found no significant or apparent difference in single-neuron properties between A1 and dPEG. Therefore, we decided it would not be helpful to plot both A1 and dPEG examples in the main text. However, we agree that the ability to see more examples of the raw data could be useful. Therefore, we compiled two supplementary figures (Supplementary Figures 4 and 5) that replicate Figure 2a for all datasets, encompassing A1 and dPEG.

      (4) Figure 2a and Supp Fig 3a. I was initially confused that the "delta-spk/sec (z-score)" values had themselves been z-scored, but now I think that they are simply the differences of the two left hand sub-panels. This could be made clear in the figure legend.

      The figure legends have been modified to state the procedure for computing “delta-spk/sec” more clearly. Specifically, we added the following information to the legend (L. 141):

      “Difference is computed as the z-scored response to the target minus the z-scored catch response (resulting in a difference shown in units of z-score).”

      (5) Figure 2b-e and Supp Fig 3b-e. Indicate the time window over which the responses were measured, and the number of neurons.

      Figure legends have been modified to include a sentence clearly stating the time window over which responses were measured. The number of neurons is also now included in the legend and on the figure itself. Furthermore, a brief description of the new statistical testing procedure has been added here (L. 144).

      “Responses were defined as the total number of spikes recorded during the 300 ms of sound presentation (area between dashed lines in panel A). Neurons with a significantly different response to the catch vs. target stimulus are indicated in black and quantified on the respective figure panel.”
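      A minimal sketch of how the response measure in these legends could be computed: spikes are counted in the 300 ms sound window on each trial, and the target-minus-catch difference is expressed in z-score units. The spike-time format (seconds relative to trial start), the `onset_s` argument, and z-scoring with one mean and SD pooled over all of a neuron's target and catch trials are assumptions; the manuscript's exact normalization may differ.

```python
import numpy as np


def window_counts(spike_times_per_trial, onset_s, duration_s=0.3):
    """Spike count on each trial within [onset_s, onset_s + duration_s)."""
    counts = []
    for spikes in spike_times_per_trial:
        spikes = np.asarray(spikes, dtype=float)
        in_window = (spikes >= onset_s) & (spikes < onset_s + duration_s)
        counts.append(int(np.count_nonzero(in_window)))
    return np.array(counts)


def zscored_difference(target_counts, catch_counts):
    """Mean z-scored target response minus mean z-scored catch response.

    Both conditions are z-scored with one shared mean and SD (an assumption),
    so the result is in units of the neuron's response SD.
    """
    target = np.asarray(target_counts, dtype=float)
    catch = np.asarray(catch_counts, dtype=float)
    pooled = np.concatenate([target, catch])
    mu, sd = pooled.mean(), pooled.std()
    if sd == 0:
        return 0.0
    return ((target - mu) / sd).mean() - ((catch - mu) / sd).mean()
```

For example, with sound onset at 0.1 s, a trial with spikes at 0.05, 0.12, and 0.3 s contributes a count of 2, because the 0.05 s spike falls before the window.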

      (6) Figure 2. "singe" should read "single"

      Typo in figure label has been fixed.

      (7) Line 144. Figure number is missing (Figure 3B-C).

      The missing figure number has been added to the text.

      (8) Figure 3. Again, the low-contrast grey should be replaced.

      The low-contrast grey has been replaced with dark grey.

      Reviewer #2 (Recommendations For The Authors):

      This study really nicely compares the activity and effects on activity in two areas of the auditory cortex in respect to task-engagement; I think it is, for the most part, very well done.

      A couple of specific recommendations:

      (1) Although I understand 'inf dB' as the SNR, including the actual dB level used in the experiments would be useful, especially in the case of the inf dB condition.

      Thank you for this feedback. We agree that clarification about the overall sound level used here would be helpful. We have modified the methods section “Behavioral paradigm” to include the following sentence (L. 450):

      “That is, the masking noise (and distractor stimuli) were always presented with an overall sound level of 60 dB SPL. Infinite (inf) dB trials corresponded to trials where the target tone was presented at 60 dB SPL without any masking noise present, 0 dB to trials where the target was 60 dB SPL, -5 dB to trials where the target was presented at 55 dB SPL etc.”

      In addition, we have modified the main text (L. 82):

      “Animals reported the occurrence of a target tone in a sequence of narrowband noise distractors by licking a piezo spout (Figure 1A, Methods: Behavioral paradigm, distractor stimulus sound level: 60 dB SPL). … We describe SNR as the overall SPL of the target relative to distractor noise level. Thus, an SNR of –5 dB corresponds to a target level of 55 dB SPL while an Inf dB SNR corresponds to a target tone presented without any masking noise.”

      And Figure legend 1 now explicitly states the sound level used in the experiments (L. 104):

      “Variable SNR was achieved by varying overall SPL of the target relative to the fixed (60 dB SPL) distractor noise, e.g., -5 dB SNR corresponds to a 55 dB SPL target with 60 dB SPL masking noise. Infinite (inf) dB SNR corresponds to a target tone presented in isolation (60 dB SPL).”
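      The SNR-to-level arithmetic in these passages can be written out as a short Python helper. It only restates the convention given in the text (distractor fixed at 60 dB SPL, target level equal to 60 dB SPL plus the SNR, and Inf dB meaning the target alone at 60 dB SPL); the function name is hypothetical.

```python
DISTRACTOR_SPL_DB = 60.0  # fixed distractor/masker level stated in the text


def levels_for_snr(snr_db):
    """Return (target SPL, masker SPL) in dB SPL; masker is None when SNR is Inf dB."""
    if snr_db == float("inf"):
        return DISTRACTOR_SPL_DB, None  # target presented in isolation
    return DISTRACTOR_SPL_DB + snr_db, DISTRACTOR_SPL_DB


for snr in (float("inf"), 0.0, -5.0):
    print(snr, levels_for_snr(snr))
```

For example, `levels_for_snr(-5.0)` returns `(55.0, 60.0)`, matching the -5 dB example in the legend.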

      (2) I very much appreciate the attempt to disentangle task engagement from generalized arousal state, and specifically, addressing this through the use of pupillometry. However, by focusing the discussion of pupil dynamics solely on the arousal-state aspects of pupil size, the paper doesn't address the increasing evidence that pupil size may fluctuate based upon a lot of other things, including perceptual events (see Kronemer et al., 2022 for a recent human paper; for auditory: Zekveld et al., 2018 (review) and Montes-Lourido et al., 2021; but many many others, too). It would be nice to see either a bit more nuanced discussion of what pupil size may be indicating (easier), or an analysis of the behavior in the context of pupil dynamics (a heavier lift).

      This is a good point. We agree that it is worth mentioning these more nuanced aspects of cognition that may be reflected by pupil size. Therefore, we also analyzed pupil size in the context of behavioral performance (see Supplemental Figure 6) and added the following text to the results (L. 193).

      “In addition to reflecting overall arousal level, pupil size has also been reported to reflect more nuanced cognitive variables such as listening effort (Zekveld et al., 2014). Furthermore, rodent data suggest that optimal sensory detection is associated with intermediate pupil size (McGinley et al., 2015), consistent with the hypothesis of an inverted-U relationship between arousal and behavioral performance (Zekveld et al., 2014). To determine whether this pattern held for the animals in our task, we measured the dynamics of pupil size in the context of behavioral performance. Across animals, task stimuli evoked robust pupil dilation that varied with trial outcome (Supplemental Figure 6b-c). Notably, pre-trial pupil size was significantly different between correct (hit and correct reject), hit, and miss trials (Supplemental Figure 6b-c), recapitulating the finding of an inverted-U relationship to performance in rodents (McGinley et al., 2015). Since we focused only on correct trials in our decoding analysis, these outcome-dependent differences in pupil size are unlikely to contribute to the emergent decoding selectivity in dPEG.”

      (3) I think it would make this paper shine that much more if behavioral performance were not subsumed into the overall label of task engagement. You've already established you have performance that varies as a function of SNR; I would love to see the neural d' and covariability related to the behavioral d' (in the comparisons where this is possible). I would also love to see a more direct measure of choice for those stimuli that show variable behavior (e.g., a choice probability analysis or something similar could easily be applied to the target SNRs of -5 and 0 dB), and a comparison of task-engaged activity on hits vs. misses vs. passive listening to those same stimuli. You discuss previous studies looking at choice-related/decision-related activity and draw parallels to this work; given that there is the opportunity with this data set to *directly* assess choice-related activity, the absence of such an analysis seems like a missed opportunity.

      Thank you for this feedback. We agree that “task engagement” is not a unimodal state and that a more fine-grained analysis of task-engaged neural activity, according to behavioral choice, could be informative.

      First, we would like to point out that in Figure 4 we did already compare behavioral d’ to delta neural d’. We found that the two were significantly correlated in dPEG, but not in A1. This suggests that task-dependent changes in stimulus decoding in dPEG, but not A1, are predictive of behavioral performance. This is consistent with the finding that task-relevant stimulus representations were selectively enhanced in dPEG, but not in A1.

      Second, we added a choice decoding analysis to address whether auditory cortex represents the animal’s choice in our task. The results of this analysis are summarized in Supplemental Figure 8 and are discussed under the results section: “Behavioral performance is correlated with neural coding changes in non-primary auditory cortex only.” (L. 226):

      “The previous analysis suggests that the task-dependent increase in stimulus information present in dPEG population activity is predictive of overall task performance. Next, we asked whether the population activity in either brain region was directly predictive of behavioral choice on single hit vs. miss trials. To do this, we conducted a choice probability analysis (Methods). We found that in both brain regions choice could be decoded well above chance level (Supplemental Figure 8). Choice information was present throughout the entire trial and did not increase during the target stimulus presentation. This suggests that the difference in population activity primarily reflects a cognitive state associated with the probability of licking on a given trial, or “impulsivity” rather than “choice.” This interpretation is consistent with our finding that baseline pupil size on each trial is predictive of trial outcome (Supplemental Figure 6b).”

      To keep our decoding approach consistent throughout the manuscript, we followed the same approach for choice decoding as we did for stimulus decoding (perform dDR then calculate neural d-prime in the dimensionality reduced space). To make the results more interpretable, we converted choice d-prime to a choice probability (percent correctly decoded choices) using leave-one-out cross validation. (We note that d-prime and percent correct are very highly correlated statistics.) This is described in the methods as follows (L. 550):

      “We performed a choice decoding analysis on hit vs. miss trials. We followed the same procedure as described above for stimulus decoding, where instead of a pair of stimuli our two classes to be decoded were “hit trial” vs. “miss trial”. That is, for each target stimulus we computed the optimal linear discrimination axis separating hit vs. miss trials (Abbott and Dayan, 1999) in the reduced dimensionality space identified with dDR (Heller and David, 2022). For the sake of interpretability with respect to previous work we reported choice probability as the percentage of correctly decoded trial outcomes rather than d-prime. Percent correct was calculated by projecting the population activity onto the optimal discrimination axis and using leave-one-out cross validation to measure the number of correct classifications.”
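      A minimal sketch of the hit vs. miss decoding step described above: a Fisher linear discriminant axis (the inverse of the average class covariance times the difference of class means) with leave-one-out cross-validation, returning percent correct. It assumes the population responses have already been projected into the reduced space (the dDR step itself is not shown), adds a small ridge term for numerical stability, and classifies each held-out trial by the nearer projected class mean; these choices and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def lda_axis(x0, x1, ridge=1e-6):
    """Unit-norm discriminant axis separating class 0 (rows of x0) from class 1 (rows of x1)."""
    mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
    sigma = 0.5 * (np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False))
    sigma = np.atleast_2d(sigma) + ridge * np.eye(x0.shape[1])  # regularize
    w = np.linalg.solve(sigma, mu1 - mu0)
    return w / np.linalg.norm(w), mu0, mu1


def loo_percent_correct(hit, miss):
    """Leave-one-out percent of trials whose outcome (hit vs. miss) is decoded correctly."""
    x = np.vstack([hit, miss]).astype(float)
    y = np.concatenate([np.ones(len(hit), dtype=int), np.zeros(len(miss), dtype=int)])
    correct = 0
    for i in range(len(x)):
        keep = np.arange(len(x)) != i  # hold out trial i
        x_train, y_train = x[keep], y[keep]
        w, mu_miss, mu_hit = lda_axis(x_train[y_train == 0], x_train[y_train == 1])
        p = x[i] @ w  # project the held-out trial onto the axis
        predicted = 1 if abs(p - mu_hit @ w) < abs(p - mu_miss @ w) else 0
        correct += int(predicted == y[i])
    return 100.0 * correct / len(x)
```

Refitting the axis on every fold keeps the held-out trial out of both the means and the covariance estimate, which avoids an optimistic bias in percent correct.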

      (4) It would also be interesting to look at population coding across sessions (although the point is taken that within a session allows the opportunity to assess covariability). Minorly self-servingly but very much related to the above point, Christison-Lagay et al, 2017 employed a similar detect-in-noise task, analyzed single neurons and population level activity, and looked at putative choice-related activity. The current study has the opportunity to expand on that kind of analysis that much more by looking across multiple sites vs within a given recording site; and compare across regions.

      Thank you for highlighting this point, we agree that it is important. When studying population coding it is critical to consider the impact of covariability between neurons. Therefore, it is worthwhile to revisit our interpretations of prior results, e.g., Christison-Lagay et al, 2017, which studied population coding by combining neurons across different sessions, given that we now have access to simultaneously recorded population data.

      First, we would like to point out that this was the primary motivation for our simulation analyses presented in Figure 5. Using simulations, we found that task-dependent gain modulation (which can be observed across sessions) was sufficient to explain our primary finding – selective enhancement in decoding of behaviorally relevant sound stimuli in dPEG.

      Second, to address the question about how covariability affects choice-related information in auditory cortex and compare our findings with prior studies, we performed the same set of simulations for choice probability analysis. We found that, again, choice-dependent gain modulation was sufficient to explain our findings. That is, simulations with hit- vs. miss-dependent gain changes, but fixed covariability, closely mirrored the choice probability we observed in the raw data. An additional simulation where covariability between all neurons was set to zero also recapitulated our findings in the raw data. Collectively, this suggests that covariability does not play a significant role in shaping the choice information present in A1 and dPEG during this task. We have added the following text to the manuscript to summarize this finding (L. 293):

      “Finally, we used the same simulation approach to determine what aspects of population activity carry the “choice” related information we observed in A1 and dPEG (Figure 4 – figure supplement 1). Similar to our findings for stimulus decoding, we found that gain modulation alone was sufficient to recapitulate the choice information present in the raw data for this task. This helps frame prior work that pooled neurons across sessions to study population coding of choice in similar auditory discrimination tasks (Christison-Lagay et al, 2017).”
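      The logic of these control simulations, keeping condition-dependent mean (gain) changes while holding covariability fixed or removing it, can be sketched as below. This is an assumption-laden illustration rather than the authors' simulation: surrogate trials are drawn from multivariate Gaussians whose means are the per-condition means and whose covariance is pooled across conditions (fixed covariability) or reduced to its diagonal (zero covariability). The surrogate data could then be passed to the same decoder used for the real data.

```python
import numpy as np


def surrogate_responses(data_by_cond, n_trials, zero_covariance=False, rng=None):
    """data_by_cond maps a condition label (e.g. "hit", "miss") to a (trials, neurons) array."""
    rng = np.random.default_rng(rng)
    # Remove each condition's mean so the pooled covariance excludes gain differences.
    centered = np.vstack([x - x.mean(axis=0) for x in data_by_cond.values()])
    cov = np.atleast_2d(np.cov(centered, rowvar=False))
    if zero_covariance:
        cov = np.diag(np.diag(cov))  # keep per-neuron variance, drop covariability
    return {
        cond: rng.multivariate_normal(x.mean(axis=0), cov, size=n_trials)
        for cond, x in data_by_cond.items()
    }
```

Because only the means differ between conditions in both surrogates, any choice information that survives the zero-covariance version can be attributed to gain changes alone.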

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study introduces and validates the Cyclic Homogeneous Oscillation (CHO) detection method to precisely determine the duration, location, and fundamental frequency of non-sinusoidal neural oscillations. Traditional spectral analysis methods face challenges in distinguishing the fundamental frequency of non-sinusoidal oscillations from their harmonics, leading to potential inaccuracies. The authors implement an underexplored approach, using the auto-correlation structure to identify the characteristic frequency of an oscillation. By combining this strategy with existing time-frequency tools to identify when oscillations occur, the authors strive to solve outstanding challenges involving spurious harmonic peaks detected in time-frequency representations. Empirical tests using electrocorticographic (ECoG) and electroencephalographic (EEG) signals further support the efficacy of CHO in detecting neural oscillations.

      Response:  We thank the reviewer for recognizing the strengths of our method in this encouraging review and for the opportunity to further improve and finalize our manuscript.
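      The central idea summarized by the reviewer, using the auto-correlation rather than the spectrum to find the fundamental frequency, can be illustrated with a toy Python example. A noise-free 10 Hz non-sinusoidal waveform (a sum of 10, 20, and 30 Hz sines, chosen here for clarity) produces three spectral peaks, whereas the first positive auto-correlation peak after the central lobe sits at the fundamental period. This is a sketch of the principle only, not the CHO algorithm, which additionally estimates the 1/f background, forms time-frequency bounding boxes, and applies a significance threshold; the function names are hypothetical.

```python
import numpy as np


def autocorr_unbiased(x):
    """Auto-correlation at lags 0..N-1, each lag divided by its number of overlapping samples."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    full = np.correlate(x, x, mode="full")[n - 1:]  # keep non-negative lags
    return full / (n - np.arange(n))


def fundamental_from_autocorr(x, fs):
    """Frequency of the first positive local maximum after the auto-correlation first drops to <= 0."""
    r = autocorr_unbiased(x)
    r = r / r[0]
    max_lag = len(x) // 2
    start = max(int(np.argmax(r[:max_lag] <= 0)), 1)  # skip the lobe around lag 0
    for k in range(start, max_lag - 1):
        if r[k] > 0 and r[k] >= r[k - 1] and r[k] > r[k + 1]:
            return fs / k
    return None


fs = 1000
t = np.arange(fs) / fs  # 1 s of data
x = (np.sin(2 * np.pi * 10 * t) + 0.6 * np.sin(2 * np.pi * 20 * t)
     + 0.4 * np.sin(2 * np.pi * 30 * t))

power = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(len(x), 1 / fs)
spectral_peaks = sorted(freqs[np.argsort(power)[-3:]])
print("three largest spectral peaks (Hz):", spectral_peaks)
print("auto-correlation estimate of f0 (Hz):", fundamental_from_autocorr(x, fs))
```

A detector that treated each spectral peak as a separate rhythm would report 20 and 30 Hz "oscillations" here; the auto-correlation instead points to a single 10 Hz periodicity.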

      Strengths:

      (1) The paper puts an important emphasis on the 'identity' question of oscillatory identification. The field primarily identifies oscillations through frequency, space (brain region), and time (length, and relative to task or rest). However, more tools that claim to further characterize oscillations by their defining/identifying traits are needed, in addition to data-driven studies about what the identifiable traits of neural oscillations are beyond frequency, location, and time. Such tools are useful for potentially distinguishing between circuit mechanistic generators underlying signals that may not otherwise be distinguished. This paper states this problem well and puts forth a new type of objective for neural signal processing methods.

      Response:  We sincerely appreciate this encouraging summary of the objective of our manuscript.

      (2) The paper uses synthetic data and multimodal recordings at multiple scales to validate the tool, suggesting CHO's robustness and applicability in various real-data scenarios. The figures illustratively demonstrate how CHO works on such synthetic and real examples, depicting in both time and frequency domains. The synthetic data are well-designed, and capable of producing transient oscillatory bursts with non-sinusoidal characteristics within 1/f noise. Using both non-invasive and invasive signals exposes CHO to conditions which may differ in extent and quality of the harmonic signal structure. An interesting followup question is whether the utility demonstrated here holds for MEG signals, as well as source-reconstructed signals from non-invasive recordings.

      Response:  We thank the reviewer for this excellent suggestion.  Indeed, our next paper will focus on applying our CHO method to signals that were source-reconstructed from non-invasive recordings (e.g., MEG and EEG) to extract their periodic activity.

      (3) This study is accompanied by open-source code and data for use by the community.

      Response:  We thank the reviewer for recognizing our effort to widely disseminate our method to the broader community.

      Weaknesses:

      (1) Due to the proliferation of neural signal processing techniques that have been designed to tackle issues such as harmonic activity, transient and event-like oscillations, and non-sinusoidal waveforms, it is naturally difficult for every introduction of a new tool to include exhaustive comparisons of all others. Here, some additional comparisons may be considered for the sake of context, a selection of which follows, biased by the previous exposure of this reviewer. One emerging approach that may be considered is known as state-space models with oscillatory and autoregressive components (Matsuda 2017, Beck 2022). State-space models such as autoregressive models have long been used to estimate the auto-correlation structure of a signal. State-space oscillators have recently been applied to transient oscillations such as sleep spindles (He 2023). Therefore, state-space oscillators extended with auto-regressive components may be able to perform the functions of the present tool through different means by circumventing the need to identify them in time-frequency. Another tool that should be mentioned is called PAPTO (Brady 2022). Although PAPTO does not address harmonics, it detects oscillatory events in the presence of 1/f background activity. Lastly, empirical mode decomposition (EMD) approaches have been studied in the context of neural harmonics and nonsinusoidal activity (Quinn 2021, Fabus 2022). EMD has an intrinsic relationship with extrema finding, in contrast with the present technique. In summary, the existence of methods such as PAPTO shows that researchers are converging on similar approaches to tackle similar problems. The existence of time-domain approaches such as state-space oscillators and EMD indicates that the field of timeseries analysis may yield even more approaches that are conceptually distinct and may theoretically circumvent the methodology of this tool.

      Response:  We thank the reviewer for this valuable insight.  In our manuscript, we acknowledge emerging approaches that employ state-space models or EMD for time-frequency analysis.  However, it's crucial to clarify that the primary focus in our study is on the detection and identification of the fundamental frequency, as well as the onset/offset of non-sinusoidal neural oscillations.  Thus, our emphasis lies specifically on these aspects.  We hope that future studies will use our methods as the basis to develop better methods for time-frequency analysis that will lead to a deeper understanding of harmonic structures.  

      Our Limitations section addresses this issue.  Specifically, we recognize that a more sophisticated time-frequency analysis could improve sensitivity, and that the core claim of our study centers on increasing specificity in the detection of non-sinusoidal oscillations.  We hope that future studies will use this as a basis for improving time-frequency analysis in general, and our open-source code is intended to facilitate such studies.  Specifically, in the first step of our algorithm, the time-frequency estimation can be replaced with any other preferred time-frequency analysis, such as state-space models, EMD, the wavelet transform, the Gabor transform, or matching pursuit. 

      For our own follow-up study, we plan to conduct a thorough review and comparison of emerging approaches employing state-space models or EMD for time-frequency analysis.  In this study, we aim to identify which approach, including the six methods mentioned by the reviewer (Matsuda 2017, Beck 2022, He 2023, Brady 2022, Quinn 2021, and Fabus 2022), can maximize the estimation of the fundamental frequency of non-sinusoidal neural oscillations using CHO.  The insights provided by the reviewer are appreciated, and we will carefully consider these aspects in our follow-up study.  

      In the revision of this manuscript, we are setting the stage for these future studies.  Specifically, we added a discussion paragraph within the Limitation section about the state-space model, and EMD approaches:

      “However, because our CHO method is modular, the FFT-based time-frequency analysis can be replaced with more sophisticated time-frequency estimation methods to improve the sensitivity of neural oscillation detection.  Specifically, a state-space model (Matsuda 2017, Beck 2022, He 2023, Brady 2022) or empirical mode decomposition (EMD, Quinn 2021, Fabus 2022) may improve the estimation of the auto-correlation of the harmonic structure underlying nonsinusoidal oscillations.  Furthermore, a Gabor transform or matching pursuit-based approach may improve the onset/offset detection of short burst-like neural oscillations (Kus 2013 and Morales 2022).”

      (2) The criteria that the authors use for neural oscillations embody some operating assumptions underlying their characteristics, perhaps informed by immediate use cases intended by the authors (e.g., hippocampal bursts). The extent to which these assumptions hold in all circumstances should be investigated. For instance, the notion of consistent auto-correlation breaks down in scenarios where instantaneous frequency fluctuates significantly at the scale of a few cycles. Imagine an alpha-beta complex without harmonics (Jones 2009). If oscillations change phase position within a timeframe of a few cycles, it would be difficult for a single peak in the auto-correlation structure to elucidate the complex time-varying peak frequency in a dynamic fashion. Likewise, it is unclear whether bounding boxes with a pre-specified overlap can capture complexes that maneuver across peak frequencies.

      Response:  We thank the reviewer for this valuable insight into the methodological limitations in the detection of neural oscillations that exhibit significant fluctuations in their instantaneous frequency.  Indeed, our CHO method is also limited in its ability to detect oscillations with fluctuating instantaneous frequencies.  This is because CHO uses an auto-correlation-based approach to detect neural oscillations that exhibit two or more cycles.  If oscillations change phase position within a timeframe of a few cycles, CHO cannot detect the oscillation because the periodicity is not expressed within the auto-correlation.  This limitation can be partially overcome by relaxing the detection threshold (see Line 30 of Algorithm 1 in the revised manuscript) for the auto-correlation analysis.  However, relaxing the detection threshold consequently increases the probability of detecting other aperiodic activity as well.  To clarify how CHO determines the periodicity of oscillations, and to explain the tradeoff between detecting oscillations with fluctuating instantaneous frequencies and avoiding the detection of other aperiodic activity, we have added pseudo code and a new subsection in the Methods.

      Author response table 1.

      Algorithm 1

      A new subsection titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO”.

      “The ability of CHO to detect neural oscillations and determine their fundamental frequency is governed by four principal hyper-parameters.  Adjusting these parameters requires understanding their effect on the sensitivity and specificity in the detection of neural oscillations. 

      The first hyper-parameter is the number of time windows (N in Line 5 of Algorithm 1) used to estimate the 1/f noise.  In our performance assessment of CHO, we used four windows, resulting in estimation periods of 250 ms in duration for each 1/f spectrum.  A higher number of time windows results in shorter estimation periods and thus reduces the likelihood of observing multiple neural oscillations within one time window, which could otherwise confound the 1/f estimation.  However, a higher number of time windows, and thus shorter estimation periods, may lead to unstable 1/f estimates. 

      The second hyper-parameter defines the minimum number of cycles of a neural oscillation to be detected by CHO (see Line 23 in Algorithm 1).  In our study, we specified this parameter to be two cycles.  Increasing the number of cycles increases specificity, as it will reject spurious oscillations.  However, increasing the number also reduces sensitivity as it will reject short oscillations.

      The third hyper-parameter is the significance threshold that selects positive peaks within the auto-correlation of the signal.  The magnitude of the peaks in the auto-correlation indicates the periodicity of the oscillations (see Line 26 in Algorithm 1).  Referred to as "NumSTD," this parameter denotes the number of standard errors that a positive peak has to exceed to be selected to be a true oscillation.  For this study, we set the "NumSTD" value to 1.  Increasing the "NumSTD" value increases specificity in the detection as it reduces the detection of spurious peaks in the auto-correlation.  However, increasing the "NumSTD" value also decreases the sensitivity in the detection of neural oscillations with varying instantaneous oscillatory frequencies. 

      The fourth hyper-parameter is the percentage of overlap between two bounding boxes that trigger their merger (see Line 31 in Algorithm 1).  In our study, we set this parameter to 75% overlap.  Increasing this threshold yields more fragmentation in the detection of oscillations, while decreasing this threshold may reduce the accuracy in determining the onset and offset of neural oscillations.”
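      Two of the hyper-parameters described in this subsection can be made concrete with a short Python sketch. The first function splits an epoch into N equal windows and fits a straight line to each window's log-log power spectrum, one common way to summarize a 1/f background; the Hann-windowed periodogram, the 2 to 100 Hz range, and the linear log-log model are assumptions, since the text does not state how CHO fits the 1/f spectrum. With a 1 s epoch and N = 4, each window is 250 ms long, giving 4 Hz frequency resolution and roughly 25 points per fit, which illustrates why a larger N yields noisier estimates. The remaining functions merge time-frequency bounding boxes, given as (t_start, t_end, f_low, f_high) tuples, whose overlap meets a threshold (75% in the text). Measuring overlap as the intersection area divided by the smaller box's area, and merging greedily into the enclosing box, are likewise assumptions; all names are hypothetical.

```python
from typing import List, Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # (t_start, t_end, f_low, f_high)


def window_aperiodic_fits(x, fs, n_windows=4, fmin=2.0, fmax=100.0):
    """Return one (slope, intercept) pair per window from a log10-log10 linear fit."""
    fits = []
    for segment in np.array_split(np.asarray(x, dtype=float), n_windows):
        segment = segment - segment.mean()
        taper = np.hanning(len(segment))
        power = np.abs(np.fft.rfft(segment * taper)) ** 2
        freqs = np.fft.rfftfreq(len(segment), 1 / fs)
        band = (freqs >= fmin) & (freqs <= fmax)
        slope, intercept = np.polyfit(np.log10(freqs[band]), np.log10(power[band]), 1)
        fits.append((float(slope), float(intercept)))
    return fits


def _area(box: Box) -> float:
    return (box[1] - box[0]) * (box[3] - box[2])


def overlap_fraction(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box (0 if disjoint)."""
    dt = min(a[1], b[1]) - max(a[0], b[0])
    df = min(a[3], b[3]) - max(a[2], b[2])
    if dt <= 0 or df <= 0:
        return 0.0
    return (dt * df) / min(_area(a), _area(b))


def merge_boxes(boxes: List[Box], threshold: float = 0.75) -> List[Box]:
    """Repeatedly replace any pair overlapping by >= threshold with their enclosing box."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_fraction(boxes[i], boxes[j]) >= threshold:
                    a, b = boxes[i], boxes[j]
                    union = (min(a[0], b[0]), max(a[1], b[1]),
                             min(a[2], b[2]), max(a[3], b[3]))
                    boxes = [box for k, box in enumerate(boxes) if k not in (i, j)]
                    boxes.append(union)
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Raising `threshold` makes merges rarer, so detections stay split into more boxes; lowering it merges more aggressively and can blur onset and offset estimates.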

      (3) Related to the last item, this method appears to lack implementation of statistical inferential techniques for estimating and interpreting auto-correlation and spectral structure. In standard practice, auto-correlation functions and spectral measures can be subjected to statistical inference to establish confidence intervals, often helping to determine the significance of the estimates. Doing so would be useful for expressing the likelihood that an oscillation and its harmonic has the same autocorrelation structure and fundamental frequency, or more robustly identifying harmonic peaks in the presence of spectral noise. Here, the authors appear to use auto-correlation and time-frequency decomposition more as a deterministic tool rather than an inferential one. Overall, an inferential approach would help differentiate between true effects and those that might spuriously occur due to the nature of the data. Ultimately, a more statistically principled approach might estimate harmonic structure in the presence of noise in a unified manner transmitted throughout the methodological steps.

      Response:  We thank the reviewer for sharing this insight on further enhancing our method.  Indeed, CHO does not use inferential statistics to estimate and interpret the auto-correlation and underlying spectral structure of the neural oscillation.  Implementing this approach within CHO would require calculating phase-phase coupling across all cross-frequency bands and bounding boxes.  However, as mentioned in the introduction section and Figure 1G-L, phase-phase coupling analysis cannot fully ascertain whether the oscillations are phase-locked and thus are harmonics or, indeed, independent oscillations.  This ambiguity, combined with the exorbitant computational cost of the entailed permutation test and the requirement to perform the analysis across all cross-frequency bands, channels, and trials, makes phase-phase coupling impractical for determining the fundamental frequency of neural oscillations in real time and, thus, for use in closed-loop neuromodulation applications.  Thus, within our study, we prioritized determining the fundamental frequency without considering the structure of harmonics.  

      An inferential approach can be implemented by adjusting the significance threshold that selects positive peaks within the auto-correlation of the signal.  Currently, this threshold is set to represent the approximate confidence bounds of the periodicity of the fundamental frequency.  To clarify this issue, we added additional pseudo code and a new subsection, titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO,” in the Methods section.
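      As an illustration of how such a threshold acts, the functions below pick the first positive auto-correlation peak that exceeds NumSTD standard errors and convert its lag into a fundamental frequency. This is a minimal Python sketch under stated assumptions, not CHO's code: the helper names are hypothetical, and the standard error of each auto-correlation value is approximated by 1/sqrt(n), the large-sample bound for white noise (the parameter name mirrors the `NumSTD` option of MATLAB's `autocorr`; whether CHO computes its bound identically is an assumption). Under this approximation, NumSTD = 1 corresponds to roughly 68% confidence bounds.

```python
import math


def autocorrelation(x):
    """Biased sample auto-correlation r[k], k = 0..n-1, normalised so r[0] == 1.

    Returns None for a constant (zero-variance) signal.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    if denom == 0:
        return None
    return [sum(d[t] * d[t + k] for t in range(n - k)) / denom for k in range(n)]


def fundamental_from_acf(x, fs, num_std=1.0):
    """Frequency (Hz) of the first positive ACF peak above num_std standard errors.

    Assumption: SE of r[k] ~ 1/sqrt(n) (white-noise approximation), so
    num_std=1 is roughly a 68% bound. Returns None if no peak qualifies.
    """
    r = autocorrelation(x)
    if r is None:
        return None
    bound = num_std / math.sqrt(len(x))
    for k in range(1, len(r) - 1):
        is_peak = r[k] > r[k - 1] and r[k] >= r[k + 1]
        if is_peak and r[k] > bound:
            return fs / k  # lag of the first significant peak = fundamental period
    return None


fs = 250  # 1 s of toy data at 250 Hz (illustrative values)
# 10 Hz fundamental plus a weaker 20 Hz harmonic: a non-sinusoidal waveform
x = [math.sin(2 * math.pi * 10 * i / fs) + 0.4 * math.sin(2 * math.pi * 20 * i / fs)
     for i in range(fs)]
print(fundamental_from_acf(x, fs, num_std=1.0))  # 10.0
```

      Because the harmonic is weaker than the fundamental, the auto-correlation at the harmonic's period (12.5 samples) stays negative, so the first significant peak falls at the fundamental period of 25 samples. Raising `num_std` raises `bound`, which rejects weak, spurious peaks at the cost of missing oscillations whose periodicity is less regular.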

      In future studies, we will investigate the harmonic structure of neural oscillations based on a large data set.  This exploration will help us understand how non-sinusoidal properties may influence the harmonic structure.  Your input is highly appreciated, and we will diligently incorporate these considerations into our research.

      See Author response table 1.

      A new subsection titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO”.

      “The ability of CHO to detect neural oscillations and determine their fundamental frequency is governed by four principal hyper-parameters.  Adjusting these parameters requires understanding their effect on the sensitivity and specificity in the detection of neural oscillations. 

      The first hyper-parameter is the number of time windows (N in Line 5 in Algorithm 1) that is used to estimate the 1/f noise.  In our performance assessment of CHO, we used four windows, resulting in an estimation period of 250 ms for each 1/f spectrum.  A higher number of time windows yields shorter estimation periods and thus reduces the likelihood of observing multiple neural oscillations within a single window, which could otherwise confound the 1/f estimation.  However, a higher number of time windows, and thus shorter estimation periods, may lead to unstable 1/f estimates.

      The second hyper-parameter defines the minimum number of cycles of a neural oscillation to be detected by CHO (see Line 23 in Algorithm 1).  In our study, we specified this parameter to be two cycles.  Increasing the number of cycles increases specificity, as it will reject spurious oscillations.  However, increasing the number also reduces sensitivity as it will reject short oscillations.

      The third hyper-parameter is the significance threshold that selects positive peaks within the auto-correlation of the signal.  The magnitude of the peaks in the auto-correlation indicates the periodicity of the oscillations (see Line 26 in Algorithm 1).  Referred to as "NumSTD," this parameter denotes the number of standard errors that a positive peak has to exceed to be selected as a true oscillation.  For this study, we set the "NumSTD" value to 1.  Increasing the "NumSTD" value increases specificity in the detection, as it reduces the detection of spurious peaks in the auto-correlation.  However, increasing the "NumSTD" value also decreases the sensitivity in the detection of neural oscillations with varying instantaneous oscillatory frequencies.

      The fourth hyper-parameter is the percentage of overlap between two bounding boxes that triggers their merger (see Line 31 in Algorithm 1).  In our study, we set this parameter to 75% overlap.  Increasing this threshold yields more fragmentation in the detection of oscillations, while decreasing this threshold may reduce the accuracy in determining the onset and offset of neural oscillations.”
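      The first hyper-parameter can be made concrete with a small sketch that splits a segment into N windows and fits an aperiodic (1/f) line to each window's periodogram. This is a minimal Python illustration, not CHO's estimator: the function names, the direct-DFT periodogram, the log-log least-squares fit, and the default 2–40 Hz fitting range are assumptions. It also shows one reason short windows are unstable: a window of length T has a frequency resolution of 1/T, so four 250 ms windows leave only 4 Hz bins (ten bins between 4 and 40 Hz) for each fit.

```python
import cmath
import math


def split_windows(x, n_windows):
    """Split x into n_windows equal, non-overlapping windows (remainder dropped)."""
    size = len(x) // n_windows
    return [x[i * size:(i + 1) * size] for i in range(n_windows)]


def periodogram(x, fs):
    """One-sided periodogram via a direct DFT (O(n^2); fine for short windows).

    Returns (freqs, power) for bins 1..n//2, skipping the DC bin.
    """
    n = len(x)
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        c = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(c) ** 2 / (fs * n))
    return freqs, power


def fit_aperiodic(freqs, power, f_lo=2.0, f_hi=40.0):
    """Least-squares fit of log10(P) = offset - exponent * log10(f) in [f_lo, f_hi].

    The fitting range and the plain log-log line are assumptions.
    """
    pts = [(math.log10(f), math.log10(p))
           for f, p in zip(freqs, power) if f_lo <= f <= f_hi and p > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive bins in the fitting range")
    mx = sum(u for u, _ in pts) / len(pts)
    my = sum(v for _, v in pts) / len(pts)
    sxx = sum((u - mx) ** 2 for u, _ in pts)
    sxy = sum((u - mx) * (v - my) for u, v in pts)
    slope = sxy / sxx
    return my - slope * mx, -slope  # (offset, exponent)


def aperiodic_per_window(x, fs, n_windows=4):
    """One (offset, exponent) estimate per window; n_windows plays the role of N."""
    return [fit_aperiodic(*periodogram(w, fs)) for w in split_windows(x, n_windows)]


# Sanity check on an exact 1/f spectrum: the fitted exponent should be 1.
freqs = [float(f) for f in range(1, 101)]
offset, exponent = fit_aperiodic(freqs, [1.0 / f for f in freqs])
print(round(exponent, 6))  # 1.0
```

      Increasing `n_windows` shortens each window, which lowers the chance that several oscillations fall into one estimate but also coarsens the frequency grid and leaves fewer bins for the fit.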

      (4) As with any signal processing method, hyperparameters and their ability to be tuned by the user need to be clearly acknowledged, as they impact the robustness and reproducibility of the method. Here, some of the hyperparameters appear to be: a) number of cycles around which to construct bounding boxes and b) overlap percentage of bounding boxes for grouping. Any others should be highlighted by the authors and clearly explained during the course of tool dissemination to the community, ideally in tutorial format through the Github repository.

      Response:  We thank the reviewer for this helpful suggestion.  In response, we added a new subsection that describes the hyper-parameters of CHO as follows:

      A new subsection named “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO”.

      “The ability of CHO to detect neural oscillations and determine their fundamental frequency is governed by four principal hyper-parameters.  Adjusting these parameters requires understanding their effect on the sensitivity and specificity in the detection of neural oscillations. 

      The first hyper-parameter is the number of time windows (N in Line 5 in Algorithm 1) that is used to estimate the 1/f noise.  In our performance assessment of CHO, we used four windows, resulting in an estimation period of 250 ms for each 1/f spectrum.  A higher number of time windows yields shorter estimation periods and thus reduces the likelihood of observing multiple neural oscillations within a single window, which could otherwise confound the 1/f estimation.  However, a higher number of time windows, and thus shorter estimation periods, may lead to unstable 1/f estimates.

      The second hyper-parameter defines the minimum number of cycles of a neural oscillation to be detected by CHO (see Line 23 in Algorithm 1).  In our study, we specified this parameter to be two cycles.  Increasing the number of cycles increases specificity, as it will reject spurious oscillations.  However, increasing the number also reduces sensitivity as it will reject short oscillations.

      The third hyper-parameter is the significance threshold that selects positive peaks within the auto-correlation of the signal.  The magnitude of the peaks in the auto-correlation indicates the periodicity of the oscillations (see Line 26 in Algorithm 1).  Referred to as "NumSTD," this parameter denotes the number of standard errors that a positive peak has to exceed to be selected as a true oscillation.  For this study, we set the "NumSTD" value to 1.  Increasing the "NumSTD" value increases specificity in the detection, as it reduces the detection of spurious peaks in the auto-correlation.  However, increasing the "NumSTD" value also decreases the sensitivity in the detection of neural oscillations with varying instantaneous oscillatory frequencies.

      The fourth hyper-parameter is the percentage of overlap between two bounding boxes that triggers their merger (see Line 31 in Algorithm 1).  In our study, we set this parameter to 75% overlap.  Increasing this threshold yields more fragmentation in the detection of oscillations, while decreasing this threshold may reduce the accuracy in determining the onset and offset of neural oscillations.”
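      The merging rule of the fourth hyper-parameter can be sketched as follows. This is a hypothetical Python illustration: boxes are represented as time-frequency rectangles, and "overlap" is taken as the intersection area divided by the area of the smaller box, which is our assumption rather than CHO's documented definition. Pairs whose overlap reaches the threshold are replaced by their enclosing union until no pair qualifies, so raising the threshold merges fewer boxes and leaves more fragments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Time-frequency bounding box: [t0, t1] seconds x [f0, f1] Hz."""
    t0: float
    t1: float
    f0: float
    f1: float

    def area(self) -> float:
        return max(0.0, self.t1 - self.t0) * max(0.0, self.f1 - self.f0)


def overlap_fraction(a: Box, b: Box) -> float:
    """Intersection area divided by the smaller box's area (assumed definition)."""
    dt = max(0.0, min(a.t1, b.t1) - max(a.t0, b.t0))
    df = max(0.0, min(a.f1, b.f1) - max(a.f0, b.f0))
    smaller = min(a.area(), b.area())
    return 0.0 if smaller == 0.0 else dt * df / smaller


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both a and b."""
    return Box(min(a.t0, b.t0), max(a.t1, b.t1), min(a.f0, b.f0), max(a.f1, b.f1))


def merge_boxes(boxes, threshold=0.75):
    """Repeatedly merge any pair whose overlap fraction reaches `threshold`."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_fraction(boxes[i], boxes[j]) >= threshold:
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the merged box may now overlap others
    return boxes


a = Box(0.0, 1.0, 8.0, 12.0)
b = Box(0.1, 1.0, 8.0, 12.0)  # fully inside a: overlap fraction 1.0
c = Box(0.5, 1.5, 8.0, 12.0)  # half overlaps a: overlap fraction 0.5
print(merge_boxes([a, b, c]))  # a and b merged; c kept separate at 75%
```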

      (5) Most of the validation demonstrations in this paper depict the detection capabilities of CHO. For example, the authors demonstrate how to use this tool to reduce false detection of oscillations made up of harmonic activity and show in simulated examples how CHO performs compared to other methods in detection specificity, sensitivity, and accuracy. However, the detection problem is not the same as the 'identity' problem that the paper originally introduced CHO to solve. That is, detecting a non-sinusoidal oscillation well does not help define or characterize its non-sinusoidal 'fingerprint'. An example problem to set up this question is: if there are multiple oscillations at the same base frequency in a dataset, how can their differing harmonic structure be used to distinguish them from each other? To address this at a minimum, Figure 4 (or a follow-up to it) should simulate signals at similar levels of detectability with different 'identities' (i.e., different levels and/or manifestations of harmonic structure), and evaluate CHO's potential ability to distinguish or cluster them. Then, does a real-world dataset or neuroscientific problem exist in which a similar sort of exercise can be conducted and validated in some way? If the "what" question is to be sufficiently addressed by this tool, then this type of task should be within the scope of its capabilities, and validation within this scenario should be demonstrated in the paper. This is the most fundamental limitation of the paper in its current state.

      Response: Thank you for your insightful suggestion; we truly appreciate it. We recognize that the 'identity' problem requires further studies to develop appropriate methods. Our current approach does not fully address this issue, as it may detect asymmetric non-sinusoidal oscillations with multiple harmonic peaks without accounting for the different shapes of non-sinusoidal oscillations.

      The main reason we could not fully address the “identity” problem is the general absence of ground truth, i.e., data for which the harmonic structure is known. To overcome this barrier, we would need datasets from well-characterized cognitive tasks or neural disorders.  For example, Cole et al. 2017 showed that the harmonic structure of beta oscillations can explain the degree of Parkinson’s disease, and Hu et al. 2023 showed that the number of harmonic peaks can localize the seizure onset zone. Future studies could use the data from these two studies to test whether CHO can distinguish different harmonic structures of pathological neural oscillations.

      In this paper, we characterized the basic identity of neural oscillations, encompassing elements such as the fundamental frequency and onset/offset. Your valuable insights contribute significantly to our ongoing efforts, and we appreciate your thoughtful consideration of these aspects. In response, we added a new paragraph to the Limitations part of the Discussion section, as below:

      “Another limitation of this study is that it does not assess the harmonic structure of neural oscillations. Thus, CHO cannot distinguish between oscillations that have the same fundamental frequency but differ in their non-sinusoidal properties.  This limitation stems from the objective of this study, which is to identify the fundamental frequency of non-sinusoidal neural oscillations.  Overcoming this limitation requires further studies to improve CHO to distinguish between different non-sinusoidal properties of pathological neural oscillations.  The data that is necessary for these further studies could be obtained from the wide range of studies that have linked the harmonic structures in the neural oscillations to various cognitive functions (van Dijk et al., 2010; Schalk, 2015; Mazaheri and Jensen, 2008) and neural disorders (Cole et al., 2017; Jackson et al., 2019; Hu et al., 2023). For example, Cole et al. 2017 showed that a harmonic structure of beta oscillations can explain the degree of Parkinson’s disease, and Hu et al. 2023 showed the number of harmonic peaks can localize the seizure onset zone.”

      References:

      Beck AM, He M, Gutierrez R, Purdon PL. An iterative search algorithm to identify oscillatory dynamics in neurophysiological time series. bioRxiv. 2022. p. 2022.10.30.514422. doi:10.1101/2022.10.30.514422

      Brady B, Bardouille T. Periodic/Aperiodic parameterization of transient oscillations (PAPTO): Implications for healthy ageing. Neuroimage. 2022;251: 118974.

      Fabus MS, Woolrich MW, Warnaby CW, Quinn AJ. Understanding Harmonic Structures Through Instantaneous Frequency. IEEE Open J Signal Process. 2022;3: 320-334.

      Jones SR, Pritchett DL, Sikora MA, Stufflebeam SM, Hämäläinen M, Moore CI. Quantitative analysis and biophysically realistic neural modeling of the MEG mu rhythm: rhythmogenesis and modulation of sensory-evoked responses. J Neurophysiol. 2009;102: 3554-3572.

      He M, Das P, Hotan G, Purdon PL. Switching state-space modeling of neural signal dynamics. PLoS Comput Biol. 2023;19: e1011395.

      Matsuda T, Komaki F. Time Series Decomposition into Oscillation Components and Phase Estimation. Neural Comput. 2017;29: 332-367.

      Quinn AJ, Lopes-Dos-Santos V, Huang N, Liang W-K, Juan C-H, Yeh J-R, et al. Within-cycle instantaneous frequency profiles report oscillatory waveform dynamics. J Neurophysiol. 2021;126: 1190-1208.

      Reviewer #2 (Public Review):

      Summary:

      A new toolbox is presented that builds on previous toolboxes to distinguish between real and spurious oscillatory activity, which can be induced by non-sinusoidal waveshapes. Whilst there are many toolboxes that help to distinguish between 1/f noise and oscillations, not many tools are available that help to distinguish true oscillatory activity from spurious oscillatory activity induced in harmonics of the fundamental frequency by non-sinusoidal waveshapes. The authors present a new algorithm which is based on autocorrelation to separate real from spurious oscillatory activity. The algorithm is extensively validated using synthetic (simulated) data, and various empirical datasets from EEG, intracranial EEG in various locations and domains (i.e. auditory cortex, hippocampus, etc.).

      Strengths:

      Distinguishing real from spurious oscillatory activity due to non-sinusoidal waveshapes is an issue that has plagued the field for quite a long time. The presented toolbox addresses this fundamental problem which will be of great use for the community. The paper is written in a very accessible and clear way so that readers less familiar with the intricacies of Fourier transform and signal processing will also be able to follow it. A particular strength is the broad validation of the toolbox, using synthetic, scalp EEG, EcoG, and stereotactic EEG in various locations and paradigms.

      Weaknesses:

      At many points in the Results section, critical statistical comparisons are missing (e.g., FOOOF vs CHO). Another weakness concerns the Methods section, which describes the algorithm only superficially. Finally, the algorithm seems to be quite conservative in identifying oscillatory activity, which may make it useful mainly for analysing very strong oscillatory signals (e.g., alpha) and less suitable for weaker oscillatory signals (e.g., gamma).

      Response: We thank Reviewer #2 for the assistance in improving this manuscript.  In the revised manuscript, we have added the missing statistical comparisons, detailed pseudo code, and a subsection that explains the hyper-parameters of CHO.  We also recognize the limitations of CHO in detecting gamma oscillations.  While our results demonstrate beta-band oscillations in ECoG and EEG signals (see Figures 5 and 6), we did not expect to find gamma-band oscillations during a simple reaction time task, because ECoG electrodes generally do not cover the occipital cortex, where such gamma-band oscillations may be found.

      Nevertheless, CHO should be able to detect gamma-band oscillations: if gamma-band oscillations are present, they appear as a bump above the 1/f fit in the power spectrum, which CHO will detect.  We apologize for not specifying the frequency range of the synthetic non-sinusoidal oscillations; the gamma band was also included in our simulation.  We added the frequency range (1–40 Hz) of the synthetic non-sinusoidal oscillations to the corresponding Methods subsection, the caption of Figure 4, and the Results section.

      Reviewer #1 (Recommendations For The Authors):

      (1) The example of a sinusoidal neural oscillation in Fig 1 still seems to exhibit a great deal of non-sinusoidal behavior. Although it is largely symmetrical, it has significant peak-trough asymmetry as well as a sharper peak structure than typical sinusoidal activity. Nevertheless, it has less harmonic structure than the example on the left. A more precisely stated claim might be that non-sinusoidal behavior is not the distinguishing characteristic between the two, but rather the degree of harmonic structure.

      Response: We are grateful for this thoughtful observation. We now recognize that the depicted example shows pronounced peak-trough asymmetry and sharpness, characteristics not typically associated with sinusoidal behavior. The key differentiator between the examples therefore lies not only in their non-sinusoidal behavior but also in their harmonic structure. To reflect this, we have refined our manuscript to articulate the differences in harmonic structure more accurately, in accordance with your suggestion. Specifically, we revised the caption of Fig 1 as follows:

      The caption of the Fig 1G-L.

      “We applied the same statistical test to a more sinusoidal neural oscillation (G). Since this neural oscillation more closely resembles a sinusoidal shape, it does not exhibit any prominent harmonic peaks in the alpha and beta bands within the power spectrum (H) and time-frequency domain (I).  Consequently, our test found that the theta-band and beta-band oscillations were not phase-locked (J-L).  Thus, this statistical test suggests the absence of a harmonic structure.”

      (2) The statement "This suggests that most of the beta oscillations detected by conventional methods are simply harmonics of the predominant asymmetric alpha oscillation." is potentially overstated. It is important to constrain this statement to the auditory cortex in which the authors conduct the validation, because true beta still exists elsewhere. The same goes for the beta-gamma claim later on. In general, use of "may be" is also more advisable than the definitive "are".

      Response: We thank the reviewer for this thoughtful feedback. To avoid the potential overstatement of our findings we revised our statement on beta oscillations in the manuscript as follows:

      Discussion:

      “This suggests that most of the beta oscillations detected by conventional methods within auditory cortex may be simply harmonics of the predominant asymmetric alpha oscillation.”

      Reviewer #2 (Recommendations For The Authors):

      All my concerns are medium to minor and I list them as they appear in the manuscript. I do not suggest new experiments or a change in the results, instead I focus on writing issues only.

      a) Line 50: A reference to the seminal paper by Klimesch et al (2007) on alpha oscillations and inhibition would seem appropriate here.

      Response: We added the reference to Klimesch et al. (2007).

      b) Figure 4: It is unclear which length for the simulated oscillations was used to generate the data in panels B-G.

      Response: We generated oscillations that were 2.5 cycles in length and 1-3 seconds in duration. We added this information to the manuscript as follows.

      Figure 4:

      “We evaluated CHO by verifying its specificity, sensitivity, and accuracy in detecting the fundamental frequency of non-sinusoidal oscillatory bursts (2.5 cycles, 1–3 seconds long) convolved with 1/f noise.”

      Results (page 5, lines 163-165):

      “To determine the specificity and sensitivity of CHO in detecting neural oscillations, we applied CHO to synthetic non-sinusoidal oscillatory bursts (2.5 cycles, 1–3 seconds long) convolved with 1/f noise, also known as pink noise, which has a power spectral density that is inversely proportional to the frequency of the signal.”

      Methods (page 20, lines 623-626):

      “While empirical physiological signals are most appropriate for validating our method, they generally lack the necessary ground truth to characterize neural oscillation with sinusoidal or non-sinusoidal properties. To overcome this limitation, we first validated CHO on synthetic nonsinusoidal oscillatory bursts (2.5 cycles, 1–3 seconds long) convolved with 1/f noise to test the performance of the proposed method.”
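      Synthetic data of this kind can be sketched as follows. This is a hypothetical Python illustration, not the authors' generator: pink noise is synthesized by giving each frequency bin an amplitude proportional to 1/sqrt(f) (so power falls as 1/f) with a random phase, and the non-sinusoidal burst is a fundamental plus a phase-shifted second harmonic, which makes troughs deeper than peaks. The additive mixing, the harmonic gain, the noise scale, and all function names are assumptions.

```python
import cmath
import math
import random


def pink_noise(n, seed=0):
    """Real-valued, unit-RMS noise whose power spectrum falls off as 1/f.

    Bin k gets amplitude 1/sqrt(k) and a random phase; the spectrum is made
    Hermitian-symmetric so the (direct, O(n^2)) inverse DFT is real.
    """
    rng = random.Random(seed)
    spec = [0j] * n  # DC bin stays 0, so the noise has zero mean
    for k in range(1, n // 2 + 1):
        spec[k] = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) / math.sqrt(k)
        spec[n - k] = spec[k].conjugate()
    if n % 2 == 0:
        spec[n // 2] = complex(abs(spec[n // 2]), 0.0)  # Nyquist bin must be real
    x = [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
         for t in range(n)]
    rms = math.sqrt(sum(v * v for v in x) / n)
    return [v / rms for v in x]


def nonsinusoidal_burst(n, fs, freq, onset_s, n_cycles, harmonic_gain=0.5):
    """Zeros except for n_cycles of an asymmetric waveform starting at onset_s."""
    out = [0.0] * n
    start = int(round(onset_s * fs))
    length = int(round(n_cycles * fs / freq))
    for i in range(start, min(n, start + length)):
        phase = 2.0 * math.pi * freq * (i - start) / fs
        # fundamental + 2nd harmonic shifted by pi/2: troughs deeper than peaks
        out[i] = math.sin(phase) + harmonic_gain * math.cos(2.0 * phase)
    return out


def synthetic_trial(n=500, fs=250, freq=10.0, onset_s=0.4, n_cycles=2.5,
                    noise_scale=0.5, seed=0):
    """Burst added to scaled pink noise (additive mixing is an assumption)."""
    burst = nonsinusoidal_burst(n, fs, freq, onset_s, n_cycles)
    noise = pink_noise(n, seed)
    return [b + noise_scale * v for b, v in zip(burst, noise)]


trial = synthetic_trial()
print(len(trial))  # 500
```

      Because only the phases are random, every realization has exactly the same 1/f power profile; the burst frequency, length, and `noise_scale` can then be swept to probe detection at different signal-to-noise ratios.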

      c) Figure 5 - supplements: Would be good to re-organize the arrangement of the plots on these figures to facilitate the comparison between Foof and CHO (i.e. by presenting for each participant FOOOF and CHO together).

      Response: We combined Figure 5-supplementary figures 1 and 2 into Figure 5-supplementary figure 1, Figure 6-supplementary figures 1 and 2 into Figure 6-supplementary figure 1, and Figure 8-supplementary figures 1 and 2 into Figure 8-supplementary figure 1. 

      Author response image 1.

      Figure 5-supplementary figure 1:

      Author response image 2.

      Figure 6-supplementary figure 1:

      Author response image 3.

      Figure 8-supplementary figure 1:

      d) Statistics: Almost throughout the results section where the empirical results are described statistical comparisons are missing. For instance, in lines 212-213 the statement that CHO did not detect low gamma while FOOOF did is not backed up by the appropriate statistics. This issue is also evident in all of the following sections (i.e. EEG results, On-offsets of oscillations, SEEG results, Frequency and duration of oscillations). I feel this is probably the most important point that needs to be addressed.

      Response: We added statistical comparisons to Figures 5 (ECoG) and 6 (EEG) and to the Results section as follows.

      Author response image 4.

      “Validation of CHO in detecting oscillations in ECoG signals. A. We applied CHO and FOOOF to determine the fundamental frequency of oscillations from ECoG signals recorded during the pre-stimulus period of an auditory reaction time task. FOOOF detected oscillations primarily in the alpha- and beta-band over STG and pre-motor area.  In contrast, CHO also detected alpha-band oscillations primarily within STG, and more focal beta-band oscillations over the pre-motor area, but not STG. B. We investigated the occurrence of each oscillation within defined cerebral regions across eight ECoG subjects. The horizontal bars and horizontal lines represent the median and median absolute deviation (MAD) of oscillations occurring across the eight subjects. An asterisk (*) indicates statistically significant differences in oscillation detection between CHO and FOOOF (Wilcoxon rank-sum test, p<0.05 after Bonferroni correction).”
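      For readers who want to reproduce this kind of comparison, the sketch below implements a two-sided Wilcoxon rank-sum test with the large-sample normal approximation (the same approximation `scipy.stats.ranksums` uses) and a Bonferroni adjustment, p_adj = min(1, m·p) for m comparisons. It is a minimal pure-Python illustration with hypothetical function names; the choice of m (for example, regions × frequency bands) and the absence of a tie correction are assumptions, and the toy inputs are not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test, normal approximation, no tie correction.

    Returns (z, p), where z is computed from the rank sum of x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p


def bonferroni(p_values):
    """Bonferroni-adjusted p-values for m = len(p_values) comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


z, p = rank_sum_test([1, 2, 3], [4, 5, 6])  # toy inputs
print(z, p)  # z is about -1.964, p about 0.0495
```

      With, say, ten comparisons, the same toy p-value would be adjusted to roughly 0.5 and would no longer pass the 0.05 threshold, which is why the correction matters when many regions and bands are tested.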

      Author response image 5.

      Validation of CHO in detecting oscillations in EEG signals. A. We applied CHO and FOOOF to determine the fundamental frequency of oscillations from EEG signals recorded during the pre-stimulus period of an auditory reaction time task.  FOOOF primarily detected alpha-band oscillations over frontal/visual areas and beta-band oscillations across all areas (with a focus on central areas). In contrast, CHO detected alpha-band oscillations primarily within visual areas and detected more focal beta-band oscillations over the pre-motor area, similar to the ECoG results shown in Figure 5. B. We investigated the occurrence of each oscillation within the EEG signals across seven subjects. An asterisk (*) indicates statistically significant differences in oscillation detection between CHO and FOOOF (Wilcoxon rank-sum test, p<0.05 after Bonferroni correction). CHO exhibited lower entropy values of alpha and beta occurrence than FOOOF across 64 channels. C. We compared the performance of FOOOF and CHO in detecting oscillations across visual and pre-motor-related EEG channels. CHO detected more alpha and beta oscillations in visual cortex than in pre-motor cortex. FOOOF detected more alpha and beta oscillations in visual cortex than in pre-motor cortex.

      We added additional explanations of our statistical results to the “Electrocorticographic (ECoG) results” and “Electroencephalographic (EEG) results” sections.

      “We compared neural oscillation detection rates between CHO and FOOOF across eight ECoG subjects.  We used FreeSurfer to determine the associated cerebral region for each ECoG location. Each subject performed approximately 400 trials of a simple auditory reaction-time task.  We analyzed the neural oscillations during the 1.5-second-long pre-stimulus period within each trial. CHO and FOOOF demonstrated statistically comparable results in the theta and alpha bands despite CHO exhibiting smaller median occurrence rates than FOOOF across eight subjects. Notably, within the beta band, excluding specific regions such as precentral, pars opercularis, and caudal middle frontal areas, CHO's beta oscillation detection rate was significantly lower than that of FOOOF (Wilcoxon rank-sum test, p < 0.05 after Bonferroni correction). This suggests comparable detection rates between CHO and FOOOF in premotor and Broca's areas, while the detection of beta oscillations by FOOOF in other regions, such as the temporal area, may represent harmonics of theta or alpha, as illustrated in Figure 5A and B. Furthermore, FOOOF exhibited a higher sensitivity in detecting delta, theta, and low gamma oscillations overall, although both CHO and FOOOF detected only a limited number of oscillations in these frequency bands.”

      “We assessed the difference in neural oscillation detection performance between CHO and FOOOF across seven EEG subjects.  We used EEG electrode locations according to the 10-10 electrode system and assigned each electrode to the appropriate underlying cortex (e.g., O1 and O2 for the visual cortex). Each subject performed 200 trials of a simple auditory reaction-time task.  We analyzed the neural oscillations during the 1.5-second-long pre-stimulus period. In the alpha band, CHO and FOOOF presented statistically comparable outcomes. However, CHO exhibited a greater alpha detection rate for the visual cortex than for the pre-motor cortex, as shown in Figures 6B and C. The entropy of CHO's alpha oscillation occurrences (3.82) was lower than that of FOOOF (4.15), with a maximal entropy across 64 electrodes of 4.16. Furthermore, in the beta band, CHO's entropy (4.05) was smaller than that of FOOOF (4.15). These findings suggest that CHO may offer a more region-specific oscillation detection than FOOOF.

      As illustrated in Figure 6C, CHO found fewer alpha oscillations in pre-motor cortex (FC2 and FC4) than in occipital cortex (O1 and O2), but more beta oscillation occurrences in pre-motor cortex than in occipital cortex. In contrast, FOOOF found more alpha and beta oscillations in visual cortex than in pre-motor cortex.

      Consistent with the ECoG results, FOOOF demonstrated heightened sensitivity in detecting delta, theta, and low gamma oscillations.  Nonetheless, both CHO and FOOOF identified only a limited number of oscillations in the delta and theta bands.

      Contrary to the ECoG results, FOOOF found more low gamma oscillations in EEG subjects than in ECoG subjects.”
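      The entropy values above (3.82 and 4.05 for CHO, 4.15 for FOOOF, maximum 4.16 across 64 electrodes) are consistent with Shannon entropy in natural-log units, since ln 64 ≈ 4.159; that reading is our inference from the numbers rather than a stated definition. Under that assumption, the measure can be sketched as below (hypothetical function name): occurrences spread evenly over all channels reach ln 64, and occurrences concentrated in a single channel give 0, so lower values indicate more region-specific detection.

```python
import math


def occurrence_entropy(counts):
    """Shannon entropy (natural log) of how detections are spread across channels.

    counts[i] = number of detected oscillations on channel i (assumed input).
    Uses sum p * log(1/p), skipping empty channels (0 * log 0 is taken as 0).
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in counts if c > 0)


print(round(math.log(64), 2))  # 4.16, the maximum for 64 channels
print(round(occurrence_entropy([1] * 64), 3))  # 4.159, uniform spread
print(occurrence_entropy([10] + [0] * 63))  # 0.0, all detections on one channel
```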

      e) Line 248: The authors find an oscillatory signal in the hippocampus with a frequency at around 8 Hz, which they refer to as alpha. However, several researchers (including myself) may label this fast theta, according to the previous work showing the presence of fast and slow theta oscillations in the human hippocampus (https://pubmed.ncbi.nlm.nih.gov/21538660/, https://pubmed.ncbi.nlm.nih.gov/32424312/).

      Response: We replaced “alpha” with “fast theta” in the figure and text. We added a citation for Lega et al. 2012.

      f) Line 332: It could also be possible that the auditory alpha rhythms don’t show up in the EEG because a referencing method was used that was not ideal for picking them up. In general, re-referencing is an important preprocessing step that can make the EEG more sensitive to deep or superficial sources, and this should be taken into account when interpreting the data.

      Response: We re-referenced our signals using a common median reference (see Methods section). After close inspection of our results, we found that the EEG topography shown in Figure 6 did not show the auditory alpha oscillation because the alpha power of visual locations greatly exceeded that of those locations that reflect oscillations in the auditory cortex. Further, while our statistical analysis shows that CHO detected auditory alpha oscillations, this analysis also shows that CHO detected significantly more visual alpha oscillations.
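      A common median reference subtracts, at every time sample, the median across all channels; compared with a common average reference, the median is less affected by a few channels carrying large artifacts. The minimal Python sketch below illustrates the operation; the function name and the list-of-channels data layout are assumptions, and it is not the authors' preprocessing code.

```python
import statistics


def common_median_reference(data):
    """Re-reference by subtracting the across-channel median at each time sample.

    data: list of channels, each a list of samples of equal length (assumed layout).
    """
    n_samples = len(data[0])
    medians = [statistics.median(ch[t] for ch in data) for t in range(n_samples)]
    return [[ch[t] - medians[t] for t in range(n_samples)] for ch in data]


# The third channel carries a large artifact; the median ignores it.
print(common_median_reference([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]))
# [[-2.0, -2.0], [0.0, 0.0], [7.0, 16.0]]
```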

      g) Line 463: It seems that the major limitation of the algorithm lies in its low sensitivity, which is discussed by the authors. The authors seem to downplay this a bit by saying that the algorithm works just fine at SNRs that are comparable to alpha oscillations. However, alpha is the strongest signal in human EEG, which may make the algorithm less suitable for picking up less prominent oscillatory signals, e.g., gamma, theta, ripples, etc. Is CHO only seeing the ‘tip of the iceberg’?

      Response:  We performed the suggested analysis. For the theta band, this analysis generated convincing statistical results for ECoG signals (Figures 5 and 6 and the Results section): we found no statistical difference between CHO and FOOOF in theta oscillation detection.  Since FOOOF has a high sensitivity even at low SNRs (as shown in our simulation), our analysis suggests that CHO and FOOOF should perform equally well in the detection of theta oscillations, even when the theta oscillation amplitude is small.

      To validate the ability of CHO to detect oscillations in high-frequency bands (> 40 Hz), such as gamma oscillations and ripples, our follow-up study is applying CHO to the detection of high-frequency oscillations (HFOs) in electrocorticographic signals recorded during seizures.  To this end, our follow-up study analyzed 26 seizures from six patients.  In this analysis, CHO showed sensitivity and specificity similar to those of the epileptogenicity index (EI), which is the most commonly used method to detect seizure onset times and zones. The results of this follow-up study were presented at the American Epilepsy Society Meeting in December 2023, and we are currently preparing a manuscript for submission to a peer-reviewed journal.

      In this study, we aimed to investigate the performance of CHO in detecting the most prominent neural oscillations (e.g., alpha and beta). Future studies will investigate the performance of CHO in detecting oscillations that are more difficult to observe (delta in sleep stages, theta in the hippocampus during memory tasks, and high-frequency oscillations or ripples in seizure or interictal data).

      h) Methods: The Methods section, especially the part describing the CHO algorithm, lacks much of the detail that one would usually like to see in order to rebuild the algorithm oneself. I appreciate that the code is freely available, but that does not, in my opinion, relieve the authors of their duty to describe in detail how the algorithm works. This should be fixed before publishing.

      Response: We now present pseudo code to describe the algorithms within the new subsection on the hyper-parameterization of CHO.

      See Author response table 1.

      A new subsection titled “Tradeoffs in adjusting the hyper-parameters that govern the detection in CHO.”

      “The ability of CHO to detect neural oscillations and determine their fundamental frequency is governed by four principal hyper-parameters.  Adjusting these parameters requires understanding their effect on the sensitivity and specificity in the detection of neural oscillations. 

      The first hyper-parameter is the number of time windows (N in Line 5 in Algorithm 1) that is used to estimate the 1/f noise.  In our performance assessment of CHO, we used four time windows, resulting in an estimation period of 250 ms for each 1/f spectrum.  A higher number of time windows yields shorter estimation periods and thus reduces the likelihood of observing multiple neural oscillations within a single window, which could otherwise confound the 1/f estimation.  However, a higher number of time windows, and thus shorter estimation periods, may lead to unstable 1/f estimates.

      The second hyper-parameter defines the minimum number of cycles of a neural oscillation to be detected by CHO (see Line 23 in Algorithm 1).  In our study, we specified this parameter to be two cycles.  Increasing the number of cycles increases specificity, as it will reject spurious oscillations.  However, increasing the number also reduces sensitivity, as it will reject short oscillations.

      The third hyper-parameter is the significance threshold that selects positive peaks within the auto-correlation of the signal.  The magnitude of the peaks in the auto-correlation indicates the periodicity of the oscillations (see Line 26 in Algorithm 1).  Referred to as "NumSTD," this parameter denotes the number of standard errors that a positive peak has to exceed to be selected as a true oscillation.  For this study, we set "NumSTD" to 1 (approximately the 68% confidence bounds).  Increasing "NumSTD" increases the specificity of detection, as it reduces the detection of spurious peaks in the auto-correlation.  However, increasing "NumSTD" also decreases the sensitivity for neural oscillations with varying instantaneous frequencies. 

      The fourth hyper-parameter is the percentage of overlap between two bounding boxes that triggers their merger (see Line 31 in Algorithm 1).  In our study, we set this parameter to 75% overlap.  Increasing this threshold yields more fragmentation in the detection of oscillations, while decreasing it may reduce the accuracy in determining the onset and offset of neural oscillations.”
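The four hyper-parameters described in this subsection can be illustrated with a minimal Python sketch. It is not the CHO implementation (Algorithm 1 and the released code remain authoritative), and every function name here is hypothetical. It also rests on stated assumptions: the 1/f background is taken to be a straight-line fit in log-log space, the standard error of the auto-correlation is taken to be the white-noise approximation 1/sqrt(n), and "overlap" is taken to mean the intersection area divided by the area of the smaller time-frequency bounding box.

```python
import numpy as np

# Hypothetical defaults mirroring the values reported in the subsection.
N_WINDOWS = 4          # time windows used to estimate the 1/f background
MIN_CYCLES = 2         # minimum number of cycles for a detection
NUM_STD = 1.0          # peak threshold in standard errors (about 68% bounds)
MERGE_OVERLAP = 0.75   # overlap fraction that triggers a bounding-box merge


def window_length_s(segment_s, n_windows=N_WINDOWS):
    """Duration of each 1/f estimation window, e.g. 1 s / 4 = 0.25 s."""
    return segment_s / n_windows


def fit_one_over_f(freqs, psd):
    """Least-squares line in log-log space: log10(psd) = b - a * log10(f).
    Returns (a, b). This functional form is an assumption."""
    slope, intercept = np.polyfit(np.log10(freqs), np.log10(psd), 1)
    return -slope, intercept


def has_min_cycles(duration_s, freq_hz, min_cycles=MIN_CYCLES):
    """True if an event spans at least `min_cycles` periods at freq_hz."""
    return duration_s * freq_hz >= min_cycles


def significant_acf_peaks(acf, n_samples, num_std=NUM_STD):
    """Lags (> 0) of local maxima in the auto-correlation that exceed num_std
    standard errors, using the white-noise approximation SE = 1/sqrt(n)."""
    acf = np.asarray(acf, dtype=float)
    threshold = num_std / np.sqrt(n_samples)
    return [k for k in range(1, len(acf) - 1)
            if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1] and acf[k] > threshold]


def _area(box):
    """Box = (t_start, t_end, f_low, f_high)."""
    return (box[1] - box[0]) * (box[3] - box[2])


def overlap_fraction(a, b):
    """Intersection area divided by the smaller box's area (assumed definition)."""
    dt = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    df = max(0.0, min(a[3], b[3]) - max(a[2], b[2]))
    return dt * df / min(_area(a), _area(b))


def merge_boxes(boxes, threshold=MERGE_OVERLAP):
    """Single greedy pass: merge each box into the first kept box it overlaps
    by at least `threshold`; otherwise keep it as a new detection."""
    merged = []
    for box in sorted(boxes):
        for i, kept in enumerate(merged):
            if overlap_fraction(box, kept) >= threshold:
                merged[i] = (min(kept[0], box[0]), max(kept[1], box[1]),
                             min(kept[2], box[2]), max(kept[3], box[3]))
                break
        else:
            merged.append(tuple(box))
    return merged
```

In this sketch, raising `NUM_STD` or `MIN_CYCLES` makes the checks stricter (more specific, less sensitive), and raising `MERGE_OVERLAP` leaves more boxes unmerged, which mirrors the tradeoffs described above.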

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors investigate the tolerance of aminoglycosides in E. coli mutants deleted in the Krebs cycle and respiratory chain enzymes. The motivation for this study is unclear. Transport of aminoglycosides is PMF-dependent, as the authors correctly note, and knocking out energy-producing components leads to tolerance of aminoglycosides; this has been well established. In S. aureus, clinically relevant "small colony" strains selected for in the course of therapy with aminoglycosides acquire null mutations in the biosynthesis of heme or ubiquinone, and have been studied in detail. In E. coli, such knockouts have not been reported in clinical isolates, probably due to severe fitness costs.

      Response: We sincerely appreciate the time and consideration the reviewer dedicated to evaluating our manuscript. It's important to highlight that while the transport of aminoglycosides is PMF-dependent, recent studies underscore the potential role of metabolic mutations in antibiotic tolerance, a facet that warrants further investigation. For instance, the study by Heinemann’s and Michiels' groups explored genomic changes in E. coli strains (including the uropathogenic strain UTI89) subjected to daily antibiotic exposure (Van den Bergh et al., 2022). Notably, mutations predominantly occurred in genes of the nuo operon, a key component of E. coli energy metabolism, suggesting a link between metabolic adaptations and antibiotic tolerance. Furthermore, research by Collins’ group revealed previously unrecognized genes related to central metabolism (e.g., icd, gltD, sucA) that contribute to antibiotic resistance in E. coli cells exposed to multiple antibiotics, including aminoglycosides (Lopatkin et al., 2021). These findings are corroborated by the presence of similar mutations in clinical E. coli pathogens, as evidenced by the analysis of a large library of 7243 E. coli genomes from NCBI Pathogen Detection (Lopatkin et al., 2021). The clinical relevance of metabolic mutations in antibiotic tolerance is increasingly recognized, yet their underlying mechanisms remain enigmatic. Therefore, elucidating the role of metabolic pathways in conferring antibiotic tolerance is critical. We have updated the introduction to clearly convey our motivation for this study (see page 4).

      At the same time, single-cell analysis has shown that individual cells with a decrease in the expression of Krebs cycle enzymes are tolerant of antibiotics and have lower ATP (Manuse et al., PLoS Biol 19: e3001194). The authors of the study under review report that knocking out icd, which encodes isocitrate dehydrogenase, the enzyme that catalyzes the rate-limiting step in the Krebs cycle, has little effect on aminoglycoside tolerance and actually leads to an increase in the level of ATP over time. This observation does not seem to make much sense and contradicts previous reports, specifically that an E. coli icd mutant is tolerant of antibiotics and, not surprisingly, produces less ATP (Kabir and Shimizu, Appl Microbiol Biotechnol. 2004; 65(1):84-96; Manuse et al., PLoS Biol 19: e3001194). Mutations in other Krebs cycle enzymes, unlike icd, do lead to a dramatic increase in tolerance of aminoglycosides according to the paper under review. This is all very confusing.

      Response: Although our data cannot be directly compared to that of Kabir and Shimizu (Mohiuddin Kabir and Shimizu, 2004), due to the utilization of entirely different experimental procedures and measurement techniques, we can draw some parallels to the study conducted by Lewis’ group (Manuse et al., 2021), despite certain differences in experimental protocols. Furthermore, the reviewer has made strong assertions regarding our manuscript based on the findings of Lewis’ group. Thus, we believe it's pertinent to expand our response regarding that study.

      In the study by Lewis’ group, bacterial cells from an overnight culture (approximately 16 hours) were inoculated at a 1:100 ratio into LB medium. The cultures were then incubated at 37°C for approximately 2 hours, and ATP levels were measured using the BacTiter-Glo kit (Promega, Madison, WI, USA). ATP levels were normalized to cell density, determined from optical density measurements, and plotted on a linear scale. As shown in Supplementary Figure S1c of their paper, normalized ATP levels in the icd mutant were 10-15% lower than in the wild type. In our experiments, cells were grown in overnight cultures for 24 hours, diluted 100-fold in fresh medium, and ATP levels were measured at 3, 4, 5, and 6 hours using the same kit. ATP levels were normalized to cell counts quantified by flow cytometry. When we analyzed our icd mutant data at around 3 hours (the time point closest to that used by Lewis’ group), we observed a reduction of approximately 15-20% (not statistically significant) in the icd mutant compared to the wild type (see the raw data, linear plot, and logarithmic plot below; Author response image 1), which aligns with the findings of Lewis’ group.
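The normalization and comparison described above amount to simple arithmetic. A minimal sketch follows; the function names are hypothetical and the luminescence and cell-count values are made up for illustration, not data from either study.

```python
def atp_per_cell(luminescence: float, cell_count: float) -> float:
    """ATP signal (e.g., relative luminescence units) normalized to cell count."""
    return luminescence / cell_count


def percent_reduction(mutant: float, wild_type: float) -> float:
    """Percent decrease of the mutant value relative to the wild-type value."""
    return 100.0 * (wild_type - mutant) / wild_type


# Hypothetical example values, for illustration only.
wt = atp_per_cell(2.0e6, 1.0e8)
icd = atp_per_cell(1.6e6, 1.0e8)
print(f"{percent_reduction(icd, wt):.1f}% lower in the mutant")
```

Because one protocol normalizes to optical density and the other to flow-cytometry counts, the denominators differ, so percentages from the two studies can only be compared approximately.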

      We further investigated the gentamicin tolerance of both wild-type and icd mutant strains of E. coli BW25113 (Author response image 2). Consistent with our observations in strain MG1655, the icd mutant of BW25113 also showed increased sensitivity to gentamicin.

      Author response image 1.

      ATP levels in the icd mutant. ATP levels of both the mutant and wild-type strains were measured at t=3 hours of cell growth and normalized to cell counts. The figure presents the raw data (a), linear plot (b), and logarithmic plot (c) of the same dataset. This data corresponds to the first panel of Figure 3B in the manuscript.

      Author response image 2.

      Gentamicin tolerance of wild-type and icd mutant strains of E. coli BW25113. Both wild type and mutant strains were treated with gentamicin (50 µg/ml) for 5 hours at the mid-exponential phase. Cells were plated before and after treatment for CFU/ml counts. The dashed line represents the limit of detection. CFU: Colony forming units.
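The survival fractions in such CFU assays follow directly from the counts before and after treatment. A minimal sketch, with a hypothetical function name, assuming (as a common convention, not necessarily the one used here) that counts below the limit of detection are reported at the limit of detection:

```python
import math


def survival_fraction(cfu_before: float, cfu_after: float, lod: float) -> float:
    """Fraction of CFU/ml surviving treatment; counts below the limit of
    detection (lod, in CFU/ml) are set to the lod before dividing."""
    return max(cfu_after, lod) / cfu_before


# Hypothetical counts: 1e8 CFU/ml before and 1e6 CFU/ml after gentamicin.
frac = survival_fraction(1e8, 1e6, lod=10.0)
print(frac, math.log10(frac))
```

Clamping at the limit of detection means a fully sterilized culture is reported as an upper bound on survival rather than as zero, which is why such plots draw the limit of detection as a dashed line.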

      We think that there are two primary reasons why our study does not contradict the findings of the Lewis group:

      Firstly, our study cannot be directly compared to theirs, as they did not comprehensively explore the impact of gene deletions on cell metabolism beyond measuring ATP levels at a single time point (Manuse et al., 2021). Our study encompasses various metabolic parameters, such as cellular ATP, redox status, proton motive force (PMF), intracellular pH, and drug uptake, throughout the exponential and/or early stationary phase. Additionally, we conducted proteomic analysis of five different strains, including mutants and wild type. Moreover, we performed pathway enrichment analysis against a genome-wide statistical background, using several functional classification frameworks such as Gene Ontology annotations, KEGG pathways, and UniProt keywords. The results of these pathway enrichment analyses are now available in the Supplementary File (see Supplementary Tables 11-17 in the current manuscript). Thus, we believe it is unfair to deem our study contradictory to the Lewis group's study, which did not include a comprehensive analysis of the metabolism of the mutant strains they investigated.

      Secondly, our study cannot be directly compared to that specific study (Manuse et al., 2021) because it used a different antibiotic (ciprofloxacin). Cell tolerance is heavily dependent on the mechanism of action of the antibiotic used. Therefore, the reviewer should have focused on studies closely related to aminoglycoside tolerance. Our study is neither confusing nor contradictory: in a different study, Lewis’ group also demonstrated that the tolerance of the icd mutant to gentamicin was significantly reduced, while the tolerance of other TCA cycle mutant strains was increased (Shan et al., 2015). However, they did not examine the metabolism of these mutant strains, as we did. We now mention this point in our manuscript (see pages 14-15).

      Apart from the confusing data, it is not clear what useful information may be obtained from the choice of the experimental system. The authors examine exponentially growing cells of E. coli for tolerance of aminoglycosides. The population at this stage of growth is highly susceptible to aminoglycosides, and only some rare persister cells can survive. However, the authors do not study persisters. A stationary population of E. coli is tolerant of aminoglycosides, and this is clinically relevant, but this is not the subject of the study.

      Response: Respectfully, we must express our disagreement with the reviewer's comments. Our experimental system is meticulously organized and logically structured. Mutant strains such as the gltA, sucA, and nuoI deletion strains exhibit increased tolerance to all aminoglycosides tested, with their surviving fractions clearly increasing around the mid-exponential phase, between 3 and 4 hours (refer to Figure 2B in our manuscript). This surge in tolerance is evident at the population level as well (as depicted in Figure 1A in our manuscript, where certain mutant strains show complete survival to streptomycin, with survival fractions nearing 1). Given the pronounced increase observed around the mid-exponential phase, we primarily characterized the metabolism of these cells during this growth phase.

      It's essential to note that any investigation into antibiotic tolerance and/or resistance holds immense significance, regardless of the growth phase under scrutiny, as antibiotic tolerance and resistance pose a substantial healthcare challenge. Additionally, metabolic mutant strains do not necessarily entail severe fitness costs, as evidenced by Figure S2A published by the Lewis group (Manuse et al., 2021), a finding consistent with our study (see Figure 2B in our manuscript). This could confer a survival advantage to bacterial cells, as they may acquire metabolic mutations that bolster their tolerance without incurring significant fitness costs. Furthermore, numerous studies suggest that bacterial cells may follow an evolutionary pathway leading to increased tolerance before acquiring resistance mechanisms (Levin-Reisman et al., 2017; Santi et al., 2021). The presence of metabolic mutations in clinical E. coli pathogens has also been confirmed through the analysis of a large library of 7243 E. coli genomes from NCBI Pathogen Detection by Collins’ group (Lopatkin et al., 2021). Consequently, understanding the tolerance mechanisms of metabolic mutations is of paramount importance.

      References

      Levin-Reisman I, Ronin I, Gefen O, Braniss I, Shoresh N, Balaban NQ. 2017. Antibiotic tolerance facilitates the evolution of resistance. Science 355:826–830. doi:10.1126/science.aaj2191

      Lopatkin AJ, Bening SC, Manson AL, Stokes JM, Kohanski MA, Badran AH, Earl AM, Cheney NJ, Yang JH, Collins JJ. 2021. Clinically relevant mutations in core metabolic genes confer antibiotic resistance. Science 371. doi:10.1126/science.aba0862

      Manuse S, Shan Y, Canas-Duarte SJ, Bakshi S, Sun WS, Mori H, Paulsson J, Lewis K. 2021. Bacterial persisters are a stochastically formed subpopulation of low-energy cells. PLoS Biol 19. doi:10.1371/journal.pbio.3001194

      Mohiuddin Kabir M, Shimizu K. 2004. Metabolic regulation analysis of icd-gene knockout Escherichia coli based on 2D electrophoresis with MALDI-TOF mass spectrometry and enzyme activity measurements. Appl Microbiol Biotechnol 65:84–96. doi:10.1007/s00253-004-1627-1

      Santi I, Manfredi P, Maffei E, Egli A, Jenal U. 2021. Evolution of Antibiotic Tolerance Shapes Resistance Development in Chronic Pseudomonas aeruginosa Infections. mBio 12. doi:10.1128/mBio.03482-20

      Shan Y, Lazinski D, Rowe S, Camilli A, Lewis K. 2015. Genetic basis of persister tolerance to aminoglycosides in Escherichia coli. mBio 6. doi:10.1128/mBio.00078-15

      Van den Bergh B, Schramke H, Michiels JE, Kimkes TEP, Radzikowski JL, Schimpf J, Vedelaar SR, Burschel S, Dewachter L, Lončar N, Schmidt A, Meijer T, Fauvart M, Friedrich T, Michiels J, Heinemann M. 2022. Mutations in respiratory complex I promote antibiotic persistence through alterations in intracellular acidity and protein synthesis. Nat Commun 13:546. doi:10.1038/s41467-022-28141-x

      Reviewer #2 (Public Review):

      Summary:

      This interesting study challenges a dogma regarding the link between bacterial metabolism decrease and tolerance to aminoglycosides (AG). The authors demonstrate that mutants well-known for being tolerant to AG, such as those of complexes I and II, are not so due to a decrease in the proton motive force (PMF) and thus antibiotic uptake, as previously reported in the literature.

      Strengths:

      This is a complete study. These results are surprising and are based on various read-outs, such as ATP levels, pH measurement, membrane potential, and the uptake of fluorophore-labeled gentamicin. Utilizing a proteomic approach, the authors show instead that in tolerant mutants, there is a decrease in the levels of proteins associated with ribosomes (targets of AG), causing tolerance.

      Response: We sincerely appreciate the reviewer for taking the time to read our manuscript and offer valuable suggestions.

      Weaknesses:

      The use of a single high concentration of aminoglycoside: my main comment on this study concerns the use of an AG concentration well above the MIC (50 µg/ml, or 25 µg/ml for uptake experiments), which is 10 times higher than the concentrations used previously (Kohanski, Taber) in studies showing a link with PMF. This significant difference may explain the discrepancies in results. Indeed, a high concentration of AG can mask the effects of a metabolic disruption and lead to less specific uptake. However, this concentration highlights a second molecular level of tolerance. Adding experiments using lower concentrations (we propose 5 µg/ml to compare with the literature) would provide a more comprehensive understanding of AG tolerance mechanisms during a decrease in metabolism.

      Another suggestion would be to test iron limitation (using an iron chelator such as DIP), which has been shown to induce AG tolerance. Can the authors demonstrate whether this iron limitation leads to a decrease in ribosomal proteins? In the case of a positive result, this experiment would validate their hypothesis. Otherwise, it would help distinguish two types of molecular mechanisms for AG tolerance during a metabolic disruption: (i) PMF and uptake at low concentrations, (ii) ribosomal proteins at high concentrations.

      Response: While we acknowledge the intriguing possibility of exploring whether iron limitation results in a reduction of ribosomal proteins, we believe that this topic falls slightly outside the scope of our current study. This area warrants independent investigation, since our current research did not specifically focus on iron-limited environments (LB medium is iron-rich; Abdul-tehrani et al., 1999; Rodríguez-Rojas et al., 2015). However, we fully concur that experimental outcomes may depend on the concentration of aminoglycosides (AG). Hence, we repeated the critical experiments using a lower concentration of gentamicin (5 µg/mL), as suggested by the reviewer. Before discussing these results, we wish to emphasize two key points. First, the majority of our metabolic measurements, including ATP levels, redox activities, intracellular pH, and metabolomics, were conducted in mutant and wild-type cells in the absence of drugs. Our objective was to elucidate the impact of genetic perturbations of the TCA cycle on cell metabolism. Second, our study does not invalidate the hypothesis that AG uptake is proton motive force (PMF)-dependent. We observed similar drug uptake across the strains tested, which is reasonable considering that their energy metabolism and PMF are not significantly altered compared to the wild type (at least, we did not observe a consistent trend in their metabolic levels). Consequently, our study does not necessarily contradict previous claims (Taber et al., 1987). We have now clarified this point in the manuscript (see pages 1 and 13).

      When we employed a lower gentamicin concentration, we still noted a significant elevation in tolerance among the gltA, sucA, and nuoI mutant strains compared to the wild type. Also, it remained evident that the observed tolerance in the mutant strains cannot be ascribed to differences in drug uptake or impaired PMF, as the levels of drug uptake and the disruption of PMF by gentamicin (at lower concentrations) in the mutant strains were comparable to those of the wild type. Moreover, since our metabolic measurements and proteomics analyses failed to reveal any notable alterations in energy metabolism in these strains, the consistency in drug uptake levels across both mutant and wild-type strains, even at lower concentrations, further bolsters the validity of our findings obtained at higher gentamicin concentrations. The new results have been incorporated into the Supplementary file (see Supplementary Figures S1, S5, S7, and S9) and discussed throughout the manuscript.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Line 120: Luria-Bertani (LB); use Lysogeny Broth instead.

      Line 180: "RSG dye can be reduced by bacterial reductases of PMF" to be reformulated.

      Response: The suggested corrections have been incorporated into the manuscript.

      References

      Abdul-Tehrani H, Hudson AJ, Chang Y, Timms AR, Hawkins C, Williams JM, Harrison PM, Guest JR, Andrews SC. 1999. Ferritin Mutants of Escherichia coli Are Iron Deficient and Growth Impaired, and fur Mutants are Iron Deficient. J Bacteriol.

      Rodríguez-Rojas A, Makarova O, Müller U, Rolff J. 2015. Cationic Peptides Facilitate Iron-induced Mutagenesis in Bacteria. PLoS Genet 11. doi:10.1371/journal.pgen.1005546

      Taber HW, Mueller JP, Miller PF, Arrow AS. 1987. Bacterial Uptake of Aminoglycoside Antibiotics. Microbiol Rev 51:439–457. doi:10.1128/mr.51.4.439-457.1987

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents a solid and generally convincing set of experiments to address the question of whether the lateral parafacial area (pFL) is active in controlling active expiration, which is particularly important in patient populations that rely on active exhalation to maintain breathing (e.g., COPD, ALS, muscular dystrophy). This study presents a valuable finding by pharmacologically mapping the core medullary region that contributes to active expiration and addresses the question of where these regions lie anatomically. Results from these experiments will be of value to those interested in the neural control of breathing and to other neuroscientists as a framework for how to perform pharmacological mapping experiments in the future.

      Thanks for the positive feedback on our study, as well as the assessment of the novelty of our investigation and the advancements to the field that these results will bring in the future.

      We have addressed the specific comments and made changes to the manuscript as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      The main focus of the current study is to identify the anatomical core of an expiratory oscillator in the medulla using pharmacological disinhibition. Although expiration is passive in normal eupneic conditions, activation of the parafacial (pFL) region is believed to evoke active expiration in conditions of elevated ventilatory demands. The authors and others in the field have previously attempted to map this region using pharmacological, optogenetic, and chemogenetic approaches, which present their own challenges.

      In the present study, the authors take a systematic approach to determine the precise anatomical location within the ventral medulla's rostrocaudal axis where the expiratory oscillator is located. The authors injected a solution of bicuculline (a GABA-A receptor antagonist) and fluorobeads at 5 distinct anatomical locations to study the effects on neuronal excitability and functional circuitry in the pFL. The effects of bicuculline on different phases of the respiratory cycle were characterized using a multidimensional cycle-by-cycle analysis. This analysis involved measuring the differences in airflow, diaphragm electromyography (EMG), and abdominal EMG signals, as well as using a phase-plane analysis to analyze the combined differences of these respiratory signals. Anatomical immunostaining techniques were also used to complement the functional mapping of the pFL.

      Major strengths of this work include a robust study design, complementary neurophysiological and immunohistochemical methods, and the use of a novel phase-plane analysis. The authors construct a comprehensive functional map revealing functional nuances in respiratory responses to bicuculline along the rostrocaudal axis of the parafacial region. They convincingly show that although bicuculline injections at all coordinates of the pFL generated an expiratory response, the most rostral locations in the lateral parafacial region play the strongest role in generating active expiration. These were characterized by a strong impact on the duration and strength of ABD activation and a robust change in tidal volume and minute ventilation. The authors also confirmed histologically that none of the injection sites overlapped grossly with PHOX2B+ neurons, thus confirming the specificity of the injections in the pFL and not the neighboring RTN.

      Collectively, these findings advance our understanding of the presumed expiratory oscillator, the pFL, and highlight the heterogeneity in the functional response of this anatomical structure.

      Thanks for the positive feedback on the results presented in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Pisanski and colleagues map regions of the brainstem that produce the rhythm for active expiratory breathing movements and influence their motor patterns. While the neural origins of inspiration are very well understood, the neural bases for expiration lag considerably. The problem is important and new knowledge pertaining to the neural origins of expiration is welcome.

      The authors perturb the parafacial lateral (pFL) respiratory group of the brainstem with microinjection of bicuculline, to elucidate how disinhibition in specific locations of the pFL influences active expiration (and breathing in general) in anesthetized rats. They provide valuable, if not definitive, evidence that the borders of the pFL appear to extend more rostrally than previously appreciated. Prior research suggests that the expiratory pFL exists at the caudal pole of the facial cranial nucleus (VIIc). Here, the authors show that its borders probably extend as much as 1 mm rostral to VIIc. The evidence is convincing albeit with caveats.

      Strengths:

      The authors achieve their aim in terms of showing that the borders of the expiratory pFL are not well understood at present and that it (the pFL) extends more rostrally. The results support that point. The data are strong enough to cause many respiratory neurobiologists to look at the sites rostral to the VIIc for expiratory rhythmogenic neurons and characterize their properties and mechanisms. At present my view is that most respiratory neurobiologists overlook the regions rostral to VIIc in their studies of expiratory rhythm and pattern.

      Weaknesses:

      The injection of bicuculline has indiscriminate effects on excitatory and inhibitory neurons, and the parafacial region is populated by excitatory neurons that are expiratory rhythmogenic and GABA and glycinergic neurons whose roles in producing active expiration are contradictory (Flor et al. J Physiol, 2020, DOI: 10.1113/JP280243). It remains unclear how the microinjections of bicuculline differentially affect all three populations. A more selective approach would be able to disinhibit the populations separately. Nevertheless, for the main point at hand, the data do suggest that we should reconsider the borders of the expiratory pFL nucleus and begin to examine its physiology up to 1 mm rostral to VIIc.

      The control experiment showed that bicuculline microinjections induced cFos expression in the pFL, which is good, but again we don't know which neurons were disinhibited: glutamatergic, GABAergic, or glycinergic.

      Thank you for sharing your excitement about the results of our study and for appreciating the thorough investigation performed with bicuculline, an approach originally used by Pagliardini et al. (2011; PMID: 21414911) and subsequently used by many other groups to generate and study active expiration in vivo.

      In the current study, we used the well-known effect of bicuculline to systematically test which area is most sensitive to this pharmacological effect and hence may be the core for generating active expiration. While GABA receptor antagonists may have an indiscriminate effect on GABA receptor-expressing neurons of various phenotypes, anatomical assessment of inhibitory cells has shown very sparse distribution of GABAergic and glycinergic cells in the parafacial area (Tanaka et al., 2003; PMID: 14512139). Moreover, it has been inferred in multiple publications (Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al. 2016 PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151; Flor et al., 2020, PMID: 32621515; Britto & Moraes, 2017; PMID: 28004411; Silva et al. 2016; PMID: 26900003) and recently demonstrated (Magalhaes et al., 2021; PMID: 34510468) that late-E neurons in the parafacial region are excitatory and have a glutamatergic phenotype. We cannot exclude that a small fraction of neurons in the pFL area are inhibitory and that they could influence the recruitment of adjacent late-E expiratory neurons. A more selective activation of neuronal populations with different phenotypes would indeed be interesting. Nonetheless, if local inhibitory neurons have a role in the generation of active expiration, then their disinhibition could either have an inhibitory effect on late-E activity or stimulate expiration in a more indirect fashion.

      While the effect of bicuculline on active expiration has been reported and replicated in multiple manuscripts, the source of inhibition across different phases of the respiratory cycle is still under investigation. Some studies suggest that GABAergic and glycinergic inhibition does not originate in the pFL but rather in the BötC and preBötC areas (Flor et al., 2020, PMID: 32621515; Magalhaes et al., 2021; PMID: 34510468), and the effects of this inhibition across the respiratory cycle are debated. Future studies will be key to identifying the source of pFL inhibition.

      The manuscript characterizes how bicuculline microinjections affect breathing parameters such as tidal volume, frequency, ventilation, inspiratory and expiratory time, as well as oxygen consumption. Those aspects of the manuscript are a bit tedious and sometimes overanalyzed. Plus, there was no predictive framework established at the outset for how one should expect disinhibition to affect breathing parameters. In other words, if the authors are seeking to map the pFL borders, then why analyze the breathing patterns so much? Does doing so provide more insight into the borders of pFL? I did not think it was compellingly argued.

      We have edited the introduction to address this comment and emphasize the rationale for the study. We also edited the results section to summarize our findings.

      We continue to report our in-depth analysis of the perturbations induced by bicuculline injection on the various respiratory characteristics, as this is fundamental to determining the effects of our experiment not only on the activation of the pFL and active expiration, but also on the respiratory network in general. To be fair and open about our findings, we have reported the results of our analysis in detail. Of note, all sites generated active expiration, but since the objective of the study was to determine the sites with the most significant changes, a finer, multilevel analysis was used.

      Further, lines 382-386 make a point about decreasing inspiratory time even though the data do not meet the statistical threshold. In lines 386-395, the reporting appears to reach significance (line 388) but not reach significance (line 389). I had trouble making sense of that disparity.

      The statistics were confirmed, and the lines edited as follows: “Interestingly, the duration of inspiration during the response was found to decrease in all groups relative to baseline respiration (Ti response = 0.279 ± 0.034s, Ti baseline = 0.318 ± 0.043s, Wilcoxon rank sum: Z = 3.24, p = 0.001). Contrary to this decrease in inspiratory duration, the total expiratory time was observed to increase in all groups and remained elevated compared to baseline (TE response = 1.313 ± 0.188s, TE baseline = 1.029 ± 0.161s, Wilcoxon rank sum: Z = 4.49, p = 0.001).”

      The other statistical hiccups include "tended towards significance" (line 454), "were found to only reach significance for a short portion of the response" (line 486-7), "did not reach the level of significance" (line 506), which gives one the sense of cherry picking or over-analysis. Frankly, this reviewer finds the paper much more compelling when just asking whether the microinjections evoke active expiration. If yes, then the site is probably part of the pFL.

      Statistical “tendencies” have been eliminated throughout the manuscript.

      We analyzed our results in detail in order to determine changes and differential effects on respiration across the 5 injection sites. Although the presentation of the results may seem tedious, it has allowed us to highlight some interesting effects. The first concerns respiratory frequency. It has been shown that optogenetic stimulation of this area causes an increase in respiratory frequency (Pagliardini et al., 2011, PMID: 21414911), whereas disinhibition with the same approach or stimulation of AMPA receptors in the pFL has produced either a reduction in frequency or no significant change (Pagliardini et al., 2011, PMID: 21414911; Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al. 2016 PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151). Here, we suggest that the reduction in respiratory frequency is observed only at the caudal sites and could be attributed to effects on the BötC rather than to stimulation of the core of the pFL, since no change in respiratory frequency was observed where the effect was most potent (rostral sites). Another interesting finding concerns O2 consumption: although difficult to interpret at this point, we found it very interesting that hyperventilation occurred only at the most rostral injection sites.

      I encourage the authors to consider the fickleness of p-values in general and urge them to consider not just p but also effect size.

      Thank you for the feedback on our description of the statistical results and for the suggestion to incorporate effect sizes. We have now included measures of effect size in the Results section. Specifically, we calculated the effect size within each ANOVA as eta squared for all data shown in Figures 3 and 4. Please note that in our phase-plane analysis (Figs. 5-6) the Mahalanobis distance is itself an effect size measure for multidimensional data. We also note that statistical evaluations using non-parametric analyses do not involve effect sizes.
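      For reference, eta squared for a one-way ANOVA is the between-group sum of squares divided by the total sum of squares, i.e. the fraction of total variance explained by group membership. The minimal Python sketch below assumes a simple between-groups design; a repeated-measures or factorial ANOVA, as may have been used here, partitions the sums of squares differently (e.g. partial eta squared). The function name is hypothetical.

```python
def eta_squared(groups):
    """Eta squared for a one-way between-groups ANOVA: SS_between / SS_total."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Total variability of every observation around the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Variability of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total
```

A value near 0 means group membership explains almost none of the variance; a value near 1 means it explains almost all of it.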

      Reviewer #3 (Public Review):

      Summary:

      The study conducted by Pisanski et al investigates the role of the lateral parafacial area (pFL) in controlling active expiration. Stereotactic injections of bicuculline were utilized to map various pFL sites and their impact on respiration. The results indicate that injections at more rostral pFL locations induce the most robust changes in tidal volume, minute ventilation, and combined respiratory responses. The study indicates that the rostrocaudal organization of the pFL and its influence on breathing is not simple and uniform.

      Strengths:

      The data provide novel insights into the importance of rostral locations in controlling active expiration. The authors use innovative analytic methods to characterize the respiratory effects of bicuculline injections into various areas of the pFL.

      Weaknesses:

      Bicuculline injections increase the excitability of neurons. Aside from blocking GABA receptors, bicuculline also inhibits calcium-activated potassium currents and potentiates NMDA current, thus insights into the role of GABAergic inhibition are limited.

      Increasing the excitability of neurons provides little insights into the activity pattern and function of the activated neurons. Without recording from the activated neurons, it is impossible to know whether an effect on active expiration or any other respiratory phase is caused by bicuculline acting on rhythmogenic neurons or tonic neurons that modulate respiration. While this approach is inappropriate to study the functional extent of the conditional "oscillator" for active expiration, it provides valuable insights into this region's complex role in controlling breathing.

      We have included a discussion of the limitations of our study in the Technical considerations section, addressing the possibility that bicuculline may induce active expiration through other mechanisms. Please note that we did not use bicuculline to gain further insight into GABAergic inhibition of the pFL, but rather as a tool to generate active expiration that has been extensively validated by our group and others.

      Multiple studies have shown recruitment of excitatory late expiratory neurons with bicuculline injections. Although we did not record from late-E neurons in this study, we infer from the body of literature that disinhibition of neurons in this area will activate late-E neurons (as previously demonstrated) and generate active expiration. Although we see value in recording activity of single neurons (especially to study mechanisms of rhythmogenesis), we opted to measure the physiological response from respiratory muscles as an indication of active expiration recruitment in vivo. Recording from single neurons after bicuculline injections in each site would confirm the presence of expiratory neurons along the parafacial area, which is probably not surprising, since every site tested promoted active expiration. The focus of the study though was to determine the site with the strongest physiological response to disinhibition. Future studies will be key to determine whether all neurons along this column have similar electrophysiological rhythmic properties to the ones recently reported (Magalhaes et al., 2021; PMID: 34510468), or some of them simply provide tonic drive to late-E neurons located elsewhere.

      We have discussed the issue as follows:

      “Our experiments focused on determining the area in the pFL that is most effective in generating active expiration as measured by ABD EMG activity and expiratory flow. We did not attempt to record single cell neuronal activity at various locations as previously shown in other studies (Pagliardini et al., 2011; Magalhaes et al., 2021), as this approach would most likely find some late-E neurons across the pFL and thus not effectively discriminate between areas of the pFL. Future studies involving multi-unit recordings or imaging of cell population activities will help to determine the firing pattern and population density of bicuculline-activated cells and further determine differences in distribution and function of late-E neurons across the region of the pFL.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall, the manuscript addresses an important question in the field, the anatomical location of the expiratory oscillator. I commend the authors for a well-thought-out and clearly presented study. However, a few small concerns deserve attention to improve the clarity of the report.

      (1) The figures would benefit from a rostral-to-caudal representation of results instead of a caudal-to-rostral orientation. Example, Figure 2.

      We opted for a caudal-to-rostral representation so that our series of injections moves progressively away from the inspiratory oscillator (preBötC) and the anatomical reference point (the caudal tip of the facial nucleus).

      (2) A discussion about how expiratory responses generated by these pharmacological approaches would compare to endogenous baseline conditions. The authors mention that bicuculline injections elicited a late-E downward inflection that was absent in baseline conditions. Thus, this raises the point of how these findings compare to awake freely moving animals or during different conditions of increased ventilatory demand.

      This is an interesting question that has not yet been addressed in the field. As far as we know, there are no recordings of pFL neurons in freely behaving animals, although recordings of pFL late-E neurons under elevated PaCO2 have shown late-E activity in in situ preparations (Britto & Moraes, 2017; PMID: 28004411; Magalhaes et al., 2021; PMID: 34510468).

      We have clarified this in the discussion as follows:

      “At rest, respiratory activity does not present with active expiration (i.e., expiratory flow below functional residual capacity in conjunction with expiratory-related ABD muscle recruitment), and expiratory flow occurs due to passive recoil of the chest wall with no contribution of abdominal activity. Active expiration and abdominal recruitment can be spontaneously observed during sleep (in particular REM sleep; Andrews and Pagliardini, 2015; Pisanski et al., 2019) and can be triggered during increased respiratory drive (e.g., hypercapnia, RTN stimulation; Abbott et al., 2011). Although never assessed in freely moving, unanesthetized rodents, bicuculline has been extensively used to generate active expiration and late-E neuron activity in both juvenile and adult anesthetized rats (Pagliardini et al., 2011; Huckstepp et al., 2015; Huckstepp et al., 2016; Huckstepp et al., 2018; De Britto and Moraes, 2017; Magalhaes et al., 2021).”

      (3) In Figure 2A, there appears to be an injection site in the top right quadrant of the image, very distant from the intended site. Could the authors confirm if this is an artifact?

      Yes, it is an artifact of image acquisition; we should have marked it in the figure. To avoid confusion, and following other reviewers’ suggestions, we have edited the figure.

      (4) A stylistic suggestion would be to include the subpanel of Figure 2C saline control injection as a graph of its own and also include the control anatomical location in 2B.

      Thanks for the suggestion. Because of the complex organization of the figure we opted to leave it as a subpanel in order to not distract the reader from the 5 injection sites, but still provide information about vehicle injection and their lack of changes in respiratory response.

      (5) The authors note that DIAm Area (norm.) during the inspiratory phase is increased in the +6 and +8mm groups. However, Figure 5E shows that the +8mm group is significantly reduced as compared to the +6mm group. Please clarify.

      During the inspiratory phase we did not observe any significant change in the DIA Area (norm.). We realize that the description of this part of the results was confusing and therefore we have eliminated that section.

      Reviewer #2 (Recommendations For The Authors):

      I encourage the authors to consider the fickleness of p-values in general and urge them to consider not just p but also effect size. There is a valuable editorial in this week's J Physiology (https://doi.org/10.1113/JP285575) that may provide helpful guidance.

      Thank you for these comments and the general assessment. We realized that the Results section was dense and contained a lot of information. We have significantly slimmed the description of the results to make them easier to appreciate and to avoid confusing statements about significant vs. non-significant results.

      We have now included measures of effect size in the Results section. Specifically, we calculated the effect size within each ANOVA as eta squared for all data shown in Figures 3 and 4. Please note that in our phase-plane analysis (Figs. 5-6) the Mahalanobis distance is itself an effect size measure for multidimensional data. We also note that statistical evaluations using non-parametric analyses do not involve effect sizes.
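      To illustrate why the Mahalanobis distance can serve as a multidimensional effect size: it measures how far a point lies from the centre of a reference distribution in units that account for the spread and correlation of that distribution. The dependency-free Python sketch below assumes a response measurement is compared against the cloud of baseline points; the authors' own implementation is the Matlab code in the repository cited in this response and may differ (for example, by using a pooled covariance between two groups). All names are hypothetical.

```python
import math

def mean_vector(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def covariance(points):
    """Sample covariance matrix (n - 1 denominator)."""
    mu = mean_vector(points)
    dim, n = len(mu), len(points)
    return [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in points) / (n - 1)
             for b in range(dim)] for a in range(dim)]

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    dim = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(dim):
        pivot = max(range(col, dim), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(dim):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, dim + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][dim] / aug[i][i] for i in range(dim)]

def mahalanobis(point, reference_points):
    """Distance of `point` from the distribution of `reference_points`."""
    mu = mean_vector(reference_points)
    diff = [x - m for x, m in zip(point, mu)]
    weights = solve(covariance(reference_points), diff)  # S^-1 (x - mu)
    return math.sqrt(sum(d * w for d, w in zip(diff, weights)))
```

Unlike a per-variable Euclidean distance, this down-weights directions in which the baseline naturally varies a lot, which is what lets it act as a scale-free effect size.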

      The equipment and resources should be clearly identified and use RRIDs whenever possible. Resources like antibodies and other reagents (e.g., cryoprotectants) should be identified, not just by manufacturer, but also by specific part or product numbers or identifiers.

      The manuscript has been edited to add these details.

      The manuscript makes reference to ImageJ and Matlab routines, which must be public through GitHub or another stable repository.

      Thank you for pointing this out. ImageJ analysis was performed using scripts already available to users (no custom scripts). The Matlab scripts used for the multivariate analysis are now available at: https://github.com/mprosteb/Pisanski2024

      The way that ABD-DIA coupling was assessed was unclear from the Methods.

      The following text has been added to the methods: “The coupling between ABD and DIA signals was measured as a ratio and analyzed by quantifying the number of bursts of activity observed for the ABD and DIA EMG signals during the first 10 minutes of the response, excluding time bins at end of the response (due to fading and waning of the ABD response in those instances).”
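      The coupling measure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' method. Burst onsets are detected as upward threshold crossings of a rectified, integrated EMG envelope, which is a hypothetical detector with an arbitrary threshold. Onset times are in seconds from the start of the response. The ratio is expressed as ABD bursts per DIA burst, so a value of 1 would mean one ABD burst per inspiratory cycle; that direction is also an assumption.

```python
def burst_onsets(envelope, fs, threshold):
    """Times (s) where the envelope rises through `threshold` (hypothetical detector)."""
    return [i / fs for i in range(1, len(envelope))
            if envelope[i - 1] < threshold <= envelope[i]]

def coupling_ratio(abd_onsets, dia_onsets, window_s=600.0):
    """ABD bursts per DIA burst within the first `window_s` seconds (10 min) of the response."""
    abd = sum(1 for t in abd_onsets if t < window_s)
    dia = sum(1 for t in dia_onsets if t < window_s)
    return abd / dia if dia else float("nan")  # undefined if no DIA bursts
```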

      Fig. 1A was never cited in the text.

      It has been cited now.

      Fig. 1A-C appears to be exactly the same as Fig. 5A-C.

      The reviewer is correct. We used Figure 1 to describe and explain our analytical methods with sample data, whereas Figure 5 describes our results. We have clarified this in the legend: “Figure 5: Rostral injections elicit more prominent changes to respiration in each signal and sub-period. A-C: Same as Figure 1 (Methods); included here for further clarity when analyzing the results.”

      Late Expiratory airflow is given in units of volts (V) in lines 358-363 (Fig. 4C) but then in units of volts-seconds (V•s) in lines 363-367. Both units are problematic because the voltage is neither an air volume nor an air volume per unit time. Is there some conversion factor left out?

      In this section of the results we describe the changes in expiratory peak amplitude (V) and expiratory peak flow (V•s). Since calibration of airflow was performed on the positive flow and for larger volumes, we prefer to use the original units to guarantee precise assessment of the change and avoid introducing potential errors. Since the analysis considers changes from baseline readings, converting to ml or ml*s would not affect our analysis.
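      The argument that converting volts to physical units would not change the analysis holds when results are expressed relative to baseline and the calibration is linear with zero offset, because the calibration gain then cancels in the ratio. The Python sketch below illustrates this with a hypothetical gain. An offset term, or analyses based on absolute differences rather than ratios, would not cancel in the same way.

```python
def relative_change(response, baseline):
    """Fractional change of a response value from its baseline value."""
    return (response - baseline) / baseline

def calibrate(volts, gain):
    """Hypothetical linear, zero-offset calibration (e.g. V -> ml/s)."""
    return gain * volts

# With a zero-offset linear calibration the gain cancels:
# (g*r - g*b) / (g*b) == (r - b) / b
```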

      Reviewer #3 (Recommendations For The Authors):

      The study conducted by Pisanski et al investigates the role of the lateral parafacial area (pFL) in respiratory control, specifically in modulating active expiration. The precise location of this expiratory oscillator within the ventral medulla remains uncertain, with some studies indicating that the caudal tip of the facial nucleus (VIIc) forms the core while others propose more rostral areas. Bicuculline injections were utilized at various pFL sites to explore the impact of these injections on respiration. The authors use innovative and impressive analytic methods to characterize the effect on respiratory activity. The results indicate that injections at more rostral pFL locations induce the most robust changes in tidal volume, minute ventilation, and combined respiratory responses. The study will contribute to an enhanced understanding of the neural mechanisms controlling active expiration. The main message of the study is that the rostro-caudal organization of the pFL is not simple and uniform. The data provides novel insights into the importance of rostral locations in controlling active expiration (see e.g. lines 738-740).

      The data and results of the paper are intriguing, and it appears that the experiments are well-managed and executed. However, there are several major and minor comments and suggestions that should be addressed by the authors:

      (1) The study relies heavily on local injections into specific areas that are confirmed histologically. One potential concern is the injection volume of 200 nL in such a tiny area. The authors suggest that the drug did not spread to rostral/caudal areas outside the specified coordinate partly based on their cFOS staining. For example, the lack of cFOS activation in TH+ cells and Phox2B cells is interpreted as proof that bicuculline did not spread to these somas (Figure 2). The authors seem to use a similar argument as evidence that the pFL does not include Phox2B neurons in the RTN as discussed in the Discussion section (lines 830-847). However, it is very surprising that bicuculline injections into an area that is known to contain Phox2B and Th+ neurons do not activate these neurons as assessed by the cFOS staining. It seems puzzling to me that none of their injections shown in Figure 2 activated Phox2B or Th neurons. I assume that in targeting the pFL the authors must have sometimes hit areas that included neurons that define the RTN, which would have activated Phox2B or Th+ neurons. Did the authors find that these activations did not activate active expiration? Such negative "controls" would strengthen their argument that pFL is a separate and distinct region that selectively controls active expiration.

      Thanks for the positive feedback on the manuscript. As it has been demonstrated and discussed in several previous publications, PHOX2B expressing neurons in this area of the brain are part of the RTN Neuromedin B positive neurons (more densely located in the ventral paraFacial rather than the lateral parafacial, our site of injection), the TH+ C1 neurons (located in a somewhat more caudal and medial position compared to our sites of injection, around the BötC/ preBötC area) and the large Facial MN (easily identifiable by their large size and compact location). Given this differential spatial distribution, and the controls described below, we believe we have reduced the possibility of the direct activation of these neurons, although we can’t exclude it in full.

      There is now strong evidence for the lack of PHOX2B expression in late-E neurons in juvenile and adult rats (Magalhaes et al., 2021; PMID: 34510468). We realize that the microinjected solution could potentially diffuse in the brain and reach other areas, but we combined two strategies to verify that our injections were focal and activated only a restricted area of the brain (i.e., the pFL): i) localization of fluorobeads that were diluted in the bicuculline solution; ii) expression of cFos combined with anatomical markers, to identify activated cells. Fluorobeads have a very limited spread in the brain and therefore informed us of the site of each injection, allowing us to differentiate between the five injection locations. Although we cannot assume that bicuculline has a similar spread (it is also quickly degraded in the tissue), combining this analysis with the localized expression of cFos has helped us to differentiate between injection sites. Because of the proximity of PHOX2B cells in the RTN and of C1 neurons, we also combined cFos expression with immunohistochemistry to determine whether bicuculline activation was also visible in these two neuronal populations. Our results indicate that there is baseline cFos activity in RTN neurons (see vehicle injection), but the fraction of activated PHOX2B cells did not increase with bicuculline injections, suggesting that these neurons were not the target of our injections. Please note that cFos expression has been extensively used to determine RTN neuron activation, especially following chemoreflex responses.

      (2) The authors refer to "the expiratory oscillator" throughout the manuscript (e.g. lines 58, 62, 65) as if there is only one expiratory oscillator i.e. "the expiratory oscillator". For some reason, the authors avoided citing and mentioning PiCo (Anderson et al. 2016), which is considered the oscillator for postinspiration. Since the present study focuses on the role of expiration, and since the authors describe convincing effects on postinspiration, considering this oscillator which is located dorsomedial to the VRC seems relevant for the present study.

      Because the literature currently describing PiCo as a third oscillator is limited and controversial, and because our studies do not directly assess post-inspiratory activity (as measured by the V nerve or laryngeal muscles) or PiCo activity and location (which would be even more distant than the RTN, for example), we prefer not to comment on the effects of these injections on PiCo or on the connectivity between PiCo and the pFL.

      We have added this to the discussion:

      “Therefore, although it has previously been described, the exact mechanism by which this post-I activity in the ABD muscles is generated is currently unknown. For example, the interplay between the rostral pFL and brainstem structures generating post-inspiratory activity, such as the proposed post-inspiratory oscillator (PiCo; Anderson et al., 2016) or pontine respiratory networks, could reasonably be involved in this process.”

      (3) The authors do not specify what type of bicuculline they injected. Bicuculline is known to have significant effects on potassium channels. Thus, the effects reported here could be due to a non-specific change in excitability, rather than caused by a specific GABAergic blockade.

      The authors also do not know what effects these injections cause in the neurons in vivo, since the injections are not accompanied by recordings from the respiratory neurons that they activate. This together with the non-specific bicuculline effects will affect the interpretation of the results. Thus, the authors need to be more careful when interpreting their effects as "GABAergic". The use of more specific blockers like gabazine could partly address this concern. The authors have to discuss this in a "limitation section".

      Thank you for pointing that out; we have now clarified in the Methods section that we used bicuculline methochloride. We cannot exclude that some side effects could be present due to the use of this drug. For the purpose of this study, however, we used bicuculline as a tool to consistently generate active expiration, since it has been extensively used by multiple laboratories to induce abdominal muscle recruitment and active expiration, as well as to directly record late-E neurons in this same area.

      We have included in the discussion the following statement:

      “Technical considerations

      Bicuculline methiodide has previously been observed to exert inhibitory effects on Ca2+-activated K+ currents, thereby inducing non-specific potentiation of NMDA currents (Johnson and Seutin, 1997). Consequently, caution is warranted in attributing our findings solely to the GABAA antagonist properties of bicuculline. Previous work has demonstrated a temporal correlation between the onset of late-E neuron activity in the caudal parafacial region and ABD activity in response to bicuculline (Pagliardini et al., 2011; de Britto and Moraes, 2017; Magalhaes et al., 2021), as well as GABAergic sIPSCs in late-E neurons (Magalhaes et al., 2012). However, it is essential to note that the current study lacks single-unit recordings, preventing us from definitively confirming whether the observed activity stems from GABAergic disinhibition of late-E neurons or from excitation through non-GABAergic mechanisms.”

      (4) I also caution the authors when stating that the bicuculline injections will reveal the precise location and functional boundaries of "the" expiratory oscillation within the pFL. Increasing the excitability with bicuculline is inappropriate to study the functional boundaries of an oscillator. It is particularly inappropriate to identify the boundaries of the pFL, a network that is normally inactive and activated only under certain behavioral and metabolic conditions. Because the injections are increasing the neuronal excitability unspecifically, and because the authors are not recording the activity of the neurons in the pFL region it is unclear what kind of neurons are activated. The cFOS staining may help to define whether these neurons are Phox2B or Th positive or negative, but they will not provide insights into the activity patterns of the activated neurons. Thus, it is fair to assume that these injections will likely include also tonic neurons that might indirectly control the activity of pFL neurons under certain metabolic or behavioral conditions without actually being involved in the rhythmogenesis of active expiration. Many of the effects peak after several minutes, and different regions cause differential effects with different time courses, which is difficult to interpret functionally. Thus, the "core" identified in the present study could consist of tonic neurons as opposed to rhythmic neurons generating active expiration.

      We agree with the reviewer that our local injections may have activated a heterogeneous population of neurons. We do not claim that we activated only late-E rhythmogenic neurons, but rather that our multiple injection sites revealed the area that generates the strongest excitation of ABD muscles and active expiration.

      While GABA receptor antagonists may act indiscriminately on GABA receptor-expressing neurons of various phenotypes, anatomical assessment of inhibitory cells has shown a very sparse distribution of GABAergic and glycinergic cells in the parafacial area (Tanaka et al., 2003; PMID: 14512139). Moreover, it has been inferred in multiple publications (Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al., 2016, PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151; Flor et al., 2020, PMID: 32621515; Britto & Moraes, 2017; PMID: 28004411; Silva et al., 2016; PMID: 26900003) and demonstrated recently (Magalhaes et al., 2021; PMID: 34510468) that late-E neurons in the parafacial region are excitatory and have a glutamatergic phenotype.

      As suggested by the reviewer, it is possible that the bicuculline injections activated some tonic, non-rhythmogenic neurons, which could in turn activate an expiratory oscillator located elsewhere.

      We have edited the introduction as follows:

      “By strategically administering localized volumes of bicuculline at multiple rostrocaudal levels of the ventral brainstem, we aimed to selectively enhance the excitability of neurons driving active expiration, thereby revealing the extent of the pharmacological response and the most efficient site in generating active expiration.”

      We have edited the results as follows:

      “Importantly, the group with injection sites at +0.6 mm from VIIc exhibited the swiftest response onset, suggesting that this area is the most critical for the generation of active expiration, either through direct activation of the expiratory oscillator or, alternatively, for providing a strong tonic drive to late-E neurons located elsewhere.”

      In the introduction, it should also be emphasized that the pharmacological approach used in the present study complements the existing elegant chemogenetic studies, rather than emphasizing primarily the limitations of the chemogenetic inhibitions. The conclusion should be that these studies together provide different, yet complementary insights: The chemogenetic approach by inhibiting neurons, the present study by exciting neurons, and all studies come with their own limitations.

      Thanks for the suggestion, we have updated the manuscript as follows:

      “Although both of these elegant chemogenetic studies have contributed extensively to our understanding of the pFL, the existing evidence suggests that the expiratory oscillator may expand beyond the limits of the viral expression achieved in said studies, as proposed by Huckstepp et al., (2015).”

      Throughout the manuscript, the authors have to be cautious when implying that an excitatory effect relates to the activity of rhythmogenic pFL neurons. For example, on line 710 the authors state that "it is conceivable to infer that the rostral pFL is in the closest proximity to the cells responsible for the generation of active expiration". While it may indeed be "conceivable", the bicuculline injections themselves provide no insights into the location of neurons responsible for rhythmogenesis. It is equally "conceivable" that the excited neurons provide a tonic drive to the neurons without being involved in the generation of active expiration. These tonic neurons could be located at a distance from the presumed rhythmogenic core.

      We have included the possibility of tonic excitation in the technical considerations section:

      “However, our study did not include recording from late-E neurons following bicuculline injections, preventing us from definitively confirming whether the observed activity stems from late-E neuronal excitation or the potentiation of a tonic drive, particularly in the rostral areas.”

      (5) It is intriguing that some of their injections (Fig.2D) evoked postinspiratory activity. This interesting finding should be discussed as it could provide important insights into the coordination of the different phases of expiration.

      Thanks for the suggestion. We have included the following to the discussion:

      “Therefore, although it has previously been described, the exact mechanism by which this post-I ABD activity is generated is unclear. This late-E/post-I pattern of activity is similar to what has been observed in in vitro preparations and in vivo recordings in juvenile rats (Janczewski et al., 2002; Janczewski et al., 2006).”

      “Therefore, although it has previously been described, the exact mechanism by which this post-I activity in the ABD muscles is generated is currently unknown. For example, the interplay between the rostral pFL and brainstem structures generating post-inspiratory activity, such as the proposed post-inspiratory oscillator (PiCo; Anderson et al., 2016) or pontine respiratory networks, could reasonably be involved in this process.”

      (6) The authors conducted bilateral disinhibition of the pFL, but only a unilateral photomicrograph was shown. Figure 2 should include a representative bilateral photomicrograph along with a scatter plot for clarity and completeness.

      We have edited figure 2 to include representative images of bilateral injections.

      (7) Regarding the Bicuculline injections in the Methods section: Aside from specifying exactly what type of bicuculline was used, the authors should provide more information about the pFL location and landmarks used, including the missing medial-lateral coordinate. The fluorobead spread of approximately ~300 µm, as observed in Figure 2C, is crucial for the interpretation of the results and should be detailed. An alternative approach could involve e.g. calculating the area covered by fluorobeads in each group.

      We have included the following in the text:

      “Each rat was injected at 2.8 mm lateral from the midline and at a specific RC coordinate based on the following groups: -0.2 mm from the caudal tip of the facial nucleus (VIIc) (n=5), +0.1 mm from VIIc (n=7), +0.4 mm from VIIc (n=5), +0.6 mm from VIIc (n=6), +0.8 mm from VIIc (n=5)”

      “These findings strongly suggest that bicuculline specifically activated cells within the vicinity of the injection sites, which spread ~300 µm (Figure 2C, horizontal lines), and did not activate PHOX2B+ cells in the RTN area beyond their baseline level of activity.”

      (8) In the Experimental Protocol, the authors should provide more details on how the parameters were determined. For example, specify the number of cycles included for Dia frequency/amplitude, Abd frequency/amplitude, and with regards to the averaging process, the authors should specify over how many cycles they obtained an average for Dia/Abd activity time and AUC. The authors should also provide information on the number of bicuculline injections that they repeated to average these values and they should report the coefficient of variation for repeated injections. Please clarify the method used to calculate AUC, considering the non-linear nature of the activity.

      Only one bicuculline injection per rat was performed and the number of rats used for each injection site is indicated in the methods as follows:

      “Each rat was injected at 2.8 mm lateral from the midline and at a specific RC coordinate based on the following groups: -0.2 mm from the caudal tip of the facial nucleus (VIIc) (n=5), +0.1 mm from VIIc (n=7), +0.4 mm from VIIc (n=5), +0.6 mm from VIIc (n=6), +0.8 mm from VIIc (n=5), and CTRL (n=7). We recorded the physiological responses to the injection for 20-25 min.”

      We have clarified in the methods section the following:

      “Respiratory data was tracked in time bins of 2-minute duration from the baseline period prior to injections and spanned 20 min of recording post-injection. Mean-cycle measurements for each signal were computed by averaging values across all cycles within a given time bin.”
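      The binning step described above can be sketched as follows. The minimal Python example assumes each respiratory cycle is represented by its onset time (in seconds relative to injection, negative for the pre-injection baseline) and one measured value. It groups cycles into consecutive 2-minute (120 s) bins and averages within each bin. The function name and data layout are hypothetical.

```python
from collections import defaultdict

def bin_cycle_means(cycle_times_s, cycle_values, bin_s=120.0):
    """Average per-cycle measurements within consecutive time bins.

    Returns {bin_index: mean}; bin 0 covers [0, bin_s) after injection,
    negative indices cover the pre-injection baseline.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for t, v in zip(cycle_times_s, cycle_values):
        b = int(t // bin_s)  # floor division also places negative times correctly
        sums[b] += v
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```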

      Additional clarifications have been added:

      “Mean-cycle measurements for each signal were computed by averaging across all cycles within a given time bin. (~300 cycles in baseline, ~100 cycles per response time bin). We then used the average calculations of respiratory rate (RR), tidal volume (VT), Minute Ventilation (Ve), expiratory ABD amplitude, expiratory ABD area, VO2, VE/VO2 to obtain values relative to the baseline period. Peak responses were identified as the time bin that produced the strongest changes relative to baseline.”
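      Expressing bin means relative to baseline and selecting the peak bin could look like the Python sketch below. It assumes a percentage change from baseline. It also takes the "strongest change" to mean the largest absolute deviation, so that decreases (e.g. in respiratory frequency) can also be selected as peaks. Both choices are assumptions, and the names are hypothetical.

```python
def relative_to_baseline(bin_means, baseline_mean):
    """Express each response bin as a percentage change from baseline."""
    return {b: 100.0 * (m - baseline_mean) / baseline_mean
            for b, m in bin_means.items()}

def peak_bin(changes):
    """Bin with the largest absolute change from baseline (assumed definition)."""
    return max(changes, key=lambda b: abs(changes[b]))
```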

      “The area under the curve (AUC) was measured during baseline and subtracted from the corresponding AUC of the response for each time bin (Figure 1C). This AUC measure was computed as the sum of the signal in a given respiratory phase, as all signals were sampled at the same rate. Note that areas calculated below the zero (0) line, as would be expected from negative airflow during expiration, yield negative AUC values.”
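      A minimal Python sketch of the AUC described in this passage follows. Because all channels share one sampling rate, the plain sum of samples within a phase is proportional to the time integral (integral ≈ sum × sampling interval), and samples below zero contribute negatively. The boolean phase mask and function names are hypothetical.

```python
def phase_auc(signal, phase_mask):
    """AUC as the plain sum of samples inside one respiratory phase.

    With a common sampling rate this sum is proportional to the time integral;
    negative samples (e.g. expiratory airflow) reduce the total.
    """
    return sum(s for s, inside in zip(signal, phase_mask) if inside)

def baseline_subtracted_auc(response_auc, baseline_auc):
    """Response-bin AUC minus the corresponding baseline AUC."""
    return response_auc - baseline_auc
```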

      (9) The authors should explain how oxygen consumption was calculated-did it involve the Depocas & Hart (1957) formula? Please provide information on expiratory CO2, whether ventilation was adjusted to achieve consistent CO2 levels across animals, and ideally specify the end-tidal CO2 range for the experiments. Discuss the rationale behind the chosen CO2 levels and whether CO2-dependent pFL activity could have influenced results.

      We have clarified this measurement in the Methods as follows:

      “The gas analyzer measured fractional concentration of O2. Based on this and the flow rate at the level of the trachea (minute ventilation), we calculated O2 consumption according to Depocas and Hart (1957).”

      We have also added to the methods section:

      “Throughout the experimental procedure, rats breathed spontaneously and end-tidal CO2 was not adjusted.”

      Regarding whether CO2-dependent pFL activity could have influenced the results: inducing active expiration in the absence of a physiological demand for it (i.e., no hypoxia or hypercapnia) likely reduced pCO2, which in turn would decrease the drive for ABD activity. Our results are therefore likely an underestimation of the response that would have been produced had CO2 levels been held constant.

      (10) The authors should address the discrepancy in cFos-activated neurons between the control (44 neurons) and experimental animals (90-120 neurons per hemisection). Please explain the activation in the control group. Please also provide insights into how the authors interpret this difference in cFos-activated neurons between control and experimental groups.

      The following paragraph has been added to the discussion:

      “The assessment of cellular activity, quantified through cFos staining, revealed basal activity in control rats. This baseline activity likely emanates from subthreshold physiological processes within the parafacial area that do not culminate in ABD activity. Analysis of the cFos staining confirmed focal activation of neurons in the pFL of rats injected with bicuculline and minimal cFos expression in PHOX2B+ cells in all groups compared to the control group. These results confirm the very limited mediolateral spread of the drug from the core injection site and support previous findings indicating that the majority of PHOX2B+ cells are located more ventrally in the parafacial area (pFV, Huckstepp et al., 2015) and that PHOX2B+ cell recruitment is not necessary for active expiration (de Britto & Moraes, 2017; Magalhães et al., 2021).”

      (11) In Figure 8, the authors plotted the relationship between each cycle and the normalized area. Have you also performed the same phase-specific (late-E, inspiratory, and post-I) analysis for fR or VT separately?

      No, we performed the phase-separated analysis (late-E, I, post-I) only for the DIA, airflow and ABD areas, and for the Euclidean and Mahalanobis distances.

      Minor comments:

      Is there any specific reason for conducting these experiments exclusively in males?

      No, we usually use male rats for this type of experiment. We use both male and female rats in other studies that address the effects of sex hormones, but in this case we performed experiments only in male rats.

      Page 13, Line 320: What is the duration of the bicuculline-induced effects?

      This information is included in the results section as follows:

      “Similarly, the ABD response duration was longer at the two most rostral locations (+0.6 mm = 17.6 ± 2.7 min; +0.8 mm = 17.1 ± 3.3 min) compared to the most caudal group (-0.2 mm = 2.4 ± 1.1 min; One-Way ANOVA p = 0.043; Tukey -0.2 mm vs +0.6 mm: p = 0.048; -0.2 mm vs +0.8 mm: p = 0.041; Figure 3E).”

      Page 16, Line 400: Is there a rationale for the high tidal volume (VT) observed in these animals? A baseline VT of 7 ml/kg appears notably elevated.

      Please note that the rats were vagotomised and breathing spontaneously; tidal volume is therefore increased relative to non-vagotomised rats, as reported in previous studies (Ouahchi et al., 2011).

      Figure 2D: Could you provide longer recordings? Additionally, incorporating diaphragm (Dia) recordings would enhance the interpretation of abdominal (Abd) recordings.

      Figure 3A shows a representative example of the 20-minute recordings for each location.

      Page 18, Line 458: Please rectify "Dunn: p , 0.001" to the appropriate format, perhaps "Dunn: p < 0.001."

      Thank you, edited.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This fundamental study investigates the transcriptional changes in neurons that underlie loss of learning and memory with age in C. elegans, and how cognition is maintained in insulin/IGF-1-like signaling mutants. The presented evidence is compelling, utilizing a cutting-edge method to isolate neurons from worms for genomics that is clearly conveyed with a rigorous experimental approach. Overall, this study supports that older daf-2 worms maintain cognitive function via mechanisms that are unique from younger wild type worms, which will be of great interest to neuroscientists and researchers studying ageing.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors perform RNA-seq on FACS isolated neurons from adult worms at days 1 and 8 of adulthood to profile the gene expression changes that occur with cognitive decline. Supporting data are included indicating that by day 7 of adulthood, learning and memory are reduced, indicating that this timepoint or after represents cognitively aged worms. Neuronal identity genes are reduced in expression within the cognitively aged worms, whereas genes involved in proteostasis, transcription/chromatin, and the stress response are elevated. A number of specific examples are provided, representing markers of specific neuronal subtypes, and correlating expression changes to the erosion of particular functions (e.g. motor neurons, chemosensory neurons, aversive learning neurons, etc).

      To investigate whether upregulation of genes in neurons with age is compensatory or deleterious, the authors reduced expression of a set of three significantly upregulated genes and performed behavioral assays in young adults. In each case, reduction of expression improved memory, consistent with a model in which age-associated increases impair neuronal function.

      The authors then characterize learning and memory in wild type, daf-2, and daf-2;daf-16 worms with age and find that daf-2 worms retain the ability to learn for approximately 10 days longer than wild types. This was daf-16 dependent. Memory was extended in daf-2 as well, and strikingly, daf-2;daf-16 had no short-term memory even at day 1. Transcriptomic analysis of FACS-sorted neurons was performed on the three groups at day 8. The authors focus their analysis on daf-2 vs. daf-2;daf-16 and present evidence that daf-2 neurons express a stress-resistance gene program. They also find small differences between the N2 and daf-2;daf-16 neurons, which correlate with the observed behavioral differences, though these differences are modest.

      The authors tested eight candidate genes that were more highly expressed in daf-2 neurons vs. daf-2;daf-16 and showed that reduction of 2 and 5 of these genes impaired learning and memory, respectively, in daf-2 worms. This finding implicates specific neuronal transcriptional targets of IIS in maintaining cognitive ability in daf-2 with age, which, importantly, are distinct from those in young wild type worms.

      Overall, this is a strong study with rigorously performed experiments. The authors achieved their aim of identifying transcriptional changes in neurons that underlie loss of learning and memory in C. elegans, and how cognition is maintained in insulin/IGF-1-like signaling mutants. 

      We thank you for the evaluation and response.

      Reviewer #2 (Public Review):

      Weng et al. perform a comprehensive study of gene expression changes in young and old animals, in wild-type and daf-2 insulin receptor mutants, in the whole animal and specifically in the nervous system. Using this data, they identify gene families that are correlated with neuronal ageing, as well as a distinct set of genes that are upregulated in neurons of aged daf-2 mutants. This is particularly interesting as daf-2 mutants show both extended lifespan and healthier neurons in aged animals, reflected by better learning/memory in older animals compared with wild-type controls. Indeed, knockdown of several of these upregulated genes resulted in poorer learning and memory. In addition, the authors showed that several genes upregulated during ageing in wild-type neurons also contribute to learning and memory; specifically, knockdown of these genes in young animals resulted in improved memory. This indicates that (at least in this small number of cases), genes that show increased transcript levels with age in the nervous system somehow suppress memory, potentially by having damaging effects on neuronal health.

      Finally, from a resource perspective, the neuronal transcriptome provided here will be very useful for C. elegans researchers as it adds to other existing datasets by providing the transcriptome of older animals (animals at day 8 of adulthood) and demonstrating the benefits of performing tissue-specific RNAseq instead of whole-animal sequencing.

      The work presented here is of high quality and the authors present convincing evidence supporting their conclusions. I only have a few comments/suggestions:

      (1) Do the genes identified to decrease learning/memory capacity in daf-2 animals (Figure 4d/e) also impact neuronal health? daf-2 mutant worms show delayed onset of age-related changes to neuron structure (Tank et al., 2011, J Neurosci). Does knockdown of the genes shown to affect learning also affect neuron structure during ageing, potentially one mechanism through which they modulate learning/memory? 

      (2) The learning and memory assay data presented in this study uses the butanone olfactory learning paradigm, which is well established by the same group. Have the authors tried other learning assays when testing for learning/memory changes after knockdown of candidate genes? Depending on the expression pattern of these genes, they may have more or less of an effect on olfactory learning versus for e.g. gustatory or mechanosensory-based learning.

      (3) A comment on the 'compensatory vs dysregulatory' model as stated by the authors on page 7 - I understand that this model presents the two main options, but perhaps this is slightly too simplistic: gene expression that rises during ageing may be detrimental for memory (= dysregulatory), but at the same time may also be beneficial for other physiological roles in other tissues (= compensatory).

      Thank you for your original suggestions; we addressed them in the previous version of response to the reviewers.

      Comments on revised version:

      I am satisfied with how the authors have addressed all my comments/suggestions. 

      Thank you for your response!

      Reviewer #3 (Public Review):

      Summary

      In this manuscript, Weng et al. identify the neuron-specific transcriptome that impacts age-dependent cognitive decline. The authors design a pipeline to profile neurons from wild type and long-lived insulin receptor/IGF-1 mutants using timepoints when memory functions are declining. They discover signatures unique to neurons, which validates their approach. The authors identify that genes related to neuronal identity are lost with age in wild type worms. For example, old neurons reduce the expression of genes linked to synaptic function and neuropeptide signaling and increase the expression of chromatin regulators, insulin peptides and glycoproteins. Depletion of selected genes that are upregulated in old neurons (utx-1, ins-19 and nmgp-1) leads to improved short-term memory. This indicates that some genes that increase with age have detrimental effects on learning and memory. The pipeline is then used to test neuronal profiles of long-lived insulin/IGF-1 daf-2 mutants. Genes related to stress response pathways are upregulated in long-lived daf-2 mutants (e.g. dod-24, F08H9.4) and those genes are required for improved neuron function.

      Strengths

      The manuscript is well written, and the experiments are well described. The authors take great care to explain their reasoning for performing experiments in a specific way and guide the reader through the interpretation of the results, which makes this manuscript an enjoyable and interesting read. The authors discover novel regulators of learning and memory using neuron-specific transcriptomic analysis in aged animals, which underlines the importance of cell-specific deep sequencing. The timepoints of the transcriptomic profiling are elegantly chosen, as they coincide with the loss of memory and can be used to specifically reveal gene expression profiles related to neuron function. The authors use the dod-24 example to illustrate how powerful this approach is: in daf-2 mutants, whole-body dod-24 expression differs from neuron-specific profiles, which underlines the importance of precise cell-specific approaches. This dataset will provide a very useful resource for the C. elegans and aging community as it complements existing datasets with additional time points and neuron-specific deep profiling.

      Weakness

      This study nicely describes the neuron specific profiles of aged long-lived daf-2 mutants. Selected neuronal genes that were upregulated in daf-2 mutants (e.g. F08H9.4, mtl-1, dod-24, alh-2, C44B7.5) decreased learning/memory when knocked down. However, the knock down of these genes was not specific to neurons. The authors use a neuron-sensitive RNAi strain to address this concern and acknowledge this caveat in the text. While it is likely that selected candidates act only in neurons it is possible that other tissues participate as well.

      Thank you for pointing this caveat out. We have mentioned it in the figure legend.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." The proposed mechanisms result in moderate performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Given the high level of complexity of all components of the model, it is not clear which features of which components are most important for its performance. There is also room for improvement in the narrative structure of the manuscript and the organization of concepts and data.

      To begin with, we will better explain the goal of the study in the introduction and explain that it relies on earlier theoretical work. The goal of the study was to investigate whether and how detailed neuron models with biologically-based morphologies, membrane properties, ion channels, dendritic nonlinearities, and biologically plausible learning rules can quantitatively account for the theoretical results obtained with more abstract models.

      We will further evaluate and clarify the roles of several components in our model regarding their impact on the results. These include a) the role of sufficiently robust and supralinear plateau potentials in computing the NFBP; and b) the importance of metaplasticity for individual synapses, allowing them to start or stop responding to relevant or irrelevant stimuli, respectively, over the training period.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation. That said, the fundamental concepts behind nonlinear feature binding in neurons with compartmentalized dendrites have been explored in previous work, so it is not clear how this study represents a significant conceptual advance. Finally, the presentation of the model, the motivation and justification of each design choice, and the interpretation of each result could be restructured for clarity to be better received by a wider audience.

      To achieve the goal of the study as described above, we chose to use a biophysically and morphologically detailed neuron model to see if it could quantitatively account for the theoretically-based nonlinear computations, for instance, those discussed in Tran-Van-Minh, A. et al. (2015).

      We will explain the role of each component of the learning rule, as well as the dendritic nonlinearities, for the performance on the NFBP.

      Reviewer #2 (Public Review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanistic insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Indeed, the learning rule is local and reward-based, and we will highlight better in the paper that it is “always on”, i.e. there are no separate training and testing phases.

      Weaknesses:

      I am concerned that the manuscript was submitted too hastily, as evidenced by the quality and logic of the writing and the presentation of the figures. These issues may compromise the integrity of the work. I would recommend a substantial revision of the manuscript to improve the clarity of the writing, incorporate more experiments, and better define the goals of the study.

      We will revise the manuscript thoroughly to better present the figures and writing (more detailed below). We will also show supplementary figures showcasing the role of the different components of the learning rule.

      Major Points:

      (1) Quality of Scientific Writing: The current draft does not meet the expected standards. Key issues include:

      i. Mathematical and Implementation Details: The manuscript lacks comprehensive mathematical descriptions and implementation details for the plasticity models (LTP/LTD/Meta) and the SPN model. Given the complexity of the biophysically detailed multicompartment model and the associated learning rules, the inclusion of only nine abstract equations (Eq. 1-9) in the Methods section is insufficient. I was surprised to find no supplementary material providing these crucial details. What parameters were used for the SPN model? What are the mathematical specifics for the extra-synaptic NMDA receptors utilized in this study? For instance, Eq. 3 references [Ca2+]-does this refer to calcium ions influenced by extra-synaptic NMDARs, or does it apply to other standard NMDARs? I also suggest the authors provide pseudocodes for the entire learning process to further clarify the learning rules.

      The detailed setup of the model, including equations and parameter values, is described in the referenced papers, and the model is available on GitHub. For this reason we did not repeat the information here. That said, we will go through the manuscript and clarify all details, and provide supplemental figures and a GitHub link where necessary for reproducing the results.

      ii. Figure quality. The authors seem not to carefully typeset the images, resulting in overcrowding and varying font sizes in the figures. Some of the fonts are too small and hard to read. The text in many of the diagrams is confusing. For example, in Panel A of Figure 3, two flattened images are combined, leading to small, distorted font sizes. In Panels C and D of Figure 7, the inconsistent use of terminology such as "kernels" further complicates the clarity of the presentation. I recommend that the authors thoroughly review all figures and accompanying text to ensure they meet the expected standards of clarity and quality.

      We will revise the figures for consistency and clarity.

      iii. Writing clarity. The manuscript often includes excessive and irrelevant details, particularly in the mathematical discussions. On page 24, within the "Metaplasticity" section, the authors introduce the biological background to support the proposed metaplasticity equation (Eq. 5). However, much of this biological detail is hypothesized rather than experimentally verified. For instance, the claim that "a pause in dopamine triggers a shift towards higher calcium concentrations while a peak in dopamine pushes the LTP kernel in the opposite direction" lacks cited experimental evidence. If evidence exists, it should be clearly referenced; otherwise, these assertions should be presented as theoretical hypotheses. Generally, Eq. 5 and related discussions should be described more concisely, with only a loose connection to dopamine effects until more experimental findings are available.

      The reviewer is correct; the cited text does not present experimental facts but rather illustrates how the learning rule operates. We will revise the section on the construction of learning rules to clarify which aspects are explicit assumptions and which are experimentally verified. In particular, we will provide a more detailed description and motivation for metaplasticity.

      (2) Goals of the Study: The authors need to clearly define the primary objective of their research. Is it to showcase the computational advantages of the local learning rule, or to elucidate biological functions?

      Briefly, the goal of the study was to investigate whether earlier theoretical results with more abstract models can be quantitatively recapitulated in morphologically and biophysically detailed neuron models with dendritic nonlinearities and with biologically based learning rules (see also our response to the Summary and Weaknesses of Reviewer #1). We will update the introduction with this information.

      i. Computational Advantage: If the intent is to demonstrate computational advantages, the current experimental results appear inadequate. The learning rule introduced in this work can only solve for four features, whereas previous research (e.g., Bicknell and Hausser, 2021) has shown capability with over 100 features. It is crucial for the authors to extend their demonstrations to prove that their learning rule can handle more than just three features. Furthermore, the requirement to fine-tune the midpoint of the synapse function indicates that the rule modifies the "activation function" of the synapses, as opposed to merely adjusting synaptic weights. In machine learning, modifying weights directly is typically more efficient than altering activation functions during learning tasks. This might account for why the current learning rule is restricted to a limited number of tasks. The authors should critically evaluate whether the proposed local learning rule, including meta-plasticity, actually offers any computational advantage. This evaluation is essential to understand the practical implications and effectiveness of the proposed learning rule.

      As mentioned above, our intent is not to demonstrate the computational advantages of the proposed learning rule but to investigate and illustrate how biophysically detailed neuron models that also display dendritic plateau potential mechanisms, together with biologically-based learning rules, can support the theoretically predicted computational requirements for complex neuronal processing (e.g., Tran-Van-Minh, A. et al., 2015), as well as the results obtained with more abstract neuron models and plateau potential mechanisms (e.g., Schiess et al., 2016; Legenstein and Maass, 2011).

      In the revised manuscript, we will also discuss the differences between the supervised learning rule in Bicknell and Hausser (2021) and our local and reward-based learning rule. We will also show a critical evaluation of how our local learning rule and metaplasticity affect the synaptic weights and why the different components of the rule are needed.

      ii. Biological Significance: If the goal is to interpret biological functions, the authors should dig deeper into the model behaviors to uncover their biological significance. This exploration should aim to link the observed computational features of the model more directly with biological mechanisms and outcomes.

      We will attempt to better link the learning rule and dendritic supralinearities to biological mechanisms and to interpret their biological function.

    1. Author response:

      eLife assessment

      “…The evidence however is incomplete, since the tai loss-of-clone phenotype is based on one allele and the mechanism involved in cell competition through Dlp and Wg lacks adequate supporting data.”

      We agree with the need for a second allele and are adding supporting data from a new tai lof allele we have generated by Crispr.

      We also agree that additional functional data would help demonstrate that differences in Dlp levels are required for the mechanism of Tai cell competition. Experiments are ongoing to test whether normalizing Dlp levels across clonal boundaries rescues elimination of Tai-low clones.

      Reviewer #1:

      Overall Statements:

      “There is some data in the supplementary materials suggesting that Tai promotes dlp mRNA expression, but this was not compelling.”

      We are currently testing the effects of Tai on dlp and dally transcription using qPCR and reporter transgenes. As noted below, the effects of Tai on Dlp trafficking are ‘strong’, so resolving effects on Dlp transcription will complement this localization data.

      “The authors don't further examine Dlp protein in tai clones.”

      As noted by the Reviewer, we do examine Dlp levels and localization in tai-low clones (see Figure 9), but these experiments are challenging due to the very small size of the clones and the hypomorphic nature of the tai allele (tai[k15101]) that was used. Experiments are in progress to examine the effect of our Crispr null allele of tai on Dlp levels and localization in wing clones.

      “In sum, the authors have uncovered some interesting results, but the story has some unresolved issues that, if addressed, could boost its impact. Additionally, the preprint seems to have 2 stories, one about tai and cell competition and the other about tai and Wg distribution. It would be helpful to reorder the figures and improve the narrative so that these are better integrated with each other.”

      We agree. The results of our modifier screen required that we first understand how Tai regulates the Wg pathway before we could apply this to understanding the competitive mechanism. Thus, the paper is composed of three sections: 1. the screen, 2. the Tai-Dlp-Wg connection in the absence of competition, and 3. the contribution of Dlp-Wg to the tai[low] ‘loser’ phenotype. These sections use different techniques (e.g., clonal mosaics with genomic alleles, and Gal4/UAS and RNAi to define the effect of Tai loss on Wg and Dlp). Ongoing experiments return to clonal mosaics to test whether elevating Dlp can rescue tai lof clones in the same manner as Apc/Apc2 alleles (see Figs. 2-3), which elevate Wg pathway activity.

      Specifics:

      “It would be good to know whether the authors can rescue tai-low clones by over-expressing UAS-Dlp.”

      As noted above, experiments are ongoing to test whether normalizing Dlp levels across clonal boundaries rescues elimination of Tai-low clones.

      “The data on Wg distribution seems disjointed from the data about cell competition. The authors could refocus the paper to emphasize the cell competition story. The role of Dlp in Wg distribution is well established, so the authors could remove or condense these results. The story really could be Figs 1, 2, 3 and 7 and keep the paper focused on cell competition. The authors could then discuss Dlp as needed for Wg signaling transduction, which is already established in the literature.”

      We appreciate the suggestion to reorganize the figures to focus the first part of the story on competition, and then follow with the role of Tai in controlling Dlp. We will consider this approach pending the results of ongoing experiments.  

      “The model of tai controlling dlp mRNA and Dlp protein distribution is confusing. In fact, the data for the former is weak, while the data for the latter is strong. I suggest that the authors focus on the altered Dlp protein distribution on tai-low clones. It would also be helpful to prove the Wg signaling is impeded in tai clones (see #5 below).”

      We agree, but are currently testing how dlp reporters and mRNA respond to Tai in order to rigorously test a Dlp transcriptional mechanism. To complement the ‘strong’ evidence that Tai regulates Dlp distribution, we are testing Dlp in clones of our Tai Crispr null. Since submission, we have also assessed the effect of blocking the endocytic factor shibire/dynamin on Dlp distribution in Tai-deficient cells to complement the data on Pentagone that is already in the paper (see Fig. S3).

      “I don't know if the Fz3-RFP reporter for Wg signaling works in imaginal discs, but if it does then the authors could make clones in this background to prove that cell-autonomous Wg signaling is reduced in tai-low clones.”

      We thank the reviewer for this suggestion, which we are now testing.

      Reviewer #2

      Overall Comments:

      “While the authors present good evidence in support of most of their conclusions, there are alternative explanations in many cases that have not been excluded.”

      We appreciate this point and are conducting experiments for a revised submission that will help test alternative mechanisms and clarify our conclusions.

      Specifics:

      “However, the experiments have been done with a single allele, and these experiments do not exclude the possibility that there is another mutation on the same chromosome arm that is responsible for the observed phenotype. Since the authors have a UAS-tai stock, they could strengthen their results using a MARCM experiment where they could test whether the expression of UAS-tai rescues the elimination of tai mutant clones. Alternatively, they could use a second (independent) allele to demonstrate that the phenotype can be attributed to a reduction in tai activity.”

      As noted above, we agree with the need for a second allele and are adding supporting data from a new tai lof allele we have generated by Crispr.

      The tai[k15101] allele acts as a tai hypomorph and has been shown to produce weaker phenotypes than the strong lof allele 61G1 in a number of papers (Bai et al., 2000; König et al., 2011; Luo et al., 2019; Zhang et al., 2015). We agree that rescue of tai[k15101] with a UAS-Tai transgene would help rule out effects of second-site mutations. We are currently pursuing the reviewer’s second suggestion, phenocopy with a different allele, using our new tai Crispr lof.

      “The authors have screened a total of 21 chromosomes for modification and have not really explained which alleles are nulls and which are hypomorphs. The nature of each of the alleles screened needs to be explained better.”

      We will update the text to better reflect what type of alleles were chosen. In most cases we preferred amorphs or null alleles over hypomorphs; however, when an amorph was not available, we used hypomorphs.

      “Also, the absence of a dominant modification does not necessarily exclude a function of that gene or pathway in the process. This is especially relevant for the Spz/Toll pathway which the authors have previously implicated in the ability of tai-overexpressing cells to kill wild-type cells.”

      We thank the reviewer for this entirely accurate point. The dominant screen does not rule out effects of other pathways such as Spz/Toll. Indeed, given our prior work, we were surprised that Spz/Toll alleles had no dominant effect on tai[low] competition. By contrast, the clear dominant effect of Apc/Apc2 led us to consider that Wg signaling plays a role in this phenomenon, which then became the starting point of this study.

      “The most important discovery from this screen is the modification by the Apc alleles. This part of the paper would be strengthened by testing for modification by other components of the Wingless pathway. The authors show modification by Apc[MI01007] and the double mutant Apc[Q8] Apc2[N175A]. Without showing the Apc[Q8] and Apc2[N175A] alleles separately, it is hard to know if the effect of the double mutant is due to Apc, Apc2, or the combination.”

      We agree that testing for modification by other components of the Wg pathway would strengthen the connection between Tai-low clonal elimination and Wg pathway biology. We also agree that testing Apc[Q8] and Apc2[N175A] separately would show whether both Apc proteins are equally important for rescuing Tai-low cell death; future experiments in the lab could investigate this distinction.

      “RNAi of tai seems to block the formation of the Wg gradient. If so, one might expect a reduction in wing size. Indeed, this could explain why the wings of tai/Df flies are smaller. The authors mention briefly that the posterior compartment size is reduced when tai-RNAi is expressed in that compartment. However, this observation merits more emphasis since it could explain why tai/Df flies are smaller (Are their wings smaller?).”

      We agree that this is an exciting possibility. Growth effects of Tai linked to its interactions with Yorkie and EcR could reflect a distinct role in promoting Wg activity. Alternatively, Tai may cooperate with Yorkie or EcR to control the Wg pathway. We are pursuing these possibilities in future work.

      With regard to the “small size” effect of reducing Tai, we have previously shown that RNAi of Tai using engrailed-Gal4 causes the posterior compartment to shrink (Zhang et al. 2015, Figure 1C-F, H). In that paper, we also showed that tai[k15101]/Df animals are proportionally smaller than wildtype animals and quantified this by measuring 2D wing size (Zhang et al. 2015, Figures 1A and 1B).

      “In Figure 7, the authors show the effect of manipulating Tai levels alone or in combination with increasing Dlp levels. However, they do not include images of Wg protein distribution upon increasing Dlp levels alone.”

      We thank the reviewer for this reminder and have already generated these control images for inclusion in the revised submission.

      “In Figure 8, there is more Wg protein both at the DV boundary and spreading when tai is overexpressed in the source cells using bbg-Gal4. However, in an earlier experiment (Figure 5C) they show that the wg-lacZ reporter is downregulated at the DV boundary when tai is overexpressed using en-Gal4. They therefore conclude that wg is not transcriptionally upregulated but is instead secreted at higher levels when tai is expressed in the source cells. Wg protein is reduced in the DV stripe when tai is overexpressed using the en-Gal4 driver (Figure 6B') and is increased at the same location when tai is overexpressed with the bbg-Gal4 driver (Figure 8). I don't know how to reconcile these observations.”

      We thank the reviewer for pressing us to develop an overall model explaining our results and how we envision Tai regulating Dlp and Wg. We are preparing a graphic abstract that illustrates this model and will be included in our revision.

      Briefly, we favor a model in which Tai controls the rate of Wg spread via Dlp, without a significant effect on wg transcription. For example, the induction of Dlp across the ‘engrailed’ domain of en>Tai discs (Fig 7B-B”) allows Wg to spread rapidly across the flanks and moderately depletes it from the DV margin (Fig 6B-B”), as noted by the reviewer. Adding a UAS-Dlp transgene in the en>Tai background dramatically accelerates Wg spread, depleting Wg from the DV margin and causing it to build up at the far end of the gradient adjacent to the dorsal and ventral hinge. Significantly, blocking endocytosis of Wg in en>Tai discs with a dominant-negative shibire transgene also causes Wg to build up in the same location (new data to be added in a revision), consistent with enhanced spreading. The bbg-Gal4 experiment differs in that Tai is overexpressed only in DV margin cells, which constrains and concentrates Wg within this restricted domain; we are in the process of testing whether this effect on Wg is blocked by RNAi of Dlp in bbg>Tai discs.

      “In Figure 9, the tai-low clones have elevated levels of Dlp. How can this be reconciled with the tai-RNAi knockdown shown in Figure 7C' where reducing tai levels causes a strong reduction in Dlp levels?”

      We apologize for not explaining these data well enough. First, the tai[k15101] allele is a weak, viable hypomorph (as shown in our Zhang et al., 2015 paper), whereas the Tai RNAi line is lethal with most drivers (including en-Gal4) and is thus a stronger lof. Second, Tai RNAi lowers Dlp levels (Fig 7C), while tai[k15101] causes Dlp to accumulate intracellularly (see Fig. 9A-C). These data indicate that reduced Tai leads to a defect in Dlp intracellular trafficking, whereas loss of Tai reduces overall Dlp levels. They can be explained either by a single role for Tai in Dlp traffic to or from the cell membrane, or by two roles, one in Dlp trafficking and one in Dlp expression. As noted, we are investigating both possibilities using dlp reporter lines and our new tai null Crispr allele.

      Reviewer #3:

      Overall Weaknesses:

      “The study has relatively weak evidence for the mechanism of cell competition mediated by Dlp and Wg.”

      The screen and middle section of the paper provide genetic evidence that elevating Wg pathway activity rescues tai[low] loser cells and that Tai controls levels/localization of Dlp and distribution of Wg in the developing wing disc. Our current work is focused on linking these two findings in Tai “loser” clones.

      “More evidence is required to support the claim that dlp transcription or endocytosis is affected in tai clones.”

      As noted above, we are testing whether normalizing Dlp levels across clonal boundaries rescues tai[low] loser clones and assessing effects of Tai on dlp transcription and Dlp trafficking.

      Specifics:

      “Most of the rest of the study is not in the clonal context, and mainly relies on RNAi KD of tai in the posterior compartment, which is a relatively large group of cells. I understand why the authors chose a different approach to investigate the role of tai in cell competition. However because ubiquitous loss of tai results in smaller organs, it is important to determine to what extent reducing levels of tai in the entire posterior compartment compares with clonal elimination i.e. cell competition. This is important in order to determine to what extent the paradigm of Tai-mediated regulation of Dlp levels and by extension, Wg availability, can be extended as a general mechanism underlying competitive elimination of tai-low clones. If the authors want to make a case for mechanisms involved in the competitive elimination of tai clones, then they need to show that the KD of tai in the posterior compartment shows hallmarks of cell competition. Is there cell death along the A/P boundary? Or is the compartment smaller because those cells are growing slower?”

      Based on data that cell competition does not occur over compartment boundaries (e.g., see review by L.A. Johnston, Science, 2009), we chose not to use UAS-Gal4 to assess competition, but rather to investigate underlying biology occurring between Tai, Wg, and Dlp.

      “Are the levels of Myc/DIAP1, proteins required for fitness, affected in en>tai RNAi cells?”

      This is, of course, an interesting question given that Myc is a well-studied competition factor and is proposed to be downstream of the Tai-interacting protein Yki. We are not currently focused on Myc, but plan to test its role in the Tai-Dlp-Wg pathway in future work.

      “The authors do not have direct/strong evidence of changes in dlp mRNA levels or intracellular trafficking. To back these claims, the authors should look for dlp mRNA levels and provide more evidence for Dlp endocytosis like an antibody uptake assay or at the very least, a higher resolution image analysis showing a change in the number of intracellular Dlp positive punctae. Also, do the authors think that loss of tai increases Dlp endocytosis, making it less available on the cell surface for maintaining adequate extracellular Wg levels?”

      As noted above, we have added experiments using a dominant-negative shibire/dynamin allele to test whether Tai controls Dlp endocytosis. These data will be added to the revised manuscript. We have also gathered reagents to test the effects of Tai gain/loss on Dlp secretion.

      “The data shown in the last figure is at odds with the model (I think) the authors are trying to establish: When cells have lower Tai levels, this reduces Dlp levels (S2) presumably either by reducing dlp transcription and/or increasing (?) Dlp endocytosis. This in turn reduces Wg (availability) in cells away from source cells (Figure 6). The reduced Wg availability makes them less fit, targeting them for competitive elimination. But in tai clones, I do not see any change in cell-surface Dlp (9B) (I would have expected them to be down based on the proposed model). The authors also see more total Dlp (9A) (which is at odds with S2 assuming data in S2 were done under permeabilizing conditions.).”

      As noted above (under Rev #2 comments), we apologize for not explaining these data well enough. First, the tai[k15101] allele is a weak, viable hypomorph (as shown in our Zhang et al., 2015 paper), whereas the Tai RNAi line is lethal with most drivers (including en-Gal4) and is thus a stronger lof. Second, Tai RNAi lowers Dlp levels (Fig 7C), while tai[k15101] causes Dlp to accumulate intracellularly (see Fig. 9A-C). These data indicate that reduced Tai leads to a defect in Dlp intracellular trafficking, whereas loss of Tai reduces overall Dlp levels. They can be explained either by a single role for Tai in Dlp traffic to or from the cell membrane, or by two roles, one in Dlp trafficking and one in Dlp expression. We are investigating both possibilities using dlp reporter lines and our new tai null Crispr allele.

      “As a side note, because Dlp is GPI-anchored, the authors should consider the possibility that the 'total' Dlp staining observed in 9A may not be actually total Dlp (and possibly mostly intracellular Dlp, since the permeabilizing membranes with detergent will cause some (most?) Dlp molecules to be lost), and how this might be affecting the interpretation of the data. I think one way to address this would be to process the permeabilized and non-permeabilized samples simultaneously and then image them at the same settings and compare what membrane staining in these two conditions looks like. If membrane staining in the permeabilized condition is decreased compared to non-permeabilized conditions, and the signal intensity of Dlp in permeabilized conditions remains high, then the authors will have evidence to support increased endocytosis in tai clones. Of course, these data will still need to be reconciled with what is shown in S2.”

      We thank the reviewer for this excellent suggestion and are generating mosaic discs to test the proposed approach of synchronous analysis of total vs. intracellular Dlp.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) A problem with in vitro work is that homogeneous cell lines/cultures are, by nature, absent from the rest of the microenvironment. The authors need to discuss this. 

      We have added two sentences to the second paragraph of the Discussion section in which we now acknowledge this concern, but also point out that in vitro models of this sort also provide an experimental advantage in that they facilitate a deconvolution of the extensive complexity resident within the intact animal. Nevertheless, we acknowledge that this deconvolution requires ultimate validation of findings obtained within an in vitro model system to ensure they accurately recapitulate functions that occur in the intact animal in vivo.

      (2) What are n's/replicates for each study? Were the same or different samples used to generate the data for RNA sequencing, methylation beadchip analysis, and EM-seq? This clarification is important because if the same cultures were used, this would allow comparisons and correlations within samples.  

      Additional text has been added to the Methods section indicating that all cell culture samples (iPSCs and PGCLCs) came from a single XY iPS cell line aliquoted into replicates, and that all primary cultures (Sertoli and granulosa cells) were generated from pooled tissue preparations from mice and then aliquoted into replicates. All experiments in the study were performed on three replicates. Because this experimental design does allow comparisons among samples, we have added a new Supplemental Figure 9, which displays PCA plots showing clustering of the control and treatment datasets, respectively, as well as the distinctions between the clusters representing each experimental condition.

      (3) In Figure 1, it is interesting that the 50 uM BPS dose mainly resulted in hypermethylation whereas 100 uM appears to be mainly hypomethylation. (This is based on the subjective appearance of graphs). The authors should discuss and/or present these data more quantitatively. For example, what percentage of changes were hypo/hypermethylation for each treatment? How many DMRs did each dose induce? For the RNA-seq results, again, what were the number of up/down-regulated genes for each dose?  

      The experiment shown in Figure 1 was designed to 1) serve as proof of principle that cells maintained in culture could be susceptible to EDC-induced epimutagenesis at all, 2) determine whether any response observed would be dose-dependent, and 3) identify a minimally effective dose of BPS for the remaining experiments in this study (which we identified as 1 μM). We agree that it is interesting that the 50 µM dose of BPS induced predominantly hypermethylation changes, whereas the 1 µM and 100 µM doses induced predominantly hypomethylation changes, but we are not in a position to offer a mechanistic explanation for this outcome at this time. The results satisfied the primary objectives of this experiment: they demonstrate that exposure of cells in culture to BPS can indeed induce DNA methylation epimutations, that this occurs in a dose-dependent manner, and that a dose as low as 1 µM BPS is sufficient to induce epimutagenesis. That said, in response to the reviewer’s request we have now added text on pages 6-7 referring to new Supplemental Tables 1-3, which give the total numbers of DMCs and DMRs, as well as the number of DEGs, detected in response to each dose of BPS shown in Figure 1, and stratify those results into hyper- and hypomethylation epimutations and up- and down-regulated DEGs for each dose. While investigating the mechanistic basis for the difference between the response to 50 µM BPS and the responses to 1 and 100 µM BPS was beyond the scope of this manuscript, we note that this result is reminiscent of the “U-shaped” response curves often observed in toxicology studies. Importantly, it also demonstrates the high resolution and specificity of analysis afforded by our in vitro cell culture model system.

      (4) Also in Figure 1, were there DMRs or genes in common across the doses? How did DMRs relate to gene expression results? This would be informative in verifying or refuting expectations that greater methylation is often associated with decreased gene expression.  

      In general, we observed a correspondence between changes in DNA methylation and changes in gene expression (Supplemental Tables 1-3). Regarding the reviewer’s question about common DMRs and DEGs across doses: although only 3 DMRs were shared across all doses tested, we found an average of 51.25% overlap in DMCs and an average of 80.45% overlap in DEGs among iPSCs exposed to the different doses of BPS shown in Figure 1. In addition, within each dose of BPS tested in iPSCs, many DMCs overlapped the promoters or gene bodies of DEGs (Supplemental Table 4). Specifically, within gene promoters, hypermethylated DMCs correlated with decreased gene expression and hypomethylated DMCs with increased gene expression (Supplemental Figure 2).

      (5) In Figure 2, was there an overlap in the hypo- and/or hyper-methylated DMCs? Please also add more description of the data in 2b to the legend including what the dot sizes/colors mean, etc. Some readers (including me) may not be familiar with this type of data presentation. Some of this comes up in Figure 4, so perhaps allude to this earlier on, or show these data earlier.  

      Although we observed an average of 11.05% overlapping DMCs between different pairs of cell types, we did not observe any DMCs that were shared among all four cell types. Indeed, this limited overlap of DMCs among different cell types exposed to BPS was the primary motivation for the analysis described in Figure 2. Thus, instead of focusing solely on direct overlap between specific DMCs, we examined similarities among the different cell types in the occurrence of epimutations within different annotated genomic regions. To better describe this, we have now added text to page 9. We have also added more detail to the legend for Figure 2 on page 8 to explain the dot sizes and colors: dot size indicates the relative number of differentially methylated probes detected within each annotated genomic region, and dot color indicates the calculated enrichment score, which reflects the relative abundance of epimutations within that region. The score is calculated by iterating down the list of DMCs, increasing a running-sum statistic when a DMC falls within the annotated genomic region of interest and decreasing it when the DMC does not. The magnitude of the increment depends on the relative occurrence of DMCs within that annotated genomic region.
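      The running-sum enrichment score described above resembles the unweighted statistic used in gene set enrichment analysis. The sketch below is a minimal illustration under stated assumptions, not the exact implementation used for Figure 2: it assumes the DMCs are already ranked, that each DMC inside the region raises the sum by 1/N_in and each DMC outside lowers it by 1/(N − N_in), and that the reported score is the running-sum value with the largest deviation from zero. The function name `running_sum_enrichment` is hypothetical.

```python
from typing import Sequence


def running_sum_enrichment(in_region: Sequence[bool]) -> float:
    """Hypothetical GSEA-style enrichment score for one annotated region.

    in_region[i] is True if the i-th ranked DMC lies inside the region.
    Returns the running-sum value with the largest absolute deviation from zero.
    """
    n_total = len(in_region)
    n_in = sum(1 for hit in in_region if hit)
    if n_in == 0 or n_in == n_total:
        raise ValueError("need at least one DMC inside and one outside the region")

    # Assumed weighting: the increment scales with how many DMCs fall in the region,
    # so a full pass over the list always returns the running sum to zero.
    step_up = 1.0 / n_in
    step_down = 1.0 / (n_total - n_in)

    running = 0.0
    best = 0.0
    for hit in in_region:
        running += step_up if hit else -step_down
        if abs(running) > abs(best):
            best = running
    return best


# Usage: ranked DMCs flagged by whether each lies in, for example, an enhancer.
ranked_flags = [True, True, False, True, False, False]
print(running_sum_enrichment(ranked_flags))
```

      A positive score means the region's DMCs cluster near the top of the ranked list; a negative score means they cluster near the bottom.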

      (6) iPSCs were derived from male mice MEFs, and subsequently used to differentiate into PGCLCs. The only cell type from an XX female is the granulosa cells. This might be important, and should be mentioned and its potential significance discussed (briefly).  

      We have added a new paragraph just before the final paragraph of the Discussion section in which we acknowledge that most of the cell types analyzed during our study were XY-bearing “male” cells and that XX-bearing “female” cells might respond to similar exposures differently from the XY cells we analyzed. However, we also note that our assessment of XX-bearing granulosa cells yielded results very similar to those seen in XY Sertoli cells, suggesting that, at least for differentiated somatic cell types, there does not appear to be a significant sex-specific difference in response to exposure to a similar dose of the same EDC. That said, we also acknowledge that in cell types in which dosage compensation based on X-chromosome inactivation is not in place, differences between XY- and XX-bearing cells could arise.

      (7) EREs are only one type of hormone response element. The authors make the point that other mechanisms of BPS action are independent of canonical endocrine signaling. Would authors please briefly speculate on the possibility that other endocrine pathways including those utilizing AREs or other HREs may play a role? In other words, it may not be endocrine signaling independent. The statement that the differences between PGCLCs and other cells are largely due to the absence of ERs is overly simplistic.  

      Previous reports have indicated that BPS does not bind the androgen receptor (Pelch et al., 2019; Yang et al., 2024). However, there have been reports that BPS can interact with other endocrine receptors, including PPARγ and RXRα, which play roles in lipid accumulation and have potential links to obesity phenotypes (Gao et al., 2020; Sharma et al., 2018). To address the reviewer’s comment, we assessed the expression of a panel of hormone receptors, including PPARγ, RXRα, and AR, in each of the cell types examined in our study; these results are now shown in a new Supplemental Figure 4. We show that in addition to not expressing either estrogen receptor (ERα or ERβ), germ cells also do not express any of the other endocrine receptors we tested, including AR, PPARγ, and RXRα. We now note that these results support our suggestion that the induction of epimutations we observed in germ cells in response to BPS exposure appears to reflect disruption of non-canonical endocrine signaling. We also note that non-canonical endocrine signaling is well established (Brenker et al., 2018; Ozgyin et al., 2015; Song et al., 2011; Thomas and Dong, 2006). We therefore feel that the suggestion that the effects of BPS exposure could reflect disruption of either canonical or non-canonical signaling in any cell type is well justified, and that our data suggest both of these effects occurred in the cells examined in our study, as stated in the text of our manuscript.

      (8) Interpretation of data from the GO analysis is similarly overly simplistic. The pathways identified and discussed (e.g. PI3K/AKT and ubiquitin-like protease pathways) are involved in numerous functions, both endocrine and non-endocrine. Also, are the data shown in Figure 6a from all 4 cell types? I am confused by the heatmap in 6c, which genes were significantly affected by treatment in which cell types?  

      Per the reviewer’s request, we have added text to indicate that Figure 6a does include data from all four cell types examined. We have also modified the text to clarify that Figure 6c displays the expression of other G protein-coupled receptors that are expressed at similar, if not higher, levels than either ER in all cell types examined, and that have been shown to have the potential to bind either 17β-estradiol or BPA in rat models. As the reviewer alludes to, this points to a wide variety of distinct pathways and/or functions that can potentially be affected by exposure to an EDC such as BPS. We have therefore acknowledged the reviewer’s primary point that BPS may interact with a variety of receptors or other factors involved in many different pathways and functions. Importantly, this illustrates a strength of our model system: it can be used to identify potentially affected target pathways that can subsequently be pursued further as appropriate.

      (9) In Figure 7, what were the 138 genes? Any commonalities among them? 

      We have now added a new supplemental Excel file that lists the 138 overlapping conserved DEGs that were not reprogrammed/corrected during the transition from iPSCs to PGCLCs. In addition, we have added new text on page 22 and a new Supplemental Figure 8, which displays KEGG analysis of pathways associated with these 138 retained DEGs. We find that these genes are primarily involved in cell cycle and apoptosis pathways, which, interestingly, have potential links to cancer development, a process often associated with disruptions in chromatin architecture.

      (10) The Introduction is very long. The last paragraph, beginning line 105, is a long summary of results and interpretations that better fit in a Discussion section.

      We have now significantly reduced the length and scope of the final paragraph of the Introduction per the reviewer’s recommendation.

      (11) Provide some details on husbandry: e.g. were they bred on-site? What food was given, and how was water treated? These questions are to get at efforts to minimize exposure to other chemicals.  

      We have added text detailing that all mice used in the project were bred on-site, that drinking water was non-autoclaved conventional reverse-osmosis (RO) water, and that mice were fed 5V5R extruded feed, which is highly controlled for isoflavone content and has been certified for use in estrogen-sensitive animal protocols.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript uses cell lines representative of germ line cells, somatic cells, and pluripotent cells to address the question of how the endocrine-disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions whereas that in the germ cell type was predominantly in gene promoters.

      Strengths:

      The strengths of the paper include the use of various cell types to address the sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation.

      Weaknesses:

      The weaknesses include the lack of reporting of replicates, superficial bioinformatic analysis, and the fact that exposures are more complicated in a whole organism than in an isolated cell line.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is an intriguing paper but more transparency in the replicates and methods and a more rigorous bioinformatic treatment of the data are required.

      Specific comments:

      (1) End of abstract "These results suggest a unique mechanism by which an EDC-induced epimutated state may be propagated transgenerationally following a single exposure to the causative EDC." This is overly speculative for an abstract. There is only epigenetic inheritance following mitosis or differentiation presented in this study. There is no meiosis and therefore no ability to assess multi- or transgenerational inheritance. 

      We have modified the text at the end of the abstract to more precisely reflect the conclusions supported by our data. In our view, the ability of induced epimutations to transcend meiosis per se is not as relevant to the mechanism of transgenerational inheritance as their ability to transcend the major waves of epigenetic reprogramming that normally occur during development of the germ line. In this regard, the transition from pluripotent iPSCs to germline PGCLCs has been shown to recapitulate at least the first portion of normal germline reprogramming, and our data now provide novel insight into the fate of induced epimutations during this process. Specifically, we show that the overall prevalence of epimutations was conserved during the iPSC → germ cell transition, but that very few (< 5%) of the specific epimutations present in the BPS-exposed iPSCs were retained when those cells were induced to form PGCLCs. Rather, we observed apparent correction of a large majority of the initially induced epimutations during this transition, accompanied by the apparent de novo generation of novel epimutations in the PGCLCs. Based on other recent reports in the literature, we suggest that BPS exposure induces changes in the chromatin architecture of the exposed iPSCs such that, when the normal germline reprogramming mechanism is imposed on this disrupted chromatin template, many existing epimutations are corrected and many novel epimutations are generated. This observation has the potential to explain the long-standing question of why the prevalence of epimutations persists across multiple generations despite the epigenetic reprogramming that occurs in each generation. Nevertheless, as noted above, we have modified the text at the end of the abstract to temper this interpretation, given that it is still somewhat speculative at this point.

      (2) Doses used in the experiments. One needs to be careful when stating that the dose used is "below FDA's suggested safe environmental level established for BPA" because a different bisphenol is being used here (BPA vs BPS) and the safe level is that which the entire organism experiences. It is likely that cell lines experience a higher effective dose.  

      We now note explicitly that our reference to an EPA-recommended “safe dose” of BPA applies to humans and/or intact animals. Changes to this effect have been made in the second and sixth paragraphs of the Introduction. In addition, we have added text at the end of the fourth paragraph of the Discussion acknowledging that, as the reviewer suggests, the same dose of an EDC could exert greater effects on cells in a homogeneous culture than on the same cell type within an intact animal, given the potential for mitigating metabolic effects in the latter. However, we also note that our demonstrated ability to quantify the effects of such exposures in terms of the number of epimutations (DMCs or DMRs) induced could be used in future studies to address this question, by assessing the effects of a specific dose of a specific EDC on a specific cell type exposed either in a homogeneous culture or within an intact animal.

      (3) Figure 1: In the dose response, what was the overlap in DMCs and DEGs among the 3 doses? Are the responses additive, synergistic, or completely non-overlapping? This is an important point that should be addressed. 

      Please see our response to Reviewer 1 critique #4 above, where we address similar concerns. While we do find overlap among different cell types with respect to the DMCs, DMRs, and DEGs displayed in Figure 1, we found the dose effect to be only partially additive rather than synergistic. The fold increase in DMCs, DMRs, and DEGs resulting from exposure to 50 μM versus 1 μM BPS ranged from 2.5x to 4.4x, well below the 50x increase that would have been expected from a strictly additive effect, and the effect increased even less, if at all, in response to exposure to 100 μM versus 50 μM BPS. As now noted in the Discussion section on page 25, we conclude that these results display a limited dose-dependent effect that was partially additive but plateaued at the highest doses tested.

      (4) Methods: How many times was each exposure performed on a given cell type? This information should be in the figure legends and methods. In the case of multiple exposures for a given line, do the biological replicates agree? 

      Please see our response to Reviewer 1 critique #2 where we address similar concerns with newly added text and analysis. We now note repeatedly on pages 39-45 that each analysis was conducted on three replicate samples, and we display the similarity among those replicates graphically in a new Supplement Figure 9.

      (5) DNA methylation analyses. Very little analysis is presented on the BeadChip array other than hypermethylated/hypomethylated and genomic regions of DMCs. What is the range of methylation changes? Does it vary between hypo vs. hyper DMCs? How many array experiments were performed (biological replicates) and what stats were used to determine the DMCs? Are there DMCs in common among the various cell types? As an example of a more meaningful analysis, one could plot the %5mC over a given array for comparisons between control and treated cell types. For more granularity, the %5mC can be presented according to the element type (enhancers vs promoters). 

      Please see our response to Reviewer 1 critique #2 above, where we address similar concerns regarding the number of biological replicates used in this study. DMCs on the Infinium array are identified using mixed linear models. This general supervised learning framework identifies CpG loci at which differential methylation is associated with known control vs. treated covariates. CpG probes were defined as DMCs when the difference between treatment and control samples met both the p-value and FDR significance thresholds (≤ 0.05) for each cell type analyzed. Across all samples, the medians ranged from 0.0059 to 0.0278 for hypermethylated beta values and from -0.0179 to -0.0033 for hypomethylated beta values. As noted above, we did observe overlap in DMCs between cell types: on average, 11.05% of DMCs were shared between two or more cell types, but no DMCs were shared among all four cell types. We have added text on page 9, new Supplement Tables 1-4, and Supplement Figure 1 to describe more clearly that this limited direct overlap of DMCs was the motivation for the analysis described in Figure 2. Finally, the enrichment dot plots shown in Figure 2 provide the information the reviewer requested regarding the %5mC observed at different annotated genomic element types.
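      The significance criterion described above (both the nominal p-value and the FDR at or below 0.05) can be sketched as follows. This is an illustration only: it assumes per-probe delta betas and p-values have already been produced by the mixed linear model, it uses the Benjamini-Hochberg procedure as one standard way of controlling the FDR (the exact FDR method is not stated in the response), and the probe IDs and statistics are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_dmcs(probe_stats: Dict[str, Tuple[float, float]],
              alpha: float = 0.05) -> Dict[str, str]:
    """probe_stats maps probe ID -> (delta_beta, p_value) from a per-probe model.

    A probe is called a DMC only if both its p-value and its FDR-adjusted value
    are <= alpha; the sign of delta beta gives the direction of the change.
    """
    probes = list(probe_stats)
    qvalues = benjamini_hochberg([probe_stats[p][1] for p in probes])
    calls = {}
    for probe, q in zip(probes, qvalues):
        delta_beta, p = probe_stats[probe]
        if p <= alpha and q <= alpha and delta_beta != 0.0:
            calls[probe] = "hyper" if delta_beta > 0 else "hypo"
    return calls

# Hypothetical probe IDs and statistics, for illustration only.
example = {"cg_a": (0.021, 0.001), "cg_b": (-0.012, 0.002), "cg_c": (0.030, 0.5)}
print(call_dmcs(example))
```

      Requiring both thresholds simply makes the call stricter than FDR control alone; with many probes the FDR condition is usually the binding one.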

      (6) The investigators correlate the number of DMCs in a given cell type with the presence of estrogen receptors. Does the correlation extend to the methylation difference (delta beta) at the statistically different probes?

      We have added a new Supplement Figure 3 with data addressing this question. In brief, the delta betas of probes enriched at enhancer regions and located in relative proximity to EREs in Sertoli cells, granulosa cells, and iPSCs appear very similar to those of DMCs not located within these enriched regions. However, when we compared the two data sets with a goodness-of-fit test (the two-sample Kolmogorov-Smirnov test), these relatively small differences were in fact statistically significant. These differences appear to reflect higher variability among the delta betas of hypomethylation, but not hypermethylation, changes at enhancer-associated DMCs, potentially suggesting a greater tendency for BPS exposure to induce hypomethylation rather than hypermethylation, at least in these specific regions.
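      The two-sample Kolmogorov-Smirnov comparison mentioned above can be illustrated with a minimal sketch of its statistic D, the largest vertical distance between the empirical cumulative distributions of two sets of delta betas. The delta-beta values below are invented toy numbers; in practice a library routine such as `scipy.stats.ks_2samp`, which also returns a p-value, would normally be used instead of this hand-written version.

```python
from bisect import bisect_right
from typing import Sequence

def ks_2samp_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both empirical CDFs are step functions that change only at observed values,
    # so the supremum of their difference is reached at one of the pooled points.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / n
        cdf_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Toy delta betas (hypothetical values, not data from the study).
enhancer_dmcs = [-0.031, -0.012, -0.004, 0.006, 0.019]
other_dmcs = [-0.009, -0.006, -0.002, 0.004, 0.008]
print(ks_2samp_statistic(enhancer_dmcs, other_dmcs))
```

      Because D is sensitive to differences in spread as well as location, two distributions with similar medians but different variability, as described above, can still yield a significant result.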

      (7) Methylation changes relative to EREs are presented in multiple figures. Are other sequences enriched in the DMCs? 

      We profiled the genomic sequence within 500 bp of cell type-specific enriched DMCs, associated either with enhancer regions in Sertoli, granulosa, or iPS cells or with transcription factor binding sites in PGCLCs, to identify over-represented motif sequences. We then compared the identified motifs with the JASPAR database to find transcription factors that could potentially bind these regions. Interestingly, the two most common motifs across all cell types were associated with either the chromatin remodeling transcription factor HMG1A or the pluripotency factor KLF4.
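      In a much simplified form, the screen described above amounts to extracting the sequence within 500 bp of each DMC and ranking short sequences by how over-represented they are relative to a background set. The sketch below assumes in-memory chromosome strings and 0-based DMC coordinates, and the function names and toy genome are hypothetical; a k-mer frequency ratio is only a stand-in for proper motif discovery followed by comparison with JASPAR.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def flanking_windows(chrom_seqs: Dict[str, str], sites: Sequence[Tuple[str, int]],
                     flank: int = 500) -> List[str]:
    """Sequence within `flank` bp of each (chrom, 0-based position) site, clipped at chromosome ends."""
    windows = []
    for chrom, pos in sites:
        seq = chrom_seqs[chrom]
        start, end = max(0, pos - flank), min(len(seq), pos + flank + 1)
        windows.append(seq[start:end].upper())
    return windows

def kmer_counts(seqs: Sequence[str], k: int) -> Counter:
    counts: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if "N" not in kmer:  # skip assembly gaps / ambiguous bases
                counts[kmer] += 1
    return counts

def top_enriched_kmers(foreground: Sequence[str], background: Sequence[str],
                       k: int = 6, top: int = 5, pseudocount: float = 1.0) -> List[str]:
    """Rank foreground k-mers by their (pseudocounted) frequency ratio versus background."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total, bg_total = max(1, sum(fg.values())), max(1, sum(bg.values()))

    def ratio(kmer: str) -> float:
        return ((fg[kmer] + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)

    return sorted(fg, key=ratio, reverse=True)[:top]

# Toy genome and coordinates (hypothetical).
genome = {"chr1": "ACGT" * 200 + "CACGTGCACGTG" + "ACGT" * 200}
dmc_windows = flanking_windows(genome, [("chr1", 805)], flank=20)
background = flanking_windows(genome, [("chr1", 100)], flank=20)
print(top_enriched_kmers(dmc_windows, background, k=6, top=3))
```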

      (8) Please present a correlation plot between the methylation differences and the adjacent DEGs. Again, the absence of consideration of the absolute changes in methylation and gene expression minimizes the impact of the data. 

      We analyzed the relationship between DMCs in DEG promoter regions and the corresponding change in expression of each DEG. Our data support a relationship in which up-regulated genes show decreased promoter methylation and down-regulated genes show increased promoter methylation, although there were some exceptions.
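      The requested correlation between promoter methylation change and expression change could be summarised with a rank correlation. The sketch below computes Spearman's rho between promoter delta beta and log2 fold change; under the relationship described above (hypomethylation with up-regulation, hypermethylation with down-regulation), rho would be expected to be negative. Pairing each DEG with a single promoter delta-beta value is an assumption of this illustration, and the numbers are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either input is constant

def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical (promoter delta beta, DEG log2 fold change) pairs.
delta_beta = [-0.030, -0.012, 0.004, 0.018, 0.025]
log2_fc = [1.8, 0.9, -0.2, -1.1, 0.3]
print(round(spearman(delta_beta, log2_fc), 3))
```

      A rank correlation is used here because it does not assume a linear relationship between the small delta-beta values and fold changes; the exceptions noted above would appear as points that pull rho towards zero.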

      (9) EM-Seq is mentioned in Figure 7 and in the material and methods. Where is it used in this study? 

      We now note in the text on page 22 that EM-seq was used in the experiments assessing the propagation of BPS-induced epimutations during the iPSC → EpiLC → PGCLC cell state transitions, to obtain higher-resolution, whole-epigenome data on DNA methylation differences.

      References

      Brenker C, Rehfeld A, Schiffer C, Kierzek M, Kaupp UB, Skakkebæk NE, Strünker T. 2018. Synergistic activation of CatSper Ca2+ channels in human sperm by oviductal ligands and endocrine disrupting chemicals. Hum Reprod 33:1915–1923. doi:10.1093/humrep/dey275

      Gao P, Wang L, Yang N, Wen J, Zhao M, Su G, Zhang J, Weng D. 2020. Peroxisome proliferator-activated receptor gamma (PPARγ) activation and metabolism disturbance induced by bisphenol A and its replacement analog bisphenol S using in vitro macrophages and in vivo mouse models. Environ Int 134:105328. doi:10.1016/j.envint.2019.105328

      Ozgyin L, Erdos E, Bojcsuk D, Balint BL. 2015. Nuclear receptors in transgenerational epigenetic inheritance. Prog Biophys Mol Biol. doi:10.1016/j.pbiomolbio.2015.02.012

      Pelch KE, Li Y, Perera L, Thayer KA, Korach KS. 2019. Characterization of Estrogenic and Androgenic Activities for Bisphenol A-like Chemicals (BPs): In Vitro Estrogen and Androgen Receptors Transcriptional Activation, Gene Regulation, and Binding Profiles. Toxicol Sci 172:23–37. doi:10.1093/toxsci/kfz173

      Sharma S, Ahmad S, Khan MF, Parvez S, Raisuddin S. 2018. In silico molecular interaction of bisphenol analogues with human nuclear receptors reveals their stronger affinity vs. classical bisphenol A. Toxicol Mech Methods 28:660–669. doi:10.1080/15376516.2018.1491663

      Song K-H, Lee K, Choi H-S. 2002. Endocrine Disrupter Bisphenol A Induces Orphan Nuclear Receptor Nur77 Gene Expression and Steroidogenesis in Mouse Testicular Leydig Cells. Endocrinology 143:2208–2215. doi:10.1210/endo.143.6.8847

      Thomas P, Dong J. 2006. Binding and activation of the seven-transmembrane estrogen receptor GPR30 by environmental estrogens: A potential novel mechanism of endocrine disruption. J Steroid Biochem Mol Biol 102:175–179. doi:10.1016/j.jsbmb.2006.09.017

      Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. 2024. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 58:2817–2829. doi:10.1021/acs.est.3c09779

    1. Author response:

      eLife assessment

      This manuscript reports an important finding that the transcription factor Scleraxis regulates regenerative myogenesis by controlling the proliferation and differentiation of muscle stem cells. The evidence presented is compelling and supports the conclusions and the mechanisms by which this gene regulates satellite cell function. These data will be of interest to developmental, transcriptional, and stem cell biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript by Bai et al concerns the expression of Scleraxis (Scx) by muscle satellite cells (SCs) and the role of that gene in regenerative myogenesis. The authors report the expression of this gene associated with tendon development in satellite cells. Genetic deletion of Scx in SCs impairs muscle regeneration, and the authors provide evidence that SCs deficient in Scx are impaired in terms of population growth and cellular differentiation. Overall, this report provides evidence of the role of this gene, unexpectedly, in SC function and adult regenerative myogenesis.

      We appreciate the comments and thank her/him for the support of our manuscript.

      There are a few minor points of concern.

      (1) From the data in Figure 1, it appears that all of the SCs, assessed both in vitro and in vivo, express Scx. The authors refer to a scRNA-seq dataset from their lab and one report from mdx mouse muscle that also reveals this unexpected gene expression pattern. Has this been observed in many other scRNA-seq datasets? If not, it would be important to discuss potential explanations as to why this has not been reported previously.

      Thanks for this question regarding the data in Figure 1. We initially used immunofluorescence staining of Pax7 and GFP on muscle sections and primary myoblast cultures prepared from Tg-ScxGFP mice to conclude that Scx was expressed in satellite cells (SCs). In addition to the cited mdx RNA-seq data, we have included a re-analysis of a published scRNA-seq data set in Figure 2E (Dell'Orso, Juan et al., Development, 2019), and our own scRNA-seq data (Figure S5D, F). We have also re-examined an additional scRNA-seq data set of TA muscles at various regeneration time points (De Micheli et al., Cell Rep. 2020), in which Scx expression was detected in MuSC progenitors and mature muscle cells (in addition to tenocytes). Thus, our immunostaining results are consistent with our own scRNA-seq data and with two independent published scRNA-seq data sets.

      We think that Scx expression in the adult myogenic lineage was not previously reported mainly because its expression level is low and may have been dismissed as spurious detection. Detecting such low expression levels also requires sensitive methods with high capture efficiency; previous studies have noted limitations in transcript capture, or transcription factor dropout, in 10x Genomics-based datasets (Lambert et al., Cell, 2018; Pokhilko et al., Genome Res., 2021). Alternatively, Scx may simply not have been a focus of prior studies amid other genes of interest. Our specific focus on Scx led us to evaluate its expression in these data sets. We will add the above-cited scRNA-seq data set (De Micheli et al., Cell Rep. 2020) and provide a discussion in the revised version.
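      The capture-efficiency argument can be made concrete with a simple binomial model: if a cell contains c copies of a transcript and each copy is captured independently with probability p, the chance of observing at least one read is 1 - (1 - p)^c. This is an idealised sketch (it ignores amplification bias, UMI collisions, and sequencing depth), and the efficiency of 0.1 used below is an illustrative assumption, not a measured value for any platform.

```python
def detection_probability(copies: int, capture_efficiency: float) -> float:
    """P(at least one of `copies` transcripts is captured), assuming independent capture."""
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must lie in [0, 1]")
    return 1.0 - (1.0 - capture_efficiency) ** copies

# Illustrative only: a lowly expressed transcript versus more abundant ones.
for copies in (1, 5, 20, 100):
    print(copies, round(detection_probability(copies, 0.1), 3))
```

      Under these assumptions, a transcript present at one to five copies per cell is missed in most cells, which is consistent with low-abundance transcription factors such as Scx appearing sporadically in droplet-based data sets.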

      (2) A major point of the paper, as illustrated in Fig. 3, is that Scx-neg SCs fail to produce normal myofibers and renewed SCs following injury/regeneration. They mention in the text that there was no increased PCD by Caspase staining at 5 DPI. A failure of cell survival during the process of SC activation, proliferation, and cell fate determination (differentiation versus self-renewal) would explain most of the in vivo data. As such, this conclusion would seem to warrant a more detailed analysis in terms of at least one or two other time points and an independent method for detecting dead/dying cells (the in vitro data in Fig. 4F is also based on an assessment of activated Caspase to assess cell death). The in vitro data presented later in Fig. S4G, H do suggest an increase in cell loss during proliferative expansion of Scx-neg SCs. To what extent does cell loss (by whatever mechanism of cell death) explain both the in vivo findings of impaired regeneration and even the in vitro studies showing slower population expansion in the absence of Scx?

      We appreciate these constructive suggestions. Additional methods and time points should help us investigate SC loss in ScxcKO mice. Depending on the number of available cKO animals, we will carefully choose additional time point(s) to assess PCD using anti-active Caspase-3 immunostaining and an independent method (e.g., TUNEL). Although the outcomes are uncertain, we will endeavor to obtain meaningful data from these experiments.

      (3) I'm not sure I understand the description of the data or the conclusions in the section titled "Basement membrane-myofiber interaction in control and Scx cKO mice". Is there something specific to the regeneration from Scx-neg myogenic progenitors, or would these findings be expected in any experimental condition in which myogenesis was significantly delayed, with much smaller fibers in the experimental group at 5 DPI?

      We very much appreciate this comment. We agree that there is unlikely to be anything specific about regeneration from Scx-negative myogenic progenitors. Unfilled or empty ghost fibers (basement membrane remnants) are to be expected given the small fibers and poor regeneration in ScxcKO mice at 5 dpi. We will correct the subtitle and content accordingly.

      (4) The data presented in Fig. 4B showing differences in the purity of SC populations isolated by FACS depending on the reporter used are interesting and important for the field. The authors offer the explanation of exosomal transfer of Tdt from SCs to non-SCs. The data are consistent with this explanation, but no data are presented to support this. Are there any other explanations that the authors have considered and that could be readily tested?

      Thanks for highlighting this phenomenon. We struggled with the SC purity issue for a long time. The project started with the R26RtdT reporter because tdT's strong fluorescence resists paraformaldehyde fixation, which aids visualization in vivo. Later, when we used the tdT signal to purify SCs by FACS, we found that only 80% of sorted tdT+ cells were Pax7+. We then switched to the R26RYFP reporter, with which we achieved much higher purity (95% Pax7+) of SCs by FACS. Accordingly, we also repeated and confirmed many in vivo experimental results using the R26RYFP reporter (included in the manuscript). Given the low purity of tdT+ SCs by FACS, we discontinued that mouse colony after confirming the superior utility of the R26RYFP reporter for SC isolation.

      We sincerely apologize for not being able to conduct further experiments testing this intriguing phenomenon. However, this issue has since been addressed and published by Murach et al., iScience (2021). As in our experience, they found non-satellite mononuclear cells with tdT fluorescence after TMX treatment when SCs were isolated via FACS. To establish that this was not due to off-target recombination or a technical artifact of tissue processing, they conducted extensive analyses. They found that the tdT+ mononuclear cells included fibrogenic cells (fibroblasts and FAPs), immune cells/macrophages, and endothelial cells. Additionally, they confirmed the significant potential of extracellular vesicle (EV)-mediated cargo transfer, which facilitates the transfer of full-length tdT transcript from lineage-marked Pax7+ cells to those mononuclear cells. We will modify our text to include and acknowledge their contribution to this important point.

      (5) The Cut&Run data of Fig. 6 certainly provide evidence of direct Scx targets, especially since the authors used a novel knock-in strain for analyses. The enrichment of E-box motifs provides support for the 207 intersecting genes (scRNA-seq and Cut&Run) being direct targets. However, the rationale elaborated in the final paragraph of the Results section proposing how 4 of these genes account for the phenotypes on the Scx-neg cells and tissues is just speculation, however reasonable. These are not data, and these considerations would be more appropriate in the Discussion in the absence of any validation studies.

      We agree with this comment and will move this speculation into the discussion.
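      For readers outside the field, the E-box enrichment mentioned in this point refers to the bHLH-binding consensus CANNTG. A minimal sketch for counting E-boxes in peak sequences, and for comparing the count with that expected in a random sequence of uniform base composition (an assumption; real genomic background is not uniform), is shown below. Because CANNTG is its own reverse complement, scanning one strand finds every site. The peak sequence is hypothetical.

```python
import re

# Zero-width lookahead so that overlapping E-boxes are all counted.
E_BOX = re.compile(r"(?=CA[ACGT]{2}TG)")

def count_eboxes(seq: str) -> int:
    """Count CANNTG occurrences; the motif is palindromic, so one strand suffices."""
    return sum(1 for _ in E_BOX.finditer(seq.upper()))

def expected_eboxes(length: int) -> float:
    """Expected count in a uniform random sequence: 4 fixed bases give p = (1/4)**4 per start site."""
    return max(0, length - 5) / 256

peak = "ttgCAGCTGacgtCACCTGgg"  # hypothetical peak sequence
print(count_eboxes(peak), expected_eboxes(len(peak)))
```

      A real enrichment test would compare against matched background regions (for example, GC-matched sequences) rather than the uniform expectation used here.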

      Reviewer #2 (Public Review):

      Summary:

      Scx is a well-established marker for tenocytes, but the expression in myogenic-lineage cells was unexplored. In this study, the authors performed lineage-trace and scRNA-seq analyses and demonstrated that Scx is expressed in activated SCs. Further, the authors showed that Scx is essential for muscle regeneration using conditional KO mice and identified the target genes of Scx in myogenic cells, which differ from those of tendons.

      Strengths:

      Sometimes, lineage-trace experiments cause mis-expression and do not reflect the endogenous expression of the target gene. In this study, the authors carefully analyzed the unexpected expression of Scx in myogenic cells using some mouse lines and scRNA-seq data.

      We appreciate the comments and thank her/him for noting the strengths of our manuscript.

      Weaknesses:

      Scx protein expression has not been verified.

      We are aware of this weakness. We had previously used Western blotting (WB) using cultured SCs from control and ScxcKO mice, but did not detect endogenous Scx protein in the control. Hence, we used ScxCreERT2 lineage-tracing, Tg-ScxGFP expression, and ScxTy1 knock-in allele as complementary, even though indirect, ways to address this issue. Following the reviewer’s comment, we will purchase new anti-Scx antibodies and re-perform WB using cultured SCs. If the new antibodies fail to detect endogenous Scx by WB, we will then use immunofluorescence staining to detect endogenous Scx protein.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides valuable information on the mechanism of PepT2 through enhanced-sampling molecular dynamics, backed by cell-based assays, highlighting the importance of protonation of selected residues for the function of a proton-coupled oligopeptide transporter (hsPepT2). The molecular dynamics approaches are convincing, but with limitations that could be addressed in the manuscript, including lack of incorporation of a protonation coordinate in the free energy landscape, possibility of protonation of the substrate, errors with the chosen constant pH MD method for membrane proteins, dismissal of hysteresis emerging from the MEMENTO method, and the likelihood of other residues being affected by peptide binding. Some changes to the presentation could be considered, including a better description of pKa calculations and the inclusion of error bars in all PMFs. Overall, the findings will appeal to structural biologists, biochemists, and biophysicists studying membrane transporters.

      We would like to express our gratitude to the reviewers for providing their feedback on our manuscript, and also for recognising the variety of computational methods employed, the amount of sampling collected and the experimental validation undertaken. Following the individual reviewer comments, as addressed point-by-point below, we have prepared a revised manuscript, but before that we address some of the comments made above in the general assessment:

      • “lack of incorporation of a protonation coordinate in the free energy landscape”.

      We acknowledge that it would of course be highly desirable to treat protonation state changes explicitly and fully coupled to conformational changes. However, evaluating such a free energy landscape is not currently computationally feasible (especially considering that the non-reactive approach taken here already amounts to almost 1 ms of total sampling time). Previous reports in the literature tend to focus either on simpler systems or on a reduced subset of a larger problem. As we were trying to obtain information on the whole transport cycle, we decided to focus here on non-reactive methods.

      • “possibility of protonation of the substrate”.

      The reviewers are correct in pointing out this possibility, which we had not discussed explicitly in our manuscript.  Briefly, while we describe a mechanism in which protonation of only protein residues (with an unprotonated ligand) can account for driving all the necessary conformational changes of the transport cycle, there is some evidence for a further intermediate protonation site in our data (as we commented on in the first version of the manuscript as well), which may or may not be the substrate itself. A future explicit treatment of the proton movements through the transporter, when it will become computationally tractable to do so, will have to include the substrate as a possible protonation site; for the present moment, we have amended our discussion to alert the reader to the possibility that the substrate could be an intermediate to proton transport. This has repercussions for our study of the E56 pKa value, where – if protons reside with a significant population at the substrate C-terminus – our calculated shift in pKa upon substrate binding could be an overestimate, although we would qualitatively expect the direction of shift to be unaffected. However, we also anticipate that treating this potential coupling explicitly would make convergence of any CpHMD calculation impractical to achieve and thus it may be the case that for now only a semi-quantitative conclusion is all that can be obtained.

      • “errors with the chosen constant pH MD method for membrane proteins”.

      We acknowledge that – as reviewer #1 has reminded us – the AMBER implementation of hybrid-solvent CpHMD is not rigorous for membrane proteins, and as such added a cautionary note to our paper.  We also explain how the use of the ABFE thermodynamic cycle calculations helps to validate the CpHMD results in a completely orthogonal manner (we have promoted this validation, which was in the supplementary figures, into the main text in the revised version).   We therefore remain reasonably confident in the results presented with regards to the reported pKa shift of E56 upon substrate binding, and suggest that if the impact of neglecting the membrane in the implicit-solvent stage of CpHMD is significant, then there is likely an error cancellation when considering shifts induced by the incoming substrate.

      • “dismissal of hysteresis emerging from the MEMENTO method”.

      We have shown in our method design paper how the use of the MEMENTO method drastically reduces hysteresis compared to steered MD for path generation, and find this improvement again for PepT2 in this study. We address reviewer #3’s concern about our presentation on this point by revising our introduction of the MEMENTO method, as detailed in the response below.

      • “the likelihood of other residues being affected by peptide binding”.

      In this study, we have investigated in detail the involvement of several residues in proton-coupled di-peptide transport by PepT2. Apart from the potential intermediate protonation site mentioned above, the set of residues we investigate forms a minimal set, of sorts, within which the important driving forces of alternating access can be rationalised. We have not investigated in substantial detail here the residues involved in holding the peptide in the binding site, as they are well studied in the literature and ligand promiscuity is not the problem of interest here. It remains entirely possible that further processes contribute to driving the conformational changes by involving other residues not considered in this paper. In our revision, we now state more explicitly our speculation that an ensemble of different processes may be contributing simultaneously, but we do not believe any of our conclusions would be affected by this.
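      Because the assessment asked for a clearer description of the pKa calculations, a generic example of how a pKa is usually extracted from CpHMD titration data may help readers. The sketch fits the per-pH deprotonated fraction S of a residue (for example E56) to the Hill form of the Henderson-Hasselbalch equation, S = 1/(1 + 10^(n(pKa - pH))). It assumes the fractions have already been computed from the CpHMD output; the function names, the linearised least-squares fit, and the synthetic titration data are illustrative assumptions, not the exact procedure used in the manuscript.

```python
from math import log10
from typing import Sequence, Tuple

def hill_fraction(ph: float, pka: float, n: float = 1.0) -> float:
    """Deprotonated fraction predicted by the Hill / Henderson-Hasselbalch model."""
    return 1.0 / (1.0 + 10 ** (n * (pka - ph)))

def fit_hill(ph: Sequence[float], frac_deprot: Sequence[float]) -> Tuple[float, float]:
    """Fit S = 1 / (1 + 10**(n * (pKa - pH))) and return (pKa, n).

    Uses the linearised form log10(S / (1 - S)) = n * (pH - pKa), fitted by
    ordinary least squares (an illustrative choice; non-linear fits are also common).
    """
    xs, ys = [], []
    for p, s in zip(ph, frac_deprot):
        # Fully protonated or deprotonated windows have undefined log-ratios.
        if 0.0 < s < 1.0:
            xs.append(p)
            ys.append(log10(s / (1.0 - s)))
    if len(xs) < 2:
        raise ValueError("need at least two partially titrated pH windows")
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("pH values must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope == 0.0:
        raise ValueError("no titration detected (zero Hill coefficient)")
    intercept = mean_y - slope * mean_x
    # log10(S/(1-S)) = n*pH - n*pKa, so pKa = -intercept / slope and n = slope.
    return -intercept / slope, slope

# Synthetic titration curves (not simulation results): apo pKa 4.5, holo pKa 5.5.
phs = [3.0, 4.0, 5.0, 6.0]
pka_apo, _ = fit_hill(phs, [hill_fraction(p, 4.5) for p in phs])
pka_holo, _ = fit_hill(phs, [hill_fraction(p, 5.5) for p in phs])
print(round(pka_holo - pka_apo, 3))
```

      A shift upon substrate binding is then the difference between the holo and apo fits, ΔpKa = pKa(holo) - pKa(apo). The orthogonal ABFE check rests on the standard thermodynamic cycle ΔpKa = [ΔG_bind(deprotonated) - ΔG_bind(protonated)] / (RT ln 10), where RT ln 10 is about 1.36 kcal/mol at 298 K, so a 1.36 kcal/mol difference in binding free energy between the two protonation states corresponds to roughly one pKa unit.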

      As for the additional suggested changes in presentation, we provide the requested details on the CpHMD analysis. Furthermore, we use the convergence data presented separately in figures S12 and S16 to include error bars on our 1D-reprojections of the 2D-PMFs in figures 3, 4 and 5. (Note that we have opted not to do so in figures S10 and S15, which collate all 1D PMF reprojections for the OCC ↔ OF and OCC ↔ IF transitions, respectively, in single reference plots, to avoid overcrowding those necessarily busy figures). We have also changed the colour schemes of these plots in our revision to improve accessibility. We have additionally taken the opportunity to fix some typos and further clarify some other statements throughout the manuscript, besides the requests from the reviewers.
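      The 1D reprojections with error bars mentioned above follow the usual Boltzmann integration of a 2D PMF over the second collective variable, F(x) = -kT ln Σ_y exp(-F(x, y)/kT). A minimal sketch is given below, assuming a uniform grid, finite values in every bin, energies in kcal/mol at about 300 K, and error bars taken as the standard deviation across independently estimated blocks (one common convention, not necessarily the one used for figures S12 and S16). The toy PMFs are hypothetical.

```python
from math import exp, log, sqrt
from typing import List, Sequence, Tuple

KT_300K = 0.596  # kcal/mol; k_B*T at ~300 K (assumed units and temperature)

def reproject(pmf2d: Sequence[Sequence[float]], kt: float = KT_300K) -> List[float]:
    """Integrate out the second CV: F(x) = -kT ln sum_y exp(-F(x, y) / kT).

    Rows index the retained CV x and columns the integrated CV y. A uniform y grid
    only adds a constant, which is removed by shifting the profile minimum to zero.
    """
    profile = []
    for row in pmf2d:
        f_min = min(row)
        # Log-sum-exp around the row minimum avoids overflow for large free energies.
        total = sum(exp(-(f - f_min) / kt) for f in row)
        profile.append(f_min - kt * log(total))
    offset = min(profile)
    return [f - offset for f in profile]

def block_mean_sd(profiles: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-bin mean and sample standard deviation across independently estimated blocks."""
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two blocks for an error estimate")
    means, sds = [], []
    for values in zip(*profiles):
        mu = sum(values) / n
        means.append(mu)
        sds.append(sqrt(sum((v - mu) ** 2 for v in values) / (n - 1)))
    return means, sds

# Toy 3x3 PMFs (kcal/mol) from two hypothetical blocks; not data from the study.
block_a = [[0.0, 1.0, 2.0], [1.5, 2.0, 3.0], [4.0, 4.5, 5.0]]
block_b = [[0.2, 1.1, 2.2], [1.2, 1.8, 2.9], [4.3, 4.4, 5.2]]
mean, err = block_mean_sd([reproject(block_a), reproject(block_b)])
print([f"{m:.2f} ± {e:.2f}" for m, e in zip(mean, err)])
```

      Reprojecting each block separately before averaging keeps the error bars on the same 1D coordinate that is plotted, rather than propagating 2D uncertainties analytically.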

      Reviewer #1 (Public Review):

      The authors have performed all-atom MD simulations to study the working mechanism of hsPepT2. It is widely accepted that conformational transitions of proton-coupled oligopeptide transporters (POTs) are linked with gating hydrogen bonds and salt bridges involving protonatable residues, whose protonation triggers gate openings. Through unbiased MD simulations, the authors identified extra-cellular (H87 and D342) and intra-cellular (E53 and E622) triggers. The authors then validated these triggers using free energy calculations (FECs) and assessed the engagement of the substrate (Ala-Phe dipeptide). The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays. An alternating-access mechanism was proposed. The study was largely conducted properly, and the paper was well-organized. However, I have a couple of concerns for the authors to consider addressing.

      We would like to note here that it may be slightly misleading to the reader to state that “The linkage of substrate release with the protonation of the ExxER motif (E53 and E56) was confirmed using constant-pH molecular dynamics (CpHMD) simulations and cell-based transport assays.” The cell-based transport assays confirmed the importance of the extracellular gating trigger residues H87, S321 and D342 (as mentioned in the preceding sentence), not of the substrate-protonation link as this line might be understood to suggest.

      (1) As a proton-coupled membrane protein, the conformational dynamics of hsPepT2 are closely coupled to protonation events of gating residues. Instead of using semi-reactive methods like CpHMD or reactive methods such as reactive MD, where the coupling is accounted for, the authors opted for extensive non-reactive regular MD simulations to explore this coupling. Note that I am not criticizing the choice of methods, and I think those regular MD simulations were well-designed and conducted. But I do have two concerns.

      a) Ideally, proton-coupled conformational transitions should be modelled using a free energy landscape with two or more reaction coordinates (or CVs), with one describing the protonation event and the other describing the conformational transitions. The minimum free energy path then illustrates the reaction progress, such as OCC/H87D342- → OCC/H87HD342H → OF/H87HD342H as displayed in Figure 3.

      We concur with the reviewer that the ideal way of describing the processes studied in our paper would be a higher-dimensional free energy landscape obtained from a simulation method that can explicitly model proton-transfer processes. Indeed, it would have been particularly interesting and potentially informative with regard to the movement of protons down into the transporter in the OF → OCC → IF sequence of transitions. As we note in our discussion on the H87→E56 proton transfer: 

      “This could be investigated using reactive MD or QM/MM simulations (both approaches have been employed for other protonation steps of prokaryotic peptide transporters, see Parker et al. (2017) and Li et al. (2022)).  However, the putative path is very long (≈ 1.7 nm between H87 and E56) and may or may not involve a large number of intermediate protonatable residues, in addition to binding site water. While such an investigation is possible in principle, it is beyond the scope of the present study.” 

      Given that even sampling the proton transfer step itself in an essentially static protein conformation would push the boundaries of what has been achieved in the field, we believe that, in the current state of the art, a fully coupled investigation of large-scale conformational changes and proton-transfer reactions is not yet feasible in a realistic time frame. We already note this limitation when we say that:

      “The question of whether proton binding happens in OCC or OF warrants further investigation, and indeed the co-existence of several mechanisms may be plausible here”. 

      Nonetheless, we are actively exploring approaches to treat uptake and movement of protons explicitly for future work.

      In our revision, we have expanded on our discussion of the reasoning behind employing a non-reactive approach and the limitations that imposes on what questions can be answered in this study.

      Without including the protonation as a CV, the authors tried to model the free energy changes from multiple FECs using different charge states of H87 and D342. This is a practical workaround, and the conclusion drawn (the OCC→ OF transition is downhill with protonated H87 and D342) seems valid. However, I don't think the OF states with different charge states (OF/H87D342-, OF/H87HD342-, OF/H87D342H, and OF/H87HD342H) are equally stable, as plotted in Figure 3b. The concern extends to other cases like Figures 4b, S7, S10, S12, S15, and S16. While it may be appropriate to match all four OF states in the free energy plot for comparison purposes, the authors should clarify this to ensure readers are not misled.

      The reviewer is correct that the alignment of the PMFs in these figures is arbitrary: no free energies of the PMFs relative to one another can be estimated without explicit free energy calculations of, at least, the protonation events in the end-state basins. The PMFs in our figures are merely superimposed to illustrate the differences in shape between the profiles obtained in each condition, as discussed in the text, and we now make this clear in the relevant figure captions.
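
      As a minimal illustration of why the vertical offsets carry no information (a sketch assuming each PMF is a NumPy array on a shared CV grid; the function name and toy profiles are hypothetical, not data from this study), shifting every profile to zero at a common reference grid point yields one valid superposition among infinitely many, while barriers measured within each profile are unchanged:

```python
import numpy as np

def align_at_reference(pmfs, ref_index):
    """Shift each PMF so that it equals zero at grid point ref_index.

    The per-profile constant is arbitrary: profiles from different protonation
    states share no common energy zero unless the protonation free energies
    are computed explicitly.
    """
    return {label: pmf - pmf[ref_index] for label, pmf in pmfs.items()}

# Hypothetical toy PMFs (kcal/mol) on the same five-point CV grid.
pmfs = {
    "state_A": np.array([3.0, 6.0, 4.0, 5.0, 2.0]),
    "state_B": np.array([1.0, 2.5, 1.5, 0.5, 0.0]),
}

aligned = align_at_reference(pmfs, ref_index=-1)  # superimpose at the last grid point
for label in pmfs:
    barrier = aligned[label][1] - aligned[label][0]  # invariant under the shift
    print(label, aligned[label], "barrier:", barrier)
```

      Choosing a different reference point moves the curves up or down relative to each other but leaves every within-profile difference, and hence every shape comparison, intact.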

      b) Regarding the substrate impact, it appears that the authors assumed fixed protonation states. I am afraid this is not necessarily the case. Variations in PepT2 stoichiometry suggest that substrates likely participate in proton transport, like the Phe-Ala (2:1) and Phe-Gln (1:1) dipeptides mentioned in the introduction. And it is not rigorous to assume that the N- and C-termini of a peptide do not protonate/deprotonate when transported. I think the authors should explicitly state that the current work and the proposed mechanism (Figure 8) are based on the assumption that the substrates do not uptake/release proton(s).

      This is indeed an assumption inherent in the current work. While we do “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”, the previous version did not indicate explicitly that this may involve the substrate. We now state this assumption and this possibility clearly in the revised version of our paper. Indeed, as we discuss, there is some evidence in our PMFs of an additional protonation site not considered thus far, which may or may not be the substrate. We now note this point in the revised manuscript.

      As for what information can be drawn from the given experimental stoichiometries, we note in our paper that “a 2:1 stoichiometry was reported for the neutral di-peptide D-Phe-L-Ala and 3:1 for anionic D-Phe-L-Glu. (Chen et al., 1999) Alternatively, Fei et al. (1999) have found 1:1 stoichiometries for either of D-Phe-L-Gln (neutral), D-Phe-L-Glu (anionic), and D-Phe-L-Lys (cationic).” 

      We do not consider it our place to arbitrate between the apparent discrepancies in the experimental data, although we believe that our assumed 2:1 stoichiometry is additionally “motivated also by our computational results that indicate distinct and additive roles played by two protons in the conformational cycle mechanism”.

      (2) I have more serious concerns about the CpHMD employed in the study.

      a) The CpHMD in AMBER is not rigorous for membrane simulations. The underlying generalized Born model fails to consider the membrane environment when updating charge states. In other words, the CpHMD places a membrane protein in a water environment to judge if changes in charge states are energetically favorable. While this might not be a big issue for peripheral residues of membrane proteins, it is likely unphysical for internal residues like the ExxER motif. As I recall, the developers have never used the method to study membrane proteins themselves. The only CpHMD variant suitable for membrane proteins is the membrane-enabled hybrid-solvent CpHMD in CHARMM. While I do not expect the authors to redo their CpHMD simulations, I do hope the authors recognize the limitations of their method.

      We discuss the limitations of the AMBER CpHMD implementation in the revised version. However, despite that, we believe we have in fact provided sufficient grounds for our conclusion that substrate binding affects ExxER motif protonation in the following way.

      In addition to the CpHMD simulations, we establish the same effect via ABFE calculations, which show that the substrate affinity differs between the E56-deprotonated and E56-protonated protein. This was previously Figure S20; in the revised version we have moved this validation into a new panel of Figure 6 in the main text, since it becomes more important given the CpHMD membrane problem. Because the ABFE calculations are conducted with an all-atom representation of the lipids and the thermodynamic cycle closes well, it would appear that, even if the chosen CpHMD method has a systematic error of significant magnitude for this particular membrane protein system, we may benefit from error cancellation. While the calculated absolute pKa values may not be reliable, the difference made by substrate binding appears to be reliable, as judged by the orthogonal ABFE technique.
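
      The link between the two techniques rests on a standard thermodynamic cycle: if a ligand binds the protonated form of a titratable site with free energy ΔG_bind(HA) and the deprotonated form with ΔG_bind(A⁻), the site's pKa shifts upon binding by ΔpKa = [ΔG_bind(A⁻) − ΔG_bind(HA)] / (RT ln 10), so tighter binding to the protonated form raises the pKa. A minimal sketch of this relation follows; the function name, the 310 K default temperature and the example energies are illustrative assumptions and do not reproduce the ABFE values of the study.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def pka_shift_from_binding(dg_bind_deprot, dg_bind_prot, temperature=310.0):
    """Return pKa(bound) - pKa(free) from binding free energies in kcal/mol.

    Thermodynamic cycle:
        dG_deprot(free) + dG_bind(A-) = dG_bind(HA) + dG_deprot(bound),
    with dG_deprot = RT ln(10) * pKa for each state.
    """
    rt_ln10 = R_KCAL * temperature * math.log(10.0)
    return (dg_bind_deprot - dg_bind_prot) / rt_ln10

# Hypothetical example: the ligand binds 1.5 kcal/mol more tightly
# when the site is protonated, so the site's pKa rises on binding.
shift = pka_shift_from_binding(dg_bind_deprot=-6.0, dg_bind_prot=-7.5)
print(f"pKa shift on binding: {shift:+.2f}")
```

      Only the difference between the two binding free energies enters, which is why a systematic offset common to both calculations cancels.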

      Although the reviewer does “not expect the authors to redo their CpHMD simulations”, we consider that it may be helpful to the reader to share in this response some results from trials using the continuous, all-atom constant pH implementation that has recently become available in GROMACS (Aho et al 2022, https://pubs.acs.org/doi/10.1021/acs.jctc.2c00516) and can be used rigorously with membrane proteins, given its all-atom lipid representation.

      Unfortunately, when trying to titrate E56 in this CpHMD implementation, we observed few protonation-state transitions, and the system often became trapped in coupled minima of protonation state and local conformation (which need to interconvert through rearrangements of the salt-bridge network involving slow side-chain dihedral rotations in E53, E56 and R57). Author response image 1 shows this for the apo OF state; Author response image 2 shows how noisy the resulting attempts at pKa estimation turn out to be, necessitating the use of a hybrid-solvent method.

      Author response image 1.

      All-atom CpHMD simulations of apo-OF PepT2. Red indicates protonated E56, blue is deprotonated.

      Author response image 2.

      Difficulty in calculating the E56 pKa value from the noisy all-atom CpHMD data shown in Author response image 1.

      b) It appears that the authors did not make the substrate (Ala-Phe dipeptide) protonatable in holo simulations. This oversight prevents a complete representation of ligand-induced protonation events, particularly given that the substrate forms ion pairs with hsPepT2 through its N- and C-termini. I believe it would be valuable for the authors to acknowledge this potential limitation.

      In this study, we implicitly assumed from the outset that the substrate does not get protonated, which, as noted in our response to the comment above, we now acknowledge explicitly. This potential limitation on the available mechanisms for proton transfer also applies to our investigation of the ExxER protonation states. In particular, a semi-grand canonical ensemble that allows for protonation of the substrate C-terminus may also sample states in which the substrate is protonated and oriented away from R57, thus leaving the ExxER salt-bridge network in an apo-like state. Consequently, while the direction of the shift in the E56 pKa value would be the same, our CpHMD may overestimate its magnitude. It would thus be interesting to make the C-terminus protonatable to obtain better quantitative estimates of the E56 pKa shift (as is indeed true in general for any other protonatable protein residue, though the effects are usually assumed to be negligible). We note, however, that convergence of the CpHMD simulations would be much harder if the slow degree of freedom of substrate reorientation (which in our experience takes tens to hundreds of nanoseconds in this binding pocket) needs to be implicitly equilibrated upon protonation-state transitions. We discuss such considerations in the revised paper.

      Reviewer #2 (Public Review):

      This is an interesting manuscript that describes a series of molecular dynamics studies on the peptide transporter PepT2 (SLC15A2). They examine, in particular, the effect on the transport cycle of protonation of various charged amino acids within the protein. They then validate their conclusions by mutating two of the residues that they predict to be critical for transport in cell-based transport assays. The study suggests a series of protonation steps that are necessary for transport to occur in PepT2. Comparison with bacterial proteins from the same family shows that while the overall architecture of the proteins and likely mechanism are similar, the residues involved in the mechanism may differ.

      Strengths: 

      This is an interesting and rigorous study that uses various state-of-the-art molecular dynamics techniques to dissect the transport cycle of PepT2 with nearly 1 ms of sampling. It gives insight into the transport mechanism, investigating how the protonation of selected residues can alter the energetic barriers between various states of the transport cycle. The authors have, in general, been very careful in their interpretation of the data.

      Weaknesses: 

      Interestingly, they suggest that there is an additional protonation event that may take place as the protein goes from occluded to inward-facing but they have not identified this residue.

      We have indeed suggested that there may be an additional protonation site involved in the conformational cycle that we have not been able to capture, which, as we discuss in our paper, might be indicated by the shapes of the OCC ↔ IF PMFs given in Figure S15. One possibility is for this to be the substrate itself (see the response to reviewer #1 above), though within the scope of this study the precise pathway by which protons move down the transporter and the exact ordering of conformational change and proton-transfer reactions remain (partially) open questions. We acknowledge this, denote it with question marks in the mechanistic overview we give in Figure 8 and also “speculate that the proton movement processes may happen as an ensemble of different mechanisms, and potentially occur contemporaneously with the conformational change”.

      Some things are a little unclear. For instance, where does the state that they have defined as occluded sit on the diagram in Figure 1a? - is it truly the occluded state as shown on the diagram or does it tend to inward- or outward-facing?

      Figure 1a is a simple schematic overview intended to show which structures of PepT2 homologues are available for use in simulations; it was not meant to be a quantitative classification of states. Nonetheless, we note that the OCC state we derived has extra- and intracellular gate-opening distances (as measured by the simple CVs defined in the Methods and illustrated in Figure 2a) that indicate full gate closure on both sides. In particular, although it was derived from the IF state via biased sampling, the intracellular gate-opening distance in the OCC state used for our conformational-change enhanced sampling was comparable to that of the OF state (i.e., full closure of the gate); see Figure S2b and the grey bars therein. We would therefore schematically place the OCC state at the center of the diagram in Figure 1a. Furthermore, it is largely stable over triplicate 1 μs-long unbiased MD runs: in two of the three replicates the gates remain stable, and in the remaining replicate there is partial opening of the intracellular gate (as shown in Figure 2b/c under the “apo standard” condition). We comment on this in the main text by saying that “The intracellular gate, by contrast, is more flexible than the extracellular gate even in the apo, standard protonation state”, and link it to the lower barrier for the transition to IF than to OF by saying that “As for the OCC↔OF transitions, these results explain the behaviour we had previously observed in the unbiased MD of Figure 2c.” We acknowledge that this was not sufficiently clear and have added details to the latter sentence to clarify the nature of the occluded state.

      The pKa calculations and their interpretation are a bit unclear. Firstly, it is unclear whether they are using all the data in the calculations of the histograms, or just selected data and if so on what basis was this selection done. Secondly, they dismiss the pKa calculations of E53 in the outward-facing form as not being affected by peptide binding but say that E56 is when there seems to be a similar change in profile in the histograms.

      In our manuscript, we have provided two distinct analyses of the raw CpHMD data. Firstly, we analysed the data by the replicates in which our simulations were conducted (Figure 6, shown as bar plots of the mean from triplicates +/- standard deviation), where we found that only the effect on E56 protonation was distinct, lying beyond the combined error bars. This analysis uses the full amount of sampling conducted for each replicate. However, since we found that the range of pKa values estimated from 10 ns/window chunks was larger than the error bars obtained from the replicate analysis (Figures S17 and S18), we sought to verify our conclusion by pooling all chunk estimates and plotting histograms (Figure S19). From these we recover the effect of substrate binding on the E56 protonation state in both the OF and OCC states. However, as the reviewer has pointed out (something we did not discuss in our original manuscript), there is also a shift in the pKa of E53, in the OF state only. In fact, the trend is also apparent in the replicate-based analysis of Figure 6, though there the larger error bars overlap. In our revision, we have added more details of these analyses for clarity (including more detailed figure captions regarding the data used in Figure 6), as well as a discussion of the partial effect on the E53 pKa value.
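
      A chunk-based pKa analysis of this kind can be sketched as follows. The sketch assumes that the CpHMD output has already been reduced to a per-frame protonation flag (1 = protonated) for each pH window, sampled at a fixed frame interval, and it fits the Hill equation in its linearised form with `numpy.polyfit`. The array layout, function names and synthetic data are our assumptions for illustration, not the scripts used in the study.

```python
import numpy as np

def fit_pka(ph_values, frac_prot):
    """Fit pKa and Hill coefficient n from protonated fractions at several pH values.

    Linearised Hill equation: log10((1 - f) / f) = n * (pH - pKa), with f the
    protonated fraction. Windows with f equal to 0 or 1 give an infinite
    logarithm and are skipped.
    """
    ph = np.asarray(ph_values, dtype=float)
    f = np.asarray(frac_prot, dtype=float)
    ok = (f > 0.0) & (f < 1.0)
    if ok.sum() < 2:
        return np.nan, np.nan
    slope, intercept = np.polyfit(ph[ok], np.log10((1.0 - f[ok]) / f[ok]), 1)
    return -intercept / slope, slope  # pKa, Hill coefficient n

def chunked_pkas(ph_values, protonated, frames_per_chunk):
    """Estimate one pKa per time chunk.

    protonated: array of shape (n_windows, n_frames), 1 where the residue is
    protonated. Each chunk uses frames_per_chunk consecutive frames from every
    pH window (for example, 10 ns of each window).
    """
    protonated = np.asarray(protonated, dtype=float)
    n_chunks = protonated.shape[1] // frames_per_chunk
    estimates = []
    for c in range(n_chunks):
        block = protonated[:, c * frames_per_chunk:(c + 1) * frames_per_chunk]
        pka, _ = fit_pka(ph_values, block.mean(axis=1))
        estimates.append(pka)
    return np.array(estimates)

# Synthetic data (not from the study): a site with pKa 5.0 and n = 1.
rng = np.random.default_rng(0)
ph_values = np.arange(3.0, 8.0, 1.0)
true_f = 1.0 / (1.0 + 10.0 ** (ph_values - 5.0))  # protonated fraction per window
protonated = rng.random((ph_values.size, 4000)) < true_f[:, None]

estimates = chunked_pkas(ph_values, protonated, frames_per_chunk=1000)
counts, edges = np.histogram(estimates, bins=5)  # pooled chunk estimates
print(estimates)
```

      The spread of the chunk estimates reflects sampling noise within each chunk; with correlated MD frames (unlike the independent synthetic frames here) that spread is typically larger, which is why pooled histograms and replicate-level error bars can differ.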

      We do not believe, however, that our key conclusions are negatively affected. If anything, a further effect on the E53 pKa which we had not previously commented on (since we saw the evidence as weaker, pertaining to only one conformational state) would strengthen the case for an involvement of the ExxER motif in ligand coupling.

      Reviewer #3 (Public Review):

      Summary: 

      Lichtinger et al. have used an extensive set of molecular dynamics (MD) simulations to study the conformational dynamics and transport cycle of an important member of the proton-coupled oligopeptide transporters (POTs), namely SLC15A2 or PepT2. This protein is one of the most well-studied mammalian POT transporters and provides a good model, with enough insight and structural information to be studied computationally using the advanced enhanced sampling methods employed in this work. The authors have used microsecond-level MD simulations, constant-pH MD, and alchemical binding free energy calculations along with cell-based transport assay measurements; however, the most important part of this work is the use of enhanced sampling techniques to study the conformational dynamics of PepT2 under different conditions.

      The study attempts to identify links between conformational dynamics and chemical events such as proton binding, ligand-protein interactions, and intramolecular interactions. The ultimate goal is of course to understand the proton-coupled peptide and drug transport by PepT2 and homologous transporters in the solute carrier family. 

      Some of the key results include:

      (1) Protonation of H87 and D342 initiates the occluded (Occ) to outward-facing (OF) state transition.

      (2) In the OF state, through engaging R57, substrate entry increases the pKa value of E56 and thermodynamically facilitates the movement of protons further down. 

      (3) E622 is not only essential for peptide recognition but also its protonation facilitates substrate release and contributes to the intracellular gate opening. In addition, cell-based transport assays show that mutation of residues such as H87 and D342 significantly decreases transport activity as expected from simulations. 

      Strengths: 

      (1) This is an extensive MD-based study of PepT2, which is beyond the typical MD studies both in terms of the sheer volume of simulations as well as the advanced methodology used. The authors have not limited themselves to one approach and have appropriately combined equilibrium MD with alchemical free energy calculations, constant-pH MD, and geometry-based free energy calculations. Each of these 4 methods provides a unique insight regarding the transport mechanism of PepT2.

      (2) The authors have not limited themselves to computational work and have performed experiments as well. The cell-based transport assays clearly establish the importance of the residues that have been identified as significant contributors to the transport mechanism using simulations.

      (3) The conclusions made based on the simulations are mostly convincing and provide useful information regarding the proton pathway and the role of important residues in proton binding, protein-ligand interaction, and conformational changes.

      Weaknesses: 

      (1) Some of the statements made in the manuscript are not convincing and do not meet the standards that are mostly followed in the manuscript. For instance, on page 4, it is stated that "the K64-D317 interaction is formed in only ≈ 70% of MD frames and therefore is unlikely to contribute much to extracellular gate stability." I do not agree that 70% is negligible. In particular, Figure S3 does not include the time series, so it is not clear whether the 30% of the time in which the salt bridge is broken is at the beginning or the end of the simulations. For instance, it is likely that the salt bridge is not initially present and then forms very strongly. Of course, this is just one possible scenario, but the point is that Figure S3 does not rule out the possibility of a significant role for the K64-D317 salt bridge.

      The reviewer is right to point out that the statement and Figure S3, as they were, do not adequately support our decision to exclude the K64-D317 salt bridge from our further investigations. The violin plot shown in Figure S3, visualised as pooled data from unbiased 1 μs triplicates, indeed did not rule out a scenario in which the salt bridge only formed late in our simulations (or only in some replicates) but was then stable. Therefore, in our revision, we include the appropriate time series of the salt-bridge distances, showing how K64-D317 is initially stable but then falls apart in replicate 1, and is transiently formed and broken across the trajectories in replicates 2 and 3. We have also regenerated the data for this plot, as we discovered a bug in the relevant analysis script that meant the D170-K642 distance was not calculated accurately. The results are, however, almost identical, and our conclusions remain unchanged.
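
      The reviewer's concern, that a pooled occupancy can hide how a contact evolves over time, can be seen in a minimal sketch. It assumes per-frame salt-bridge distances in Å for each replicate and a 4 Å formation cutoff (our assumption; the criterion used in the study may differ), and uses two hypothetical toy replicates with identical 70% occupancy: one in which the salt bridge breaks permanently and one in which it flickers. Only the time-resolved (running) occupancy distinguishes them.

```python
import numpy as np

CUTOFF = 4.0  # Å; assumed salt-bridge formation criterion (hypothetical)

def occupancy(distances, cutoff=CUTOFF):
    """Fraction of frames in which the salt bridge is formed (distance below cutoff)."""
    return float(np.mean(np.asarray(distances) < cutoff))

def running_occupancy(distances, window, cutoff=CUTOFF):
    """Occupancy in a sliding window of `window` frames (fully overlapping positions only)."""
    formed = (np.asarray(distances) < cutoff).astype(float)
    kernel = np.ones(window) / window
    return np.convolve(formed, kernel, mode="valid")

# Hypothetical replicates with the same overall occupancy but different histories.
rep_breaks = np.r_[np.full(700, 3.0), np.full(300, 8.0)]  # formed, then breaks for good
rep_flickers = np.tile([3.0] * 7 + [8.0] * 3, 100)        # forms and breaks transiently

for name, d in [("breaks", rep_breaks), ("flickers", rep_flickers)]:
    run = running_occupancy(d, window=100)
    print(name, "occupancy:", occupancy(d), "final-window occupancy:", run[-1])
print("pooled:", occupancy(np.concatenate([rep_breaks, rep_flickers])))
```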

      (2) Similarly, on page 4, it is stated that "whether by protonation or mutation - the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." I do not agree with this assessment. The authors need to be aware of the limitations of this approach. Consider "WT H87-prot" and "D342A H87-prot": when the D342 residue is mutated, we see the opening of the gate within 1 μs in one out of 3 simulations. When the D342 residue is not mutated, we do not see the opening in any of the 3 simulations within 1 μs. It is quite likely that with 10 simulations rather than 3, or with 10 μs rather than 1 μs simulations, the 0/3 versus 1/3 outcome would change significantly. I do not find this argument and conclusion compelling at all.

      If the conclusions were based on that alone, then we would agree.  However, this section of work covers merely the observations of the initial unbiased simulations which we go on to test/explore with enhanced sampling in the rest of the paper, and which then lead us to the eventual conclusions.

      Figure S5 shows the results from triplicate 1 μs-long trajectories as violin-plot histograms of the extracellular gate opening distance, also indicating the first and final frames of the trajectories as connected by an arrow for orientation – a format we chose for intuitively comparing 48 trajectories in one plot. The reviewer reads the plot correctly when they analyse the “WT H87-prot” vs “D342A H87-prot” conditions. In the former case, no spontaneous opening in unbiased MD is taking place, whereas when D342 is mutated to alanine in addition to H87 protonation, we see spontaneous transition in 1 out of 3 replicates.  However, the reviewer does not seem to interpret the statement in question in our paper (“the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed”) in the way we intended it to be understood. We merely want to note here a correlation in the unbiased dataset we collected at this stage, and indeed the one spontaneous opening in the case comparison picked out by the reviewer is in the condition where both the H87 interaction network and D342-R206 are perturbed. In noting this we do not intend to make statistically significant statements from the limited dataset. Instead, we write that “these simulations show a large amount of stochasticity and drawing clean conclusions from the data is difficult”. We do however stand by our assessment that from this limited data we can “already appreciate a possible mechanism where protons move down the transporter pore” – a hypothesis we investigate more rigorously with enhanced sampling in the rest of the paper. We have revised the section in question to make clearer that the unbiased MD is only meant to give an initial hypothesis here to be investigated in more detail in the following sections. In doing so, we also incorporate, as we had not done before, the case (not picked out by the reviewer here but concerning the same figure) of S321A & H87 prot. 
      In the third replicate, the S321A & H87 prot condition shows partial gate opening towards the end of the unbiased trajectory (despite D342 not being affected), further highlighting the stochastic nature that makes even clear correlative conclusions difficult to draw.
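
      The statistical weakness of a 0/3 versus 1/3 comparison can be quantified with a simple binomial model. Assuming, for illustration only, that each 1 μs replicate opens independently with the same probability p, the chance of observing no opening in three replicates is (1 − p)³, which is about 0.30 even for p = 1/3. The helper names below are hypothetical.

```python
import math

def prob_k_of_n(k, n, p):
    """Binomial probability of exactly k spontaneous openings in n independent replicates."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def replicates_needed(p, confidence=0.95):
    """Smallest n for which at least one opening is observed with the given probability."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

for p in (0.1, 1 / 3, 0.5):
    print(f"p = {p:.2f}: P(0 of 3) = {prob_k_of_n(0, 3, p):.3f}, "
          f"replicates for 95% chance of >= 1 opening: {replicates_needed(p)}")
```

      Under this idealised model, even a per-replicate opening probability of 1/3 requires eight replicates before at least one opening is expected with 95% probability, consistent with treating the unbiased runs as hypothesis-generating rather than conclusive.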

      (3) While the MEMENTO methodology is novel and interesting, the method is presented as flawless in the manuscript, which is not true at all. It is stated on page 5, with regard to the path generated by MEMENTO, that "These paths are then by definition non-hysteretic." I think it is too big a claim to say that the paths generated by MEMENTO are non-hysteretic by definition. This claim is not even made in the original MEMENTO paper. What is mentioned there is that linear interpolation generates a hysteresis-free path by definition. There are two important problems here: (a) MEMENTO uses linear interpolation as an initial step but later modifies the intermediates significantly, so they are no longer linearly interpolated structures and the path is thus no longer hysteresis-free; (b) a more serious problem is the attribution of by-definition hysteresis-free features to the linearly interpolated states. This is based on conflating the concepts of being hysteresis-free and being unique. Hysteresis in MD-based enhanced sampling is related to the presence of barriers in orthogonal space. For instance, one may use a non-linear interpolation of any type and get a unique pathway, which could be substantially different from the one coming from linear interpolation. None of these paths will necessarily be hysteresis-free once subjected to MD-based enhanced sampling techniques.

      We certainly do not intend to claim that the MEMENTO method is flawless. The concern the reviewer raises around the statement "These paths are then by definition non-hysteretic" is perhaps best addressed by a clarification of the language used and considering how MEMENTO is applied in this work. 

      Hysteresis in the most general sense denotes the dependence of a system on its history, or – more specifically – the lagging behind of the system state with regards to some physical driver (for example the external field in magnetism, whence the term originates). In the context of biased MD and enhanced sampling, hysteresis commonly denotes the phenomenon where a path created by a biased dynamics method along a certain collective variable lags behind in phase space in slow orthogonal degrees of freedom (see Figure 1 in Lichtinger and Biggin 2023, https://doi.org/10.1021/acs.jctc.3c00140). When used to generate free energy profiles, this can manifest as starting state bias, where the conformational state that was used to seed the biased dynamics appears lower in free energy than alternative states. Figure S6 shows this effect on the PepT2 system for both steered MD (heavy atom RMSD CV) + umbrella sampling (tip CV) and metadynamics (tip CV). There is, in essence, a coupled problem: without an appropriate CV (which we did not have to start with here), path generation that is required for enhanced sampling displays hysteresis, but the refinement of CVs is only feasible when paths connecting the true phase space basins of the two conformations are available. MEMENTO helps solve this issue by reconstructing protein conformations along morphing paths which perform much better than steered MD paths with respect to giving consistent free energy profiles (see Figure S7 and the validation cases in the MEMENTO paper), even if the same CV is used in umbrella sampling. 

      There are still differences between replicates in those PMFs, indicating slow conformational flexibility propagated from end-state sampling through MEMENTO. We use this to refine the CVs further with dimensionality reduction (see the Methods section and Figure S8), before moving to 2D umbrella sampling (Figure 3). This, we think, is where the reviewer’s point applies. The MEMENTO paths are ‘non-hysteretic by definition’ with respect to the given end states in the sense that they connect (by definition) the correct conformations at both end states (unlike steered MD), which in enhanced sampling manifests as the absence of the strong starting-state bias we had previously observed (Figure S7 vs S6). They are not, however, hysteresis-free with regard to how representative of the end-state conformational flexibility the structures given to MEMENTO really were, which is where the iterative CV design and the combination of several MEMENTO paths in 2D PMFs come in.

      We also cannot make a direct claim about whether in the transition region the MEMENTO paths might be separated from the true (lower free energy) transition paths by slow orthogonal degrees of freedom, which may conceivably result in overestimated barrier heights separating two free energy basins. We cannot guarantee that this is not the case, but neither in our MEMENTO validation examples nor in this work have we encountered any indications of a problem here.

      We hope that the reviewer will be satisfied by our revision, where we replace the wording in question by a statement that the MEMENTO paths do not suffer from hysteresis that is otherwise incurred as a consequence of not reaching the correct target state in the biased run (in some orthogonal degrees of freedom).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Figure S1: it would be useful to label the panels.

      We have now done this.

      At the bottom of page 4, it is written that "the extracellular gate only opens spontaneously when both the H87 interaction network and D342-R206 are perturbed (Figure S5)." But it is hard to interpret that from the figure.  

      See also our response to reviewer #3. We have revised the wording of this statement, and also highlight in Figure S5 the crucial runs we are referring to, in order to make them easier to discern.

      At the bottom of page 5, and top of page 6, there is a lot of "other" information shown, which is inserted for the record - this is a bit glossed over and hard to follow.

      The “other” information refers to further conditions for which we had calculated PMFs and that gave some insight, but which were secondary for drawing our key conclusions. We thank the reviewer for their feedback that this section needs clarification. We have revised this paragraph to make it easier to follow and to better highlight the conclusions we draw from the data.

      In Figure 7 it looks as though the asterisks have shifted.

      We are indebted to the reviewer for spotting this error; the asterisks are indeed shifted one bar to the right of their intended position. The revised version fixes this issue.

      Reviewer #3 (Recommendations For The Authors):

      Minor points: In Figure 1a, The 7PMY label and arrow are slightly misplaced.

      Figure 1a is a schematic diagram to show the available structures of PepT2 homologues (see also the response to reviewer #2 above). The 7PMY label placement is intentional to indicate a partially occluded inwards-facing state. As we write in the figure caption: “Intermediate positions between states indicate partial gate opening”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TMC7 knockout mice were generated by the authors and the phenotype was analyzed. They found that Tmc7 is localized to Golgi and is needed for acrosome biogenesis.

      Strengths:

      The phenotype of infertility is clear, and the results of TMC7 localization and the failed acrosome formation are highly reliable. In this respect, they made a significant discovery regarding spermatogenesis.

      Weaknesses:

      There are also some concerns, which are mainly related to the molecular function of TMC7 and Figure 5.

      (1) It is understandable that TMC7 exhibits some channel activity in the Golgi and somehow affects luminal pH or Ca2+, leading to the failure of acrosome formation. On the other hand, since they are conducting the pH and calcium imaging from the cytoplasm, I do not think that the effect of TMC7 channel function in Golgi is detectable with their methods.

      We agree with the reviewer that there is no direct evidence showing the effect of TMC7 channel function in the Golgi. We have changed the description in the revised manuscript.

      (2) Rather, it is more likely that they are detecting apoptotic cells that have no longer normal ion homeostasis.

      We thank the reviewer for raising this concern. We apologize for not labeling the postnatal stage in the original Figure 5. We measured intracellular Ca2+, pH and ROS in PD30 testes (revised Fig. S6a-c); no apoptotic cells were observed at this stage (revised Fig. S6e, f). Apoptotic cells were found in the seminiferous tubules and cauda epididymis of 9-week-old Tmc7–/– mice (revised Fig. 5e-f). We have included TUNEL data for testes of PD21, PD30 and 9-week-old mice (revised Fig. 5e, f and Fig. S6e, f). In accordance with our findings, Tmc1 mutation has also been shown to result in reduced Ca2+ permeability, thus triggering hair cell apoptosis (Fettiplace, R, PNAS. 2022) [1].

      (3) Another concern is that n is only 3 for these imaging experiments.

      As suggested by the reviewer, more replicates were included in imaging experiments.

      Reviewer #2 (Public Review):

      Summary:

      This study presents a significant finding that enhances our understanding of spermatogenesis. TMC7 belongs to a family of transmembrane channel-like proteins (TMC1-8), primarily known for their role in the ear. Mutations to TMC1/2 are linked to deafness in humans and mice and were originally characterized as auditory mechanosensitive ion channels. However, the function of the other TMC family members remains poorly characterized. In this study, the authors begin to elucidate the function of TMC7 in acrosome biogenesis during spermatogenesis. Through analysis of transcriptomics datasets, they identify TMC7 as a transmembrane channel-like protein with elevated transcript levels in round spermatids in both mouse and human testis. They then generate Tmc7-/- mice and find that male mice exhibit smaller testes and complete infertility. Examination of different developmental stages reveals spermatogenesis defects, including reduced sperm count, elongated spermatids, and large vacuoles. Additionally, abnormal acrosome morphology is observed beginning at the early-stage Golgi phase, indicating TMC7's involvement in proacrosomal vesicle trafficking and fusion. They observed localization of TMC7 in the cis-Golgi and suggest that its presence is required for maintaining Golgi integrity, with Tmc7-/- leading to reduced intracellular Ca2+, elevated pH, and increased ROS levels, likely resulting in spermatid apoptosis. Overall, the work delineates a new function of TMC7 in spermatogenesis and the authors suggest that its ion channel activity is likely important for Golgi homeostasis. This work is of significant interest to the community and is of high quality.

      Strengths:

      The biggest strength of the paper is the phenotypic characterization of the TMC7-/- mouse model, which has clear acrosome biogenesis/spermatogenesis defects. This is the main claim of the paper and it is supported by the data that are presented.

      Weaknesses:

      The claim is that TMC7 functions as an ion channel. It is reasonable to assume this given what has been previously published on the more well-characterized TMCs (TMC1/2), but the data supporting this are preliminary here, and more needs to be done to solidify this hypothesis. The authors are careful in their interpretation and present this idea merely as a hypothesis.

      We appreciate the insightful comment. It is indeed a limitation of our study that we lack strong evidence that TMC7 functions as an ion channel. We had planned to perform electrophysiological recordings in GC-1 cells heterologously expressing TMC7. However, like TMC1 and TMC2 (Yu X, PNAS. 2020)[2], TMC7 was retained in the endoplasmic reticulum and failed to localize to the Golgi. Following the reviewer’s suggestion, we have interpreted the molecular function of TMC7 more carefully and in more detail in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Wang et al. have demonstrated that TMC7, a testis-enriched multipass transmembrane protein, is essential for male reproduction in mice. Tmc7 KO male mice are sterile due to reduced sperm count and abnormal sperm morphology. TMC7 co-localizes with GM130, a cis-Golgi marker, in round spermatids. The absence of TMC7 results in reduced levels of Golgi proteins, elevated abundance of ER stress markers, as well as changes of Ca2+ and pH levels in the KO testis. However, further confirmation is required because the analyses were performed with whole testis samples in spite of the differences in the germ cell composition in WT and KO testis. In addition, the causal relationships between the reported anomalies await thorough interrogation.

      Strengths:

      The microscopic images are of great quality, all figures are properly arranged, and the entire manuscript is very easy to follow.

      Weaknesses:

      (1) Tmc7 KO male mice show multiple anomalies in sperm production and morphogenesis, such as reduced sperm count, abnormal sperm head, and deformed midpiece. Thus, it is confusing that the authors focused solely on impaired acrosome biogenesis.

      We are grateful for your comments and suggestions. We agree, and have described these spermiogenesis defects of Tmc7–/– mice in the Abstract and Discussion sections of the revised manuscript.

      (2) Further investigations are warranted to determine whether the abnormalities reported in this manuscript (e.g., changes in protein, Ca2+, and pH levels) are directly associated with the molecular function of TMC7 or are the byproducts of partially arrested spermiogenesis. Please find additional comments in "Recommendations for the authors".

      Thank you for raising this concern. As you suggested, we have included data on intracellular Ca2+, pH and ROS in PD21 testes. Intracellular homeostasis was already impaired at PD21, suggesting that TMC7 depletion disrupts cellular homeostasis, which in turn results in arrested spermiogenesis.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      As noted by all three reviewers, current flow cytometry data does not necessarily support the 'ion channel' hypothesis, thus the phenotypic analysis is compelling but the molecular mechanism of how TMC7 facilitates acrosome biogenesis remains incomplete. It is highly recommended for the authors to at least discuss or test alternative hypotheses (as reviewer #2 suggested) such as the possibility of acting as 'lipid scramblase'. Also, the authors need to provide further explanation for other morphological defects if TMC7 is truly a functional ion channel in Golgi (and thus later at acrosome), which is also related to the key question of whether TMC7 is a functional ion channel.

      We thank the reviewing editor for the comments and suggestions. We agree that our study lacks strong evidence that TMC7 functions as an ion channel. As suggested, we have discussed the possibility that TMC7 acts as a 'lipid scramblase'. We have also included data on intracellular Ca2+, pH and ROS in PD21 and PD30 testes.

      Indeed, Tmc7–/– mice exhibit other defects, including abnormal head morphology and disorganized mitochondrial sheaths. TMC7 localizes to the cis-Golgi apparatus and is required for maintaining Golgi integrity. Previous studies have shown that loss of other Golgi-localized proteins, including GOPC (Yao R, PNAS. 2002)[2], HRB (Kang-Decker N. Science. 2001)[3] and PICK1 (Xiao N, JCI. 2009)[4], causes spermiogenesis defects similar to those of Tmc7–/– mice. It is therefore possible that the morphological defects in Tmc7–/– mice result from impaired Golgi function.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide more details about the imaging experiments using FACS. Since they only describe catalog numbers (Beyotime, S1056, S1006, S0033S) for imaging reagents, it is not immediately clear what reagents they actually used. Since they used Fluo3, BCECF, and DCFH, it would be better to mention their names.

      Thanks. We have provided more detailed reagent information as suggested.

      (2) I am also concerned that in the FACS there is no information at all about laser wavelength and filter properties. This is especially important for BCECF because the wavelength spectrum changes with pH. Also, if there are any positive controls for these imaging reagents, such as ionophores, it would be more convincing to include them.

      Thank you for your comment. The excitation wavelength was 488 nm for detecting Ca2+, pH and ROS by FACS. BCECF is a widely used probe for monitoring intracellular pH, and the Beyotime reagent (S1006) has been used in other studies (Chen S, Blood. 2016)[5], (Liu H, Cell Death Dis. 2022)[6]. To make the results more reliable, we have repeated these experiments in PD21 testes (revised Figure 5a-c). No positive controls for these reagents were used in our experiments.

      (3) As noted above, it is better to avoid directly linking the cell's abnormal ion homeostasis to TMC7 ion channel function in the text. The discussion should be changed to emphasize that the TMC7-deficient cells are apoptotic and that these physiological phenomena are occurring as a side effect of this apoptosis.

      Thank you for raising this concern. We agree with the reviewer that there is no direct evidence for TMC7 channel function in the Golgi, and we have revised the description in the manuscript accordingly.

      We performed new experiments to measure apoptosis and intracellular Ca2+, pH and ROS in PD21 testes. No apoptotic cells were observed at this stage; however, cellular homeostasis was already impaired in testes of PD21 Tmc7-/- mice. These data suggest that TMC7 depletion impairs cellular homeostasis and hence induces spermatid apoptosis.

      (4) While I understand that it appears to be difficult to experimentally verify the ion channel function of TMC7, it may be supportive to compare its amino acid sequence and/or 3D predicted structure with that of TMC1/2. Including a supplemental figure for this purpose would emphasize the possibility that TMC7 functions as an ion channel.

      We thank the reviewer for making this great suggestion. We compared the amino acid sequences and predicted structures of TMC1 and TMC2 with those of TMC7. TMC1 shares 81% sequence similarity with TMC7, with an RMSD (root mean square deviation) of 3.079; TMC2 shares 82% sequence similarity with TMC7, with an RMSD of 2.176. These data suggest that TMC7 has an amino acid sequence and predicted structure similar to those of TMC1/2 and might function as an ion channel. We have included the predicted structures in revised Fig. S7.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      I do not have any experimental comments or concerns to address, but I do ask that the authors consider an alternative hypothesis. Based on prior data demonstrating that TMC1 is a mechanosensitive ion channel, the authors reasonably assume that TMC7 may also function as an ion channel. Although the authors observe alterations in cytosolic Ca2+ and pH upon loss of TMC7 by flow cytometry, which begins to support this hypothesis, these data do not directly demonstrate ion channel activity.

      I was wondering if the authors had considered whether TMC7 could also function as a lipid scramblase. TMC1 has also been proposed to function as a Ca2+-inhibited scramblase, where knockout of TMC1 leads to a loss of phosphatidylserine (PS) exposure and membrane blebbing at the apical region of hair cells (Ballesteros, A. and Swartz, K., Science Advances, 2022). Furthermore, TMC proteins are structurally related to the Anoctamin/TMEM16 family of chloride channels and lipid scramblases, where TMEM16A-B are bona fide Ca2+-activated chloride channels, and TMEM16C-H are characterized as Ca2+-dependent scramblases. Based on their structural similarity and the observation that TMC1 may also exhibit lipid scrambling properties based on the PS exposure, I wonder if the authors may have data that support a TMC7 scramblase hypothesis. I was intrigued by this idea, especially given the authors' observations of large vacuoles in the seminiferous tubules and cauda epididymis and the vesicle accumulation phenotype in their TEM data. Incorporating this hypothesis into the discussion section, at minimum, could provide a valuable perspective, and this line of thought may lead to interesting data interpretation throughout the paper.

      We thank the reviewer for the valuable suggestion. As suggested, we have discussed the possibility that TMC7 acts as a 'lipid scramblase'.

      Reviewer #3 (Recommendations For The Authors):

      (1) Gene symbols should be italicized, and protein symbols should be capitalized.

      Thanks. We have made changes to the manuscript as recommended.

      (2) Tmc7 KO males show reduced sperm count, which alters the germ cell composition in the testis (Figure 2g). Thus, it is inappropriate to compare protein levels using whole testis lysates (Figure 3e, 4h, 5d, 5f). Instead, the same immunoblotting analyses could be done with purified round spermatids or 3-wk-old testis. Likewise, the significance of the intracellular Ca2+ and pH measurements is potentially diminished by the differences in the germ cell composition in WT and KO mice.

      We appreciate this constructive suggestion. We agree with the reviewer that using whole testis lysates may diminish the differences between WT and Tmc7-/- mice. However, we are unable to purify round spermatids because specific markers are lacking.

      (3) Figures 2i, 2j: How sperm motility was measured should be specified in the Methods.

      Thank you for this reminder. We have added a description of the sperm motility assessment to the Methods section.

      (4) Figure 4g: It does not make sense to compare the fluorescence intensity of these proteins without making sure that the seminiferous tubules are in the same stage. As shown in Figures S5a and S5b, TMC7 exhibits varied abundance in spermatids at different steps.

      We thank the reviewer for the insightful comment. As suggested, we have replaced the images with images of seminiferous tubules at the same stage and compared the fluorescence intensity in the new images.

      (5) Figure 4h: How were the band intensities measured? The third band from the left is visually stronger than the first one, but it does not seem to be so according to the column graph. The reviewer measured the intensity of GRASP65 bands relative to alpha-tubulin by ImageJ and obtained relative intensities of 0.35, 0.87, 0.6, and 0.08 for the bands from left to right. Additional replicates of the western blots should be included in the supplementary figures.

      Thank you for this insightful comment. Band density and size were quantified using ImageJ. On re-examining the first GRASP65 band from the left, it appears that the protein was not fully transferred onto the PVDF membrane. We have performed new experiments and replaced the original bands (Revised Fig. 4h). Additional replicates of the western blots have been included in revised Fig. S8.

      (6) Figures 5a, 5b: Based on the observation of abnormal intracellular Ca2+ and pH levels in the KO germ cells, the authors concluded that TMC7 maintains the homeostasis of Golgi pH and ion (Lines 223-224, 263-264). However, intracellular Ca2+ and pH levels do not directly reflect those in the Golgi apparatus.

      We thank the reviewer for this important comment. We agree and have changed “Golgi” to “intracellular” as suggested.

      (7) Figure 5c: ROS is produced during apoptosis. Thus, it is not appropriate to conclude that the increased ROS levels in Tmc7 KO germ cells lead to apoptosis.

      Following the reviewer’s comment, we measured ROS and apoptosis in testes of PD21 and PD30 mice. ROS levels were increased, but no apoptotic cells were observed in testes of PD21 and PD30 Tmc7–/– mice. Apoptotic cells were observed in testes of 9-week-old Tmc7–/– mice (Revised Fig. 5e-f). These data suggest that TMC7 depletion results in the accumulation of ROS, which in turn leads to apoptosis.

      (1) Fettiplace, R., D.N. Furness, and M. Beurg, The conductance and organization of the TMC1-containing mechanotransducer channel complex in auditory hair cells. Proc Natl Acad Sci U S A, 2022. 119(41): p. e2210849119.

      (2) Yu, X., et al., Deafness mutation D572N of TMC1 destabilizes TMC1 expression by disrupting LHFPL5 binding. Proc Natl Acad Sci U S A, 2020. 117(47): p. 29894-29903.

      (3) Kang-Decker, N., et al., Lack of acrosome formation in Hrb-deficient mice. Science, 2001. 294(5546): p. 1531-3.

      (4) Xiao, N., et al., PICK1 deficiency causes male infertility in mice by disrupting acrosome formation. J Clin Invest, 2009. 119(4): p. 802-12.

      (5) Chen, S., et al., Sympathetic stimulation facilitates thrombopoiesis by promoting megakaryocyte adhesion, migration, and proplatelet formation. Blood, 2016. 127(8): p. 1024-35.

      (6) Liu, H., et al., PRMT5 critically mediates TMAO-induced inflammatory response in vascular smooth muscle cells. Cell Death Dis, 2022. 13(4): p. 299.

    1. Author response:

      eLife assessment:

      This manuscript reports valuable findings on the role of the Srs2 protein in turning off the DNA damage signaling response initiated by Mec1 (human ATR) kinase. The data provide solid evidence that Srs2 interaction with PCNA and ensuing SUMO modification is required for checkpoint downregulation. However, experimental evidence with regard to the model that Srs2 acts at gaps after camptothecin-induced DNA damage is currently lacking. The work will be of interest to cell biologists studying genome integrity but would be strengthened by considering the possible role of Rad51 and its removal. 

      We appreciate the editors and the reviewers for providing evaluation and helpful comments. As detailed below, we plan to adjust the writing and figures to address the points raised by the reviewers. We believe that these changes will improve the clarity of the work. Below is a summary of our plan to address the two main criticisms.

      (1) Regarding the sites of Srs2 action, our data support the conclusion that Srs2 removal of RPA is favored at a subset of ssDNA regions that have proximal PCNA, but not at sites lacking PCNA. Plausible examples of the former type include ssDNA gaps and tails generated during DNA repair or replication, where PCNA can be loaded at the ssDNA-dsDNA junction with a 3’ DNA end. Examples of the latter type, which lack proximal PCNA, include ssDNA formed within negatively supercoiled regions or intact R-loops, both of which lack a 3’ DNA end for PCNA loading. While we have stated this conclusion in the text, we highlighted ssDNA gaps as sites of Srs2 action in the Discussion and in the model figure, which could be misleading. We will clarify our model, that is, Srs2 distinguishes among different types of ssDNA regions using PCNA proximity as a guide for RPA removal, and state that the precise nature of Srs2 action sites remains to be determined. Regardless, the feature of Srs2 revealed in this work provides a rationale for how it can remove RPA at subsets of ssDNA regions without unnecessary stripping of RPA at other sites.

      (2) While Rad51 removal is an important facet of Srs2 functions, it is not relevant to our current study based on the following observations and rationales.

      First, we have provided several lines of evidence supporting the conclusion that Rad51 removal by Srs2 is separable from the Srs2-RPA antagonism (Dhingra et al., 2021). For example, while rad51∆ rescues the hyper-recombination phenotype of srs2∆ cells, it does not affect the hyper-checkpoint phenotype of srs2∆. Strikingly, rfa1-zm1/zm2 have the opposite effect. The differential effects of rad51∆ and rfa1-zm1/zm2 were also seen for the srs2 ATPase-dead allele (srs2-K41A). For example, rfa1-zm2 rescued the hyper-checkpoint defect and the CPT sensitivity of srs2-K41A, while rad51∆ had neither effect.

      These and other data described in Dhingra et al. suggest that Srs2’s effects on checkpoint vs. recombination are separable and that the Srs2-RPA antagonism during the DNA damage checkpoint is independent of Rad51.

      Second, our current work addresses which Srs2 features affect the Srs2-RPA antagonism during the DNA damage response and its implications. Given that this antagonism is separable from Srs2 removal of Rad51, including Rad51 regulation would distract from the main points of this work.

      Third, in the current work, we began by examining all known regulatory and protein-protein interaction features of Srs2, including the Rad51 binding domain. Consistent with our conclusion summarized above based on the Dhingra et al. study, deleting the Rad51 binding domain in Srs2 (srs2-∆Rad51BD) has no effect on the rfa1-zm2 phenotype in CPT (Figure 2D). This is in sharp contrast to mutating the PCNA binding and sumoylation sites of Srs2, which suppressed the CPT sensitivity and checkpoint abnormalities of rfa1-zm2 (Figure 2C). These data provide further evidence that Srs2 regulation of Rad51 is separable from the Srs2-RPA antagonism.

      In summary, our work provides a foundation for future examination of how Srs2 regulates RPA and Rad51 in different manners, how these two facets of Srs2 function affect genome integrity in different capacities, and whether there is crosstalk between them during certain DNA metabolism processes.

      Public Reviews:

      Reviewer #1:

      Overall, the data presented in this manuscript is of good quality. Understanding how cells control RPA loading on ssDNA is crucial to understanding DNA damage responses and genome maintenance mechanisms. The authors used genetic approaches to show that disrupting PCNA binding and SUMOylation of Srs2 can rescue the CPT sensitivity of rfa1 mutants with reduced affinity for ssDNA. In addition, the authors find that SUMOylation of Srs2 depends on binding to PCNA and the presence of Mec1. Noted weaknesses include the lack of evidence supporting that Srs2 binding to PCNA and its SUMOylation occur at ssDNA gaps, as proposed by the authors. Also, the mutants of Srs2 with impaired binding to PCNA or impaired SUMOylation showed no clear defects in checkpoint dampening, and in some contexts, even resulted in decreased Rad53 activation. Therefore, key parts of the paper would benefit from further experimentation and/or clarification.  

      We thank the reviewer for the positive comments on this work and address her/his remark regarding ssDNA gaps below in Major Comment #1. In addition, we detail below our data and rationale for suggesting that the checkpoint dampening phenotype of srs2-∆PIM and -3KR (deficient for PCNA binding and sumoylation, respectively) is masked by redundant pathways. We further describe our plan to enhance the clarity of both text and model to address these points from the reviewer.

      Major Comments 

      (1) The central model proposed by the authors relies on the loading of PCNA at the 3' junction of an ssDNA gap, which then mediates Srs2 recruitment and RPA removal. While several aspects of the model are consistent with the data, the evidence that it is occurring at ssDNA gaps is not strong. The experiments mainly used CPT, which generates mostly DSBs. The few experiments using MMS, which mostly generates ssDNA gaps, show that Srs2 mutants lead to weaker rescue in this context (Figure S1). How do the authors explain this discrepancy? In the context of DSBs, are the authors proposing that Srs2 is engaging at later steps of HR-driven DSB repair where PCNA gets loaded to promote fill-in synthesis? If so, is RPA removal at that step important for checkpoint dampening? These issues need to be addressed and the final model adjusted.

      We appreciate the reviewer’s concern. Our conclusion is that Srs2 can be guided by PCNA to a subset of ssDNA regions for RPA removal, and that this Srs2 action is not favored at ssDNA regions with no proximal PCNA. It is important to note that CPT can produce both types of ssDNA regions. Besides ssDNA generated via DSB-associated recombinational repair, CPT can also lead to ssDNA gap formation upon excision repair and DNA-protein crosslink repair of trapped Top1 (Sun et al., 2020). ssDNA regions generated during these DNA repair processes often contain a 3’ DNA end for PCNA loading, thus they can favor Srs2 removal of RPA. Another facet of CPT’s effects (besides DNA lesions) is depletion of the functional pool of Top1, which causes topological stress and consequently increased levels of DNA supercoiling and R-loops (Koster et al., 2007, Petermann et al., 2022). ssDNA formed within negatively supercoiled regions and in R-loops lacks a 3’ DNA end unless it is cleaved by nucleases, thus these sites would be disfavored for Srs2 removal of RPA due to lack of PCNA loading. Our conclusion that ssDNA regions with nearby PCNA are preferred sites for Srs2 action provides a rationale for how Srs2 can remove RPA at certain ssDNA regions but minimize unnecessary stripping of RPA from other sites.

      We will clarify in the Discussion that CPT can generate two types of ssDNA regions as stated above, and that Srs2 could distinguish between them using PCNA proximity as a guide for RPA removal. While this conclusion was described in the text, we emphasized the ssDNA gap as a Srs2 action site in the model. We will clarify that while this is a logical supposition, other types of ssDNA with proximal PCNA could also be targeted by Srs2, and that our work paves the way for determining the precise nature of the ssDNA regions where Srs2 acts.

      The reasons for the less potent growth suppression of rfa1 mutants by srs2 alleles in MMS compared with CPT are unclear, but multiple possibilities should be considered, given that MMS and CPT affect checkpoint responses differently and that RPA and Srs2 affect growth in multiple ways. For example, while CPT only activates the DNA damage checkpoint, MMS additionally induces the DNA replication checkpoint (Menin et al., 2018, Redon et al., 2003). It is thus possible that the Srs2-RPA antagonism is relatively more important for the DNA damage checkpoint than for the DNA replication checkpoint. Further investigation of this and other possibilities will shed light on the differential suppressive effects seen in this work. We will include this discussion in the revised text.

      (2) The data in Figure 3 showing that Srs2 mutants reduce Rad53 activation in the rfa1-zm2 mutant are confusing, especially given the claim of an anti-checkpoint function for Srs2 (in which case Srs2 mutants should result in increased Rad53 activation). The authors propose that Rad53 is hyperactivated in rfa1-zm2 mutant because of compromised ssDNA protection and consequential DNA lesions, however, the effects sharply contrast with the central model. Are the authors proposing that in the rfa1-zm2 mutant, the compromised protection of ssDNA supersedes the checkpoint-dampening effect? Perhaps a schematic should be included in Figure 3 to depict these complexities and help the reader. The schematic could also include the compensatory dampening mechanisms like Slx4 (on that note, why not move Figure S2 to a main figure?... and even expand experiments to better characterize the compensatory mechanisms, which seem important to help understand the lack of checkpoint dampening effect in the Srs2 mutants) 

      Genetic interactions that involve partially defective alleles, multi-functional proteins, and redundant pathways can be complex to interpret. For example, a phenotype seen for the null allele may not be seen for partially defective alleles. In the context of this study, while srs2 null increased Rad53 activation (Dhingra et al., 2021), srs2-∆PIM and -3KR did not (Figure 3A-3B). However, srs2-∆PIM enhanced Rad53 activation when combined with another checkpoint dampening mutant, slx4RIM, suggesting that defects of srs2-∆PIM can be compensated by Slx4 (Figure S2). Importantly, srs2-∆PIM and -3KR rescued the checkpoint abnormality of rfa1-zm2 (Figure 3A-3B), suggesting that Srs2 binding to PCNA and its sumoylation contribute to the Srs2-RPA antagonism in the DNA damage checkpoint response.

      A partially defective allele that impairs a specific function of a protein can be a powerful genetic tool even when it lacks a particular phenotype on its own. For example, a partially defective allele of the checkpoint protein Rad9 impairing its binding to gamma-H2A (rad9-K1088M) does not affect the G2/M checkpoint nor cause DNA damage sensitivity due to the compensation of other checkpoint factors (Hammet et al., 2007); however, rad9-K1088M rescues the DNA damage sensitivity and persistent G2/M checkpoint of rtt107 and slx4 mutants, providing evidence supporting a role of the Slx4-Rtt107 axis in removal of Rad9 from chromatin (via competing with Rad9 for gamma-H2A binding) (Ohouo et al., 2013).

      In order to highlight the checkpoint recovery process, the model in Figure 6 did not depict another consequence of the Srs2-RPA antagonism. In the presence of Srs2, rfa1 mutants with reduced DNA binding can lead to increased levels of DNA lesions and checkpoint activation, and these defects are rescued by lessening Srs2’s ability to strip RPA from DNA (Dhingra et al., 2021). We will modify the model in Figure 6 and its legend to clarify that the model depicts just one of the consequences of the Srs2 and RPA antagonism, with a focus on checkpoint recovery. We will also state these points more clearly in the Discussion. Further, as suggested by the reviewer, a new schematic will be added to Figure 3 to outline the genetic relationships and their interpretation. We will also follow the reviewer’s suggestion to move Figure S2 to the main figures. Better characterizing the compensatory mechanisms among different checkpoint dampening pathways is very interesting but requires a substantial amount of work. While it is beyond the scope of the current study, it could be pursued in the future.

      (3) The authors should demarcate the region used for quantifying the G1 population in Figure 3B and explain the following discrepancy: By inspection of the cell cycle graph, all mutants have lower G1 peak height compared to WT (CPT 2h). However, in the quantification bar graph at the bottom, ΔPIM has higher G1 population than the WT. 

      We have added a description of how the G1 region of the FACS histogram was selected to derive the percentage of G1 cells in Figure 3B. Briefly, for samples collected for a particular strain, the G1 region of the “G1 sample” was used to demarcate the G1 region of the “CPT 2h” sample. Upon re-checking the included FACS profiles, we realized that a mutant panel and its datapoint were mistakenly placed in the position for wild-type. We will correct this mistake. The conclusion remains that srs2-∆PIM and srs2-3KR improved the ability of rfa1-zm2 cells to exit G2/M, while they themselves do not differ from the wild-type control in the percentage of G1 cells after 2 hr CPT treatment. We will add statistics to the figures to reflect this conclusion and adjust the order of strains shown in panels A and B to be consistent with each other.

      Reviewer #2:

      This is an interesting paper that delves into the post-translational modifications of the yeast Srs2 helicase and proteins with which it interacts in coping with DNA damage. The authors use mutants in some interaction domains with RPA and Srs2 to argue for a model in which there is a balance between RPA binding to ssDNA and Srs2's removal of RPA. The idea that a checkpoint is being regulated is based on observing Rad53 and Rad9 phosphorylation (so there are the attributes of a checkpoint), but evidence of cell cycle arrest is lacking. The only apparent delay in the cell cycle is the re-entry into the second S phase (but it could be an exit from G2/M); but in any case, the wild-type cells enter the next cell cycle most rapidly. No direct measurement of RPA residence is presented. 

      We thank the reviewer for the helpful comments. Previous studies have shown that CPT does not induce the DNA replication checkpoint, thus it does not slow down or arrest S phase progression; however, CPT does induce the DNA damage checkpoint, which delays the re-entry of G2/M cells into the next cell cycle (Menin et al., 2018, Redon et al., 2003). Our result is consistent with previous findings, showing that CPT induces a G2/M delay but not arrest. We will adjust the text to make this point clearer.

      We have previously reported chromatin-bound RPA levels in rfa1-zm2, srs2, and their double mutants, as well as in vitro ssDNA binding by wild-type and mutant RPA complexes (Dhingra et al., 2021). We found that Srs2 loss or its ATPase-dead mutant led to a 4-6-fold increase of RPA levels on chromatin, which was rescued by rfa1-zm2 (Dhingra et al., 2021). On its own, rfa1-zm2 did not cause defective chromatin association in our assays, despite modestly reducing ssDNA binding in vitro (Dhingra et al., 2021). This discrepancy could be due to a lack of sensitivity of the chromatin fractionation assay in revealing moderate changes of RPA residence on DNA. Considering this, we decided to employ functional assays (Figure 2-3) that are more effective in identifying the Srs2 features pertaining to RPA regulation.

      Strengths:

      Data concern viability assays in the presence of camptothecin and in the post-translational modifications of Srs2 and other proteins.

      Weaknesses:

      There are a couple of overriding questions about the results, which appear technically excellent. Clearly, there is an Srs2-dependent repair process here, in the presence of camptothecin, but is it a consequence of replication fork stalling or chromosome breakage? Is repair Rad51-dependent, and if so, is Srs2 displacing RPA or removing Rad51 or both? If RPA is removed quickly what takes its place, and will the removal of RPA result in lower DDC1-MEC1 signaling? 

      While Srs2 can affect both the checkpoint response and DNA repair in CPT conditions, the rfa1-zm2 allele, which affects the former but not the latter role of Srs2, allows us to gain a deeper understanding of the former role (Dhingra et al., 2021). This role also appears to be critical for cell survival in CPT, since srs2∆ growth on CPT-containing media was greatly improved by rfa1-zm mutants (Dhingra et al., 2021). Building on this understanding, our current study identified two Srs2 features that could afford spatial and temporal regulation of RPA removal from DNA, thus providing a rationale for how cells can properly utilize this beneficial yet also dangerous activity. Studying Srs2-mediated repair in CPT conditions, whether Rad51-dependent or independent, before and after replication fork stalling or DNA breakage, will require substantial effort and can be pursued in the future. We will add this point to the revised manuscript.

      Moreover, it is worth noting that in single-strand annealing, which is ostensibly Rad51 independent, a defect in completing repair and assuring viability is Srs2-dependent, but this defect is suppressed by deleting Rad51. Does deleting Rad51 have an effect here? 

      We have shown in our previous paper (Dhingra et al., 2021) that rad51∆ did not rescue the hyper-checkpoint phenotype of srs2∆ cells in CPT conditions, whereas rfa1-zm1 and -zm2 did. Such differential effects were also seen for the srs2 ATPase-dead allele (Dhingra et al., 2021). These and other data described in Dhingra et al. (2021) suggest that Srs2’s effects on checkpoint vs. recombination are separable, at least in CPT conditions, and that the Srs2-RPA antagonism in checkpoint regulation is not affected by Rad51 removal (unlike in the SSA situation).

      Neither this paper nor the preceding one makes clear what really is the consequence of having a weaker-binding Rfa1 mutant. Is DSB repair altered? Neither CPT nor MMS are necessarily good substitutes for some true DSB assay.

      In our previous report (Dhingra et al., 2021), we showed that the rfa1-zm mutants did not affect the frequencies of rDNA recombination, gene conversion, or direct-repeat repair. Further, rfa1-zm mutants did not suppress the hyper-recombination phenotype of srs2∆, while rad51∆ did (Dhingra et al., 2021). In a DSB system in which the direct repeats flanking the break were placed 30 kb apart, srs2∆ led to hyper-checkpoint signaling and lethality, both of which were rescued by rfa1-zm mutants (Dhingra et al., 2021). In this assay, rfa1-zm mutants themselves did not show sensitivity, suggesting that repair is largely proficient. Collectively, these data suggest that weaker DNA binding by Rfa1 has no detectable effect in the recombinational repair assays examined thus far, but has a profound effect on Srs2-mediated checkpoint downregulation. In-depth studies of rfa1-zm mutations in the context of various DSB repair steps will be interesting to pursue in the future.

      With camptothecin, in the absence of site-specific damage, it is difficult to test these questions directly. (Perhaps there is a way to assess the total amount of RPA bound, but ongoing replication may obscure such a measurement). It should be possible to assess how CPT treatment in various genetic backgrounds affects the duration of Mec1/Rad53-dependent checkpoint arrest, but more than a FACS profile would be required. 

      Quantitative measurement of RPA residence time on DNA in cells and the duration of Mec1/Rad53-dependent checkpoint arrest will be very informative but requires further technology development. Our current work provides a foundation for such quantitative assessment.

      It is also notable that MMS treatment does not seem to yield similar results (Fig. S1). 

      Figure S1 showed that srs2-∆PIM and srs2-3KR suppressed rfa1-zm2 growth defects more weakly on MMS plates than on CPT plates. The reasons for this less potent suppression under MMS than under CPT are unclear, but multiple possibilities should be considered, given that MMS and CPT affect checkpoint responses differently and that RPA and Srs2 affect growth in multiple ways. For example, while CPT only activates the DNA damage checkpoint, MMS additionally induces the DNA replication checkpoint (Menin et al., 2018, Redon et al., 2003). It is thus possible that the Srs2-RPA antagonism is more important for the DNA damage checkpoint than for the DNA replication checkpoint. Further investigation of this and other possibilities will provide clues to the differential suppressive effects seen in this work. We will include this discussion in the revised text.

      Reviewer #3:

      The superfamily I 3'-5' DNA helicase Srs2 is well known for its role as an anti-recombinase, stripping Rad51 from ssDNA, as well as an anti-crossover factor, dissociating extended D-loops and favoring non-crossover outcome during recombination. In addition, Srs2 plays a key role in ribonucleotide excision repair. Besides DNA repair defects, srs2 mutants also show a reduced recovery after DNA damage that is related to its role in downregulating the DNA damage signaling or checkpoint response. Recent work from the Zhao laboratory (PMID: 33602817) identified a role of Srs2 in downregulating the DNA damage signaling response by removing RPA from ssDNA. This manuscript reports further mechanistic insights into the signaling downregulation function of Srs2. 

      Using the genetic interaction with mutations in RPA1, mainly rfa1-zm2, the authors test a panel of mutations in Srs2 that affect CDK sites (srs2-7AV), potential Mec1 sites (srs2-2SA), known sumoylation sites (srs2-3KR), Rad51 binding (delta 875-902), PCNA interaction (delta 1159-1163), and SUMO interaction (srs2SIMmut). All mutants were generated by genomic replacement and the expression level of the mutant proteins was found to be unchanged. This alleviates some concern about the use of deletion mutants compared to point mutations. The double mutant analysis identified that PCNA interaction and SUMO sites were required for the Srs2 checkpoint dampening function, at least in the context of the rfa1-zm2 mutant. There was no effect of these mutants in a RFA1 wild-type background. This latter result is likely explained by the activity of the parallel pathway of checkpoint dampening mediated by Slx4, and genetic data with an Slx4 point mutation affecting Rtt107 interaction and checkpoint downregulation support this notion. Further analysis of Srs2 sumoylation showed that Srs2 sumoylation depended on PCNA interaction, suggesting sequential events of Srs2 recruitment by PCNA and subsequent sumoylation. Kinetic analysis showed that sumoylation peaks after maximal Mec1 induction by DNA damage (using the Top1 poison camptothecin (CPT)) and depended on Mec1. These data are consistent with a model that Mec1 hyperactivation is ultimately leading to signaling downregulation by Srs2 through Srs2 sumoylation. Mec1-S1964 phosphorylation, a marker for Mec1 hyperactivation and a site found to be needed for checkpoint downregulation after DSB induction did not appear to be involved in checkpoint downregulation after CPT damage. The data are in support of the model that Mec1 hyperactivation when targeted to RPA-covered ssDNA by its Ddc2 (human ATRIP) targeting factor, favors Srs2 sumoylation after Srs2 recruitment to PCNA to disrupt the RPA-Ddc2-Mec1 signaling complex. 
Presumably, this allows gap filling and disappearance of long-lived ssDNA as the initiator of checkpoint signaling, although the study does not extend to this step.

      Strengths 

      (1) The manuscript focuses on the novel function of Srs2 in downregulating the DNA damage signaling response and provides new mechanistic insights.

      (2) The conclusions that PCNA interaction and ensuing Srs2-sumoylation are involved in checkpoint downregulation are well supported by the data. 

      We thank the reviewer for carefully reading our work and for his/her positive comments. 

      Weaknesses 

      (1) Additional mutants of interest could have been tested, such as the recently reported Pin mutant, srs2Y775A (PMID: 38065943), and the Rad51 interaction point mutant, srs2-F891A (PMID: 31142613). 

      srs2-Y775A was shown to be proficient in stripping RPA from ssDNA, behaved like wild-type Srs2 in assays such as gene conversion and crossover control, and exhibited a genetic interaction profile similar to that of the wild-type allele. The authors suggest that the Y775 pin can contribute to unwinding secondary DNA structures. Collectively, these findings do not provide a strong rationale for srs2-Y775A being relevant to RPA removal from ssDNA.

      We have already included data showing that a srs2 mutant lacking the Rad51 binding domain (srs2-∆Rad51BD, ∆875-902) neither affected rfa1-zm2 growth in CPT nor caused other defects in CPT on its own (Figure 2D). These data suggest that Rad51 binding is not relevant to the Srs2-RPA antagonism in CPT, a conclusion fully supported by data in our previous study (Dhingra et al., 2021). Collectively, these findings do not provide a strong rationale for testing a point mutation within the Rad51BD region.

      (2) The use of deletion mutants for PCNA and RAD51 interaction is inferior to using specific point mutants, as done for the SUMO interaction and the sites for post-translational modifications. 

      We agree with this view generally. However, this is less of a concern for the Rad51 binding site mutant (srs2∆Rad51BD), as it behaved as the wild-type allele in our assays. The srs2-∆PIM mutant (lacking 4 amino acids) has been examined for PCNA binding in vitro and in vivo in several studies (e.g. Kolesar et al., 2016, Kolesar et al., 2012); to our knowledge no unintended defect was reported. We thus believe that this allele is suitable for testing whether Srs2’s ability to bind PCNA is relevant to RPA regulation.

      (3) Figure 4D and Figure 5A report data with standard deviations, which is unusual for n=2. Maybe the individual data points could be plotted with a color for each independent experiment to allow the reader to evaluate the reproducibility of the results. 

      We will include individual data points as suggested and correct the figure legends to indicate that three independent biological samples per genotype were examined in both panels.

      References:

      Dhingra N, Kuppa S, Wei L, Pokhrel N, Baburyan S, Meng X, Antony E and Zhao X (2021) The Srs2 helicase dampens DNA damage checkpoint by recycling RPA from chromatin Proc Natl Acad Sci U S A 118

      Hammet A, Magill C, Heierhorst J and Jackson SP (2007) Rad9 BRCT domain interaction with phosphorylated H2AX regulates the G1 checkpoint in budding yeast EMBO Rep 8: 851-857

      Kolesar P, Altmannova V, Silva S, Lisby M and Krejci L (2016) Pro-recombination Role of Srs2 Protein Requires SUMO (Small Ubiquitin-like Modifier) but Is Independent of PCNA (Proliferating Cell Nuclear Antigen) Interaction J Biol Chem 291: 7594-7607

      Kolesar P, Sarangi P, Altmannova V, Zhao X and Krejci L (2012) Dual roles of the SUMO-interacting motif in the regulation of Srs2 sumoylation Nucleic Acids Res 40: 7831-7843

      Koster DA, Palle K, Bot ES, Bjornsti MA and Dekker NH (2007) Antitumour drugs impede DNA uncoiling by topoisomerase I Nature 448: 213-217

      Menin L, Ursich S, Trovesi C, Zellweger R, Lopes M, Longhese MP and Clerici M (2018) Tel1/ATM prevents degradation of replication forks that reverse after topoisomerase poisoning EMBO Rep 19

      Ohouo PY, Bastos De Oliveira FM, Liu Y, Ma CJ and Smolka MB (2013) DNA-repair scaffolds dampen checkpoint signalling by counteracting the adaptor Rad9 Nature 493: 120-124

      Petermann E, Lan L and Zou L (2022) Sources, resolution and physiological relevance of R-loops and RNA-DNA hybrids Nat Rev Mol Cell Biol 23: 521-540

      Redon C, Pilch DR, Rogakou EP, Orr AH, Lowndes NF and Bonner WM (2003) Yeast histone 2A serine 129 is essential for the efficient repair of checkpoint-blind DNA damage EMBO Rep 4: 678-684

      Sun Y, Saha S, Wang W, Saha LK, Huang SN and Pommier Y (2020) Excision repair of topoisomerase DNA-protein crosslinks (TOP-DPC) DNA Repair 89: 102837

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors use point light displays to measure biological motion (BM) perception in children (mean = 9 years) with and without ADHD, and relate it to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three BM tasks, but that those tasks loading more heavily on local processing relate to social interaction skills and those loading on global processing relate to age. There are still some elements of the results that are unclear, but nevertheless, the important and solid findings extend our limited knowledge of BM perception in ADHD, as well as biological motion processing mechanisms in general.

      We thank the editors and reviewers for their valuable feedback and constructive comments. In the revised manuscript, we have reported all statistics for the models and provided detailed analytical evidence for the distinct contributions of local and global BM processing. We hope these clarifications strengthen the robustness of our conclusions.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We very much appreciate your positive feedback.

      Weaknesses:

      The manuscript has improved in clarity and conceptual and methodological considerations in response to the last review. However, the reported results still provide incomplete support for the claims the authors make in the paper.

      In relation to other reviewers' earlier comments, the model notation used is still not consistent and model results are reported incompletely, which make it difficult to gain a full picture of the data and how they support the authors' secondary claims. For instance, across the models in the supplementary materials, ß coefficients are only reported selectively which makes it difficult to assess the model as a whole. Furthermore, different terms (task 1, task 2 vs. BM-Local, BM-global) are used to refer to the same levels of a variable, and it is unclear which levels of a dummy variable correspond to which task, making it overall very difficult to comprehend the modelling procedure.

      Thanks for pointing out these issues. In the revised version, we have unified the terminology by consistently referring to task types as BM-Local, BM-Global, and BM-General. Additionally, we have clarified how the dummy variables are interpreted in the model construction. Furthermore, we corrected the model results and included all statistics in Tables S1, S2, and S3. For more detailed information, please refer to our response to your Recommendations for the authors.

      Reviewer #3 (Public Review):

      The authors presented point light displays of human walkers to children (mean = 9 years) with and without ADHD to compare their biological motion perception abilities, and relate them to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three biological motion tasks, but that those loading more heavily on local processing related to social interaction skills and global processing to age. The valuable and solid findings are informative for understanding this complex condition, as well as biological motion processing mechanisms in general. However, the correlations present a pattern that needs further examination in future studies because many of the differences between correlations are not significant.

      Strengths:

      The authors present differences between ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      Thanks for this positive assessment of our work.

      Weaknesses:

      The data are not strong enough to support claims about differences between global and local processing with respect to social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but the crucial tests of differences between correlations do not present a clear picture. Further empirical work would be needed to test this. Specifics:

      The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. The supplementary materials demonstrate that tests of differences between correlations present an incomplete picture. Currently they have small samples for correlations, so this is unsurprising.

      We apologize for not clarifying these points earlier. We did identify correlations between performance on all BM tasks and SRS scores. However, this finding is not unexpected, given the significant differences in SRS scores between TD and ADHD children alongside their marked differences in all BM tasks: correlation analyses pooling data from both groups may simply reflect these group differences. To elucidate the relationship between social ability impairment and diminished BM processing in children with ADHD, we conducted additional subgroup analyses and found correlations only in the BM-Local task. To further support the specificity of this correlation, we compared the differences in coefficients. We revised our modelling procedure for testing differences between correlations in the supplementary materials and now present all model statistics in Tables S2 and S3. Discrepancies in these coefficients, which exclude the influence of differences between groups, suggest that social factors specifically influence performance on the BM-Local task in children with ADHD. We acknowledge that the analysis of differences between correlations is based on a relatively small sample, and we have interpreted it cautiously in the Discussion. Future studies will aim to increase the sample size to validate our findings.

      Theoretical assumptions. The authors make some statements about local vs global biological motion processing that may have been made in previous studies, but would appear controversial and not definitive. E.g., that local BM processing does not improve with age and is uninfluenced by attention.

      Thanks for your comment. To the best of our knowledge, fewer developmental studies have been conducted on local BM processing than on global BM processing, and our study is the first to directly explore the relationship between local BM processing and age. Additionally, we used QbInattention to evaluate sustained attention (considered “top-down” attention) and examined its correlation with local BM processing. Some indirect evidence supports the view that the ability to process local BM cues remains stable and is unaffected by top-down attention. For example, local BM processing did not show a learning trend (Chang 2009) and was linked to the activation of subcortical regions (Hirai 2020). Research has demonstrated that local BM cues can convey information about walking direction without participants’ explicit attention or recognition (Chang 2009, Hirai 2011, Thompson 2007, Wang 2010), indicating the involvement of “bottom-up” processing (Hirai 2020, Troje 2023). Consistent with these previous findings, we did not find a significant correlation between local BM processing and either age or QbInattention. We acknowledge that statements such as “local BM processing does not improve with age and is uninfluenced by attention” should be approached with caution. Therefore, we interpreted our results carefully:

      “Once a living creature is detected, an agent (i.e., is it a human?) can be recognised by a coherent, articulated body structure that is perceptually organised based on its motions (i.e., local BM cues)71. This involves top-down processing and probably requires attention25,72, particularly in the presence of competing information26. Our findings are consistent with those of previous studies on the cortical processing of BM73, as we found that the severity of inattention in children with ADHD was negatively correlated with their performance in global BM processing, whereas this significant correlation was not found in local BM processing, which may involve bottom-up processing61,65 and might not need participants’ explicit attention21,23,74,75. However, further studies are needed to verify this hypothesis.” (lines 461-470)

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Supplementary materials: For all reported results, I suggest the authors use consistent model notation with complete reporting of all statistics in line with common conventions (ideally tables reporting beta values, error terms and confidence intervals for all model predictors, as well as R squared values). In particular the beta values for the reference category are needed to be able to fully interpret the beta values for the reported contrasts.

      We appreciate your suggestion. In the revised manuscript, we report all statistics, including beta values, error terms and confidence intervals for all model predictors, and R squared values. These detailed statistics can be found in Tables S1, S2 and S3. We hope this additional information will offer readers a more comprehensive understanding of our study.

      Please also address the following inconsistencies:

      - At least when reporting the model results, the same term should be used when referring to task type (either task 1/2/3 or local/global/general BM).

      Thank you for this feedback. We now use the same terms (BM-Local/Global/General) to refer to task type throughout the text.

      - Second linear model in the Supplementary Materials: The authors state that the results suggest that the correlation between SRS and task 1 is greater than that between task 2 and SRS scores. First of all, to be able to support this claim the authors need to provide the coefficient for task 1 (which, if task 1 is the reference variable should be ß1). Second, as I currently understand the reported model results, the fact that ß4 (representing the difference in relationship to SRS scores between task 2 and task 1; the authors refer to ß3 here although I assume they mean ß4) is negative and shows a trend towards significance would actually mean the relationship between BM processing accuracy and SRS scores is more negative for task 2 relative to task 1 and not, as the authors state, that the correlation with SRS scores is greater for task 1. I realise this contradicts the individual r values and scatter plots and hope the authors can clarify the model results.

      We thank you for pointing out these issues. For the second linear model (Model 4 in the revised manuscript), we now report the coefficients for all predictors and model summaries, including the coefficient for BM-Local (ß1). In addition, we have corrected the model results. The values of ß4 (representing the difference in relationship to SRS scores between BM-Global and BM-Local) and ß5 (representing the difference in relationship to SRS scores between BM-General and BM-Local) were positive and showed a trend towards significance, indicating that the correlations with SRS total score were more negative for BM-Local than for BM-Global and BM-General:

      “A general linear model was constructed (Table S2, Model 4): SRS = β0 + β1 * ACC + β2 * D1 + β3 * D2 + β4 * (ACC * D1) + β5 * (ACC * D2). If the effect of the interaction term (i.e., β4 or β5) is statistically significant, it indicates a difference in correlations with SRS total score between BM-Local and BM-Global (or BM-General). The results suggested trends where the correlations with SRS total score were more negative for BM-Local relative to BM-Global (standardized β4 = 0.580, p = 0.074) and BM-General (standardized β5 = 0.550, p = 0.073).” (lines SI 36-42)
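      A worked expansion of Model 4 may help readers see why a positive β4 implies a more negative BM-Local slope. Assuming, as stated above, that BM-Local is the reference level, with D1 = 1 only for BM-Global and D2 = 1 only for BM-General, substituting the dummy values yields one regression line per task:

```latex
\begin{aligned}
\text{BM-Local } (D_1 = 0,\ D_2 = 0):&\quad \mathrm{SRS} = \beta_0 + \beta_1\,\mathrm{ACC} \\
\text{BM-Global } (D_1 = 1,\ D_2 = 0):&\quad \mathrm{SRS} = (\beta_0 + \beta_2) + (\beta_1 + \beta_4)\,\mathrm{ACC} \\
\text{BM-General } (D_1 = 0,\ D_2 = 1):&\quad \mathrm{SRS} = (\beta_0 + \beta_3) + (\beta_1 + \beta_5)\,\mathrm{ACC}
\end{aligned}
```

      Hence β4 is the BM-Global slope minus the BM-Local slope, and β5 is the BM-General slope minus the BM-Local slope. When β1 is negative, a positive β4 or β5 means the BM-Local slope is the steeper (more negative) one. This expansion uses raw coefficients; the standardized values quoted above are rescaled by positive factors and should therefore carry the same sign.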

      - Third linear model in the Supplementary Materials: In the dummy variable representing task, when local BM is the reference level, which task is represented by d1 and d2, respectively? If I understand the authors' procedure correctly, d1 should represent the difference between local and global BM and d2 the difference between local and general BM. If this is true, ß4 should code for the difference between local and global BM and not, as stated by the authors, for the difference between local and general BM. Also, what is d3?

      Thank you for pointing out this issue. We corrected and clarified the results of third model (Model 5 in revised manuscript) in the revised version and pointed out what is represented by d1 (D1) and d2 (D2), respectively:

      “We recoded task types into two dummy variables, D1 and D2, using BM-Local as a reference. The coefficient of D1 represents the difference in relationship to age between BM-Local and BM-Global, and the coefficient of D2 represents the difference in relationship to age between BM-Local and BM-General. The following model was created for each group (Table S3, Models 5-6): ACC = β0 + β1 * age + β2 * D1 + β3 * D2 + β4 * (age * D1) + β5 * (age * D2). If the effect of the interaction term (i.e., β4 or β5) is statistically significant, it indicates a difference in the effect of age on ACC between BM-Local and BM-Global (or BM-General). In the ADHD group, we observed a significant difference in the effect of age on ACC between BM-Local and BM-General (standardized β5 = 0.462, p < 0.001) and a marginally significant difference in the effect of age on ACC between BM-Local and BM-Global (standardized β4 = 0.228, p = 0.073).” (lines SI 47-57)
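      To illustrate how the dummy-coded interaction terms in Models 5-6 recover the between-task difference in age slopes, here is a minimal Python/NumPy sketch. It is not the authors' analysis code: the function name `fit_task_by_age_model`, the task labels, and the noise-free synthetic data are hypothetical, and it returns raw ordinary-least-squares coefficients rather than the standardized β values reported above.

```python
import numpy as np


def fit_task_by_age_model(age, acc, task):
    """Least-squares fit of
    ACC = b0 + b1*age + b2*D1 + b3*D2 + b4*(age*D1) + b5*(age*D2).

    Hypothetical coding mirroring the text: BM-Local is the reference level,
    D1 = 1 only for BM-Global rows, D2 = 1 only for BM-General rows.
    Returns raw (unstandardized) coefficients keyed "b0".."b5".
    """
    age = np.asarray(age, dtype=float)
    acc = np.asarray(acc, dtype=float)
    task = np.asarray(task)
    d1 = (task == "BM-Global").astype(float)
    d2 = (task == "BM-General").astype(float)
    # Design matrix: intercept, age, two dummies, two age-by-dummy interactions
    X = np.column_stack([np.ones_like(age), age, d1, d2, age * d1, age * d2])
    coefs, *_ = np.linalg.lstsq(X, acc, rcond=None)
    return dict(zip(["b0", "b1", "b2", "b3", "b4", "b5"], coefs))


# Synthetic, noise-free example: each task gets its own intercept and age slope
ages = np.tile(np.arange(6, 13), 3)  # ages 6..12, once per task
tasks = np.repeat(["BM-Local", "BM-Global", "BM-General"], 7)
intercepts = {"BM-Local": 0.50, "BM-Global": 0.30, "BM-General": 0.40}
slopes = {"BM-Local": 0.00, "BM-Global": 0.02, "BM-General": 0.03}
acc = np.array([intercepts[t] + slopes[t] * a for t, a in zip(tasks, ages)])

fit = fit_task_by_age_model(ages, acc, tasks)
# b4 = Global slope - Local slope; b5 = General slope - Local slope
print(round(fit["b4"], 3), round(fit["b5"], 3))  # prints 0.02 0.03
```

      Because each task has its own intercept and slope in this parameterization, the interaction coefficients are exactly the slope differences relative to BM-Local, which is why a significant β4 or β5 is read as a task difference in the age effect.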

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 3:

      Response to authors' revisions:

      This reviewer is not convinced that the authors have done enough to satisfactorily address either of the major issues described in the original public review, above.

      They're still not providing a quantification of Fig. 5D (originally 5C).

      Their response regarding the expression pattern of Rh1 is particularly concerning, as it represents a misinterpretation of previously published data.

      The gene encoding Rh1, ninaE, is expressed at such high levels in R1-6 PRs that any RNA-seq data (bulk or single-cell) generated from the optic lobes, no matter what cell-type, will display some ninaE transcripts that are present in the background, as they leak from R1-6 during dissociation steps. This phenomenon has been well described, for instance in Davis et al., 2020, eLife, and in fact led to the development of computational tools to abate such artifacts. In other words: no, rh1 is not expressed in glia, or any other neuron besides PRs for that matter. Therefore, I remain deeply suspicious about the functional relevance of the regulatory mechanisms described in this paper.

      We thank the reviewer for her or his critical comments.

      We quantified the cell-type differences in translation of the reporter with Tub-GAL4 and now show the results in Figure 5F. Consistent with other results, this analysis revealed that the glia-to-neuron ratio of the reporter protein expression is significantly lower when it contains the UTR sequences of rh1.  

      We removed the mRNA counts (former Figure 5A and Figure 5 - figure supplement 1A), as we agree that these may well be contaminated by the very high rh1 expression in R1-6. We also amended the graph showing the ribosome distribution on the rh1 mRNA (Figure 5B) to better compare the translational efficiency (footprints normalized with mRNA, in a similar manner to Figure 3C). Now it clearly highlights the cell-type differences of footprint distributions; ribosomes are much more enriched on the CDS (being translated) in neurons, while the fraction of ribosomes on the 5ʹ leader (being stalled) is much higher in glia. We summarized this differential ribosome distribution in a new graph (now Figure 5C).  

      We apologize for the misleading description of the reporter experiments. Despite the high level of mRNA expression in R1-6, we chose the 5ʹ leader of rh1 for the translation reporter because it contains clear uORFs with differential ribosome accumulation (Figure 5B). This biased ribosome distribution and differential translation are consistent features of many neuronal genes (Figure 3). We revised the text to clarify this point (lines 195-203).

      In summary, we have provided a more rigorous analysis and an extensive revision, which we hope address this concern.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript focuses on the role of the deubiquitinating enzyme USP-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they achieved their aims in shedding light on the precise mechanisms by which USP-50/USP8 regulates endosome maturation. The results support their conclusions that USP-50 acts by dissociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major strengths of this work lie in the well-designed experiments used to examine the effects of USP-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs.

      Excellent suggestions. The EM imaging indeed revealed an increase in enlarged cellular vesicles containing various contents in usp-50 mutants. However, the detailed molecular features of these vesicles remain unclear. Therefore, we plan to utilize ESCRT components for double staining with early or late endosome markers. This will enable us to accurately characterize the anomalous structures detected in the usp-50 mutants.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors examine how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss of function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets a specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the puncta of the Rab7 GEF SAND-1, which likely represent late endosomes, are diminished, even though Rabex5 accumulates in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations, they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      - The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion.

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      - The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to what extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model.

      Excellent point. We plan to conduct additional genetic analyses, including the construction of double mutants between usp-50 and various rabex-5 mutations, to further elucidate the extent to which USP8 regulates endosome maturation via Rabex5.

      Reviewer #3 (Public Review):

      Summary:

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size of lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What would have been their hypothesis addressing whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? Then they observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but in the same number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of USP8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes, facilitating endosome maturation.

      Weaknesses:

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approach is somewhat difficult to follow. Finally, the use of two models, C. elegans and a mammalian cell line, which would strengthen the observations, contributes to the difficulty in reading the manuscript.

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. Instead, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it.

      We thank this reviewer for the insightful comments. Fluorescence-fused proteins serve as potent tools for visualizing subcellular organelles both in vivo and in live settings. Specifically, in epidermal cells of worms, the tissue-specific expression of these fused proteins is indispensable for studying organelle dynamics within living organisms. This approach is necessitated by the inherent limitations of endogenously tagged proteins, whose fluorescence signals are often weak and unsuitable for live imaging or genetic screening purposes. Acknowledging concerns raised by the reviewer regarding potential alterations in organelle morphology due to overexpression of certain fused proteins, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion. Specifically, we discovered that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed enhanced RABX-5/Rabex5 signaling and mis-localization of SAND-1/Mon1 proteins from endosomes. Consequently, this disruption impairs endolysosomal trafficking, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified by studies in cultured mammalian cells, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosomes/late endosomes. To elucidate the underlying mechanisms, we investigated the formation of multivesicular bodies (MVBs), a process tightly linked to USP8 function. Extensive electron microscopy (EM) analysis indicated that MVB-like structures are largely intact in usp-50 mutant cells, suggesting that USP8/USP-50 likely regulate lysosome formation through alternative pathways in addition to their roles in MVB formation and ESCRT component function. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Interestingly, loss-of-function mutations in usp8 often lead to the enlargement of early endosomes, yet the mechanisms underlying this phenomenon remain unclear. Given that lysosomes receive and degrade materials generated by endocytic pathways, we hypothesized that the abnormally enlarged MVB-like vesicular structures observed in usp-50 or usp8 mutant cells correspond to the enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced, and early endosomes are significantly enlarged. Given that the Rab5 guanine nucleotide exchange factor (GEF) Rabex5 is essential for Rab5 activation, we further investigated its dynamics. Additional analyses conducted in both worm hypodermal cells and cultured mammalian cells revealed an increase in endosomal Rabex5 in response to usp8/usp-50 loss of function. Live imaging studies further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, aligning spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. Comprehensive genetic and biochemical experiments demonstrated that USP8 acts through deubiquitination at the K323 site to dissociate Rabex5 from early endosomes and promote the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and subsequent examination of established theories, leading to the formulation of our own hypothesis. Through multifaceted approaches, we unveiled a novel function of USP8 in early-to-late endosome conversion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This study makes an interesting finding: a polyunsaturated fatty acid, Lin-Glycine, increases the conductance of KCNQ1/KCNE1 channels by stabilizing a state of the selectivity filter that allows K+ conduction. The stabilization of a conducting state appears well supported by single-channel analysis, though some method details are missing. The linkage to PUFA action through the selectivity filter is supported by the disruption of PUFA effects by mutation of residues which change conformation in two KCNQ1 structures from the literature. Claims about differences in Lin-Glycine binding to these two structural conformations seem to lack clear support; thus, the claim that PUFAs increase Gmax by binding to a crevice in the pore domain seems speculative. A potentially definitive functional experiment is conducted by single-channel recordings with selectivity filter domain mutation Y315F, which ablates the Lin-Glycine effect on Gmax. However, this appears to be an n=1 experiment. Overall, the major claim of the abstract is supported: "... that the selectivity filter in KCNQ1 is normally unstable ... and that the PUFA-induced increase in Gmax is caused by a stabilization of the selectivity filter in an open-conductive state." However, the claim in the abstract that selectivity filter instability "explains the low open probability" seems too general.

      We thank the reviewer for the comments, and we would like to address the main concern regarding the single channels. We now state the number of experiments used for the single-channel analysis. We agree that the claim in the abstract seems too general, and we have now made it more specific to our findings.

      Reviewer #2 (Public Review):

      Golluscio et al. address one of the mechanisms by which polyunsaturated fatty acids (PUFAs) upregulate IKs (KCNQ1/KCNE1) channels. PUFAs are known to upregulate KCNQ1 and KCNQ1/KCNE1 channels by two mechanisms: one shifts the voltage dependence in the negative direction, and the other increases the maximum conductance (Gmax). While the first mechanism is known to affect the voltage sensor equilibrium through a charge effect, the second mechanism is less well understood. Using single-channel recordings and mutagenesis of the putative binding sites (most of them related to the selectivity filter), they concluded that PUFA binding stabilizes the selectivity filter in a conductive state.

      Strengths:

      They mainly used single-channel recordings and directly assessed the behavior of the selectivity filter. The method is straightforward and convincing enough to support their claims.

      Weaknesses:

      The structural model they used is the KCNQ1 channel without KCNE1, because a structure of the KCNQ1/KCNE1 channel complex is not yet available. As the binding site of PUFAs might overlap with KCNE1, it is not very clear how PUFA binds to the KCNQ1 channel in the presence of KCNE1.

      Using other previously reported PUFA-related KCNQ1 mutants would strengthen their conclusions. For example, the Gmax of the K326E mutant is reduced by PUFA binding. Examining whether K326E shows reduced numbers of non-empty sweeps in the single-channel recordings would be a good addition.

      We thank the reviewer for the public review. We would like to address the main weak points raised. As a structure of the KCNQ1/KCNE1 complex is not yet available, we used KCNQ1 alone. We believe that the PUFA and KCNE1 binding sites will not overlap, as we previously presented data in agreement with the idea that KCNE1 rotates the VSD relative to the PD (Wu et al., 2021). This would leave enough space for both PUFA and KCNE1, so that PUFA can bind to the crevice (K326 and D301) without competing with KCNE1. We appreciate the suggestion of adding single-channel recordings of the K326E mutant, and we agree it would be a valuable addition to strengthen our conclusions. However, single-channel recordings of KCNQ1 are very challenging and time-consuming to obtain, so we would like to keep this in consideration for future studies.

      Reviewer #3 (Public Review):

      This manuscript reveals an important mechanism of KCNQ1/IKs channel gating such that the open state of the pore is unstable and alternates intermittently between closed and open conformations. PUFA enhances the maximum open probability of IKs by binding to a crevice adjacent to the pore and stabilizing the open conformation. This mechanism is supported by convincing single-channel recordings that show empty and open channel traces, and the ratio of such traces is affected by PUFA. In addition, mutations of the pore residues alter PUFA effects, convincingly supporting that PUFA alters the interactions among these pore residues.

      Strengths:

      The data are of high quality and the description is clear.

      Weaknesses:

      Some comments about the presentation.

      (1) The structural illustrations in this manuscript in general need to be clarified.

      (2) The manuscript heavily relies on the comparison between the S4-down and S4-up structures (Figures 3, 4, and 7) to illustrate the difference between the extracellular side of the pore and to lead to the hypothesis of open-state stability being affected by PUFA. This may mislead the readers to think that the closed conformation of the channel in the up-state is the same as that in the down-state.

      We thank the reviewer for the public review, and we would like to address the comments about the presentation. We agree that the structural illustrations need to be more detailed, and we amended our previous illustrations. We have now included a new Figure 3 with a more detailed legend and a new Figure 4 that includes more information, such as the main chain of the whole selectivity filter and surrounding peptide.

      We have now added some clarification regarding the structures of KCNQ1 with S4-down and S4-up to clarify that the closed conformation of the channel in the up-state is different from that in the down-state. We also emphasize this difference in the Discussion.

      Recommendations for the authors:

      Reviewer #1:

      (1) Explain more thoroughly how the single-channel recordings were done:

      - How was Lin-Glycine applied in these experiments? The patch configuration is unclear. Was Lin-Glycine added to the patch pipette? If not, why is Lin-Glycine expected to reach the proposed binding site in the outer leaflet? Were controls time-matched applications of vehicles with ethanol?

      Data were collected using the cell-attached patch configuration to minimize disruption to the patch and avoid rundown problems due to the loss of PIP2. Lin-Glycine was solubilized in DMSO, and the desired concentration was added directly to the bath. We had no a priori reason to know whether the PUFA would reach the proposed binding site, but the consistency with which channel activity increased 5-10 minutes after addition to the bath convinced us that it was indeed reaching the binding site. This time frame fits with our prior experience with mefenamic acid effects on single channels (Wang et al., 2020). The mefenamic acid binding site is external to the membrane, so the drug must enter the cell and cross the patch membrane to affect channel activity. In addition, shown below is a previous recording from our lab in which nothing was added to the bath over a 55-minute period while consecutive files were recorded. This shows the typical behavior of IKs, with activity tending to cluster in a few active sweeps interspersed among many blank sweeps. The behavior in this patch contrasts with that seen in the presence of Lin-Glycine, where the clusters of activity spread over an increasing number of sweeps.

      In addition, we have previously shown that 0.1% DMSO (the concentration used in the present study) does not affect the G-V of KCNQ1, although there is a non-significant decrease of about 14% in tail current amplitudes (Eldstrom et al., 2021). We therefore do not think that the increase in activity we see with Lin-Glycine can be explained by vehicle effects alone.

      Author response image 1.

       

      We have added more details to the Materials and Methods section.

      - How well the replicates match the representative data in Figures 1, S1, and 6 is unclear (except for average current and Po in the last second of the traces from Figure 1). Are the results in Fig 6 n=1? 

      We now show in a data supplement that 3 replicates were used to assess the change in channel activity upon addition of Lin-Glycine.

      - Diary plots (as in Werry et al. 2013) and additional descriptions of the timeline of Lin-Glycine application and analyses could add credibility to interpretations. 

      We have added a diary plot of the first latency to open in Supplementary Figure S1.

      - Amounts of plasmids and lipofectamine that were used in transfections are missing. 

      We have added this information to the Materials and Methods section as follows:

      “Single-channel currents were recorded from transiently transfected mouse ltk- fibroblast cells (LM cells) using 1.5 µL Lipofectamine 2000 (Thermo Fisher Scientific). Cells were transfected with 1.5 µg of pcDNA3 containing a linked KCNE1-KCNQ1 construct (ref. 20), to ensure fully KCNE1-saturated complexes, in addition to a plasmid containing green fluorescent protein (GFP) to identify transfected cells.”

      - Inclusion/exclusion criteria for patches analyzed are missing. 

      We have added this information to the Materials and Methods section as follows:

      “Only patches that were largely free of endogenous currents and had few channels, such that there were several blank sweeps to average for use for leak subtraction, were analyzed.”

      - Whether blinding, randomization, or pre-determined n values were employed is not mentioned. 

      No blinding, randomization or pre-determined n values were employed.

      - Analysis methods are sometimes unclear: How was Po calculated? Representative sweeps appear to have been leak and capacitance subtracted. How was that done? 

      Po was estimated from the all-point amplitude histogram as follows: Po = Σ(i × N_i) / (i_estimate × N_total), where N_i is the number of points at current level i in the histogram, i_estimate = 0.4 pA is the current level estimated from the peak of the histogram, and N_total = 10,000 is the total number of points in the last second of the trace. p = 0.75 ± 0.12 (n = 8) and p = 0.87 ± 0.04 (n = 3) for Control and Lin-Glycine, respectively.
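
      To make the estimate above concrete, the following is a minimal Python sketch of the stated formula; it is an illustration, not the authors' analysis code. It assumes the all-point histogram is supplied as a list of current levels (in pA) with a matching list of point counts, and the function name estimate_po is hypothetical. The default i_estimate of 0.4 pA is taken from the text.

```python
def estimate_po(bin_currents_pa, bin_counts, i_estimate_pa=0.4):
    """Estimate open probability from an all-point amplitude histogram.

    Implements Po = sum(i * N_i) / (i_estimate * N_total), where
    bin_currents_pa holds the current level i of each bin (pA),
    bin_counts holds the number of points N_i in that bin, and
    N_total is taken as the sum of all bin counts.
    """
    if len(bin_currents_pa) != len(bin_counts):
        raise ValueError("each current level needs a matching count")
    n_total = sum(bin_counts)
    if n_total == 0:
        raise ValueError("histogram is empty")
    # Weighted sum of current levels, i.e. sum over bins of i * N_i.
    weighted_current = sum(i * n for i, n in zip(bin_currents_pa, bin_counts))
    return weighted_current / (i_estimate_pa * n_total)
```

      For example, a hypothetical histogram of 10,000 points with 6,000 points at 0.4 pA and 4,000 at 0 pA gives estimate_po([0.0, 0.4], [4000, 6000]) ≈ 0.6. Because Σ(i × N_i)/N_total is simply the mean current, the expression amounts to the mean current over the analyzed segment divided by i_estimate.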

      Leak and capacitive currents were subtracted using averaged empty sweeps.
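
      The subtraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each sweep is assumed to be a list of current samples of equal length, the blank (empty) sweeps are assumed to have already been identified by the experimenter, and the function name subtract_blank_average is hypothetical. The point-by-point mean of the blank sweeps serves as a template for the leak and capacitive components, and that template is subtracted from every sweep.

```python
def subtract_blank_average(sweeps, blank_indices):
    """Subtract the point-by-point mean of the blank sweeps from every sweep.

    sweeps: list of sweeps, each a list of current samples (same length).
    blank_indices: indices of sweeps judged to contain no channel openings.
    """
    if not blank_indices:
        raise ValueError("at least one blank sweep is required")
    n_points = len(sweeps[0])
    if any(len(sweep) != n_points for sweep in sweeps):
        raise ValueError("all sweeps must have the same number of points")
    # Template of leak plus capacitive current: mean of blank sweeps per sample.
    template = [
        sum(sweeps[k][j] for k in blank_indices) / len(blank_indices)
        for j in range(n_points)
    ]
    # Remove the template from every sweep, leaving channel openings.
    return [[x - t for x, t in zip(sweep, template)] for sweep in sweeps]
```

      Because blank sweeps contain only leak and capacitive transients, subtracting their average leaves the channel openings in the active sweeps; this is also why only patches with several blank sweeps available for averaging were analyzed.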

      (2) The change of cells used for whole cell vs single channel (oocytes vs mouse ltk- fibroblast cells) could be discussed. These cells likely have different lipids in their membranes. Is there any other evidence that PUFAs have the same effects on KCNE1-KCNQ1 in these cells? Does the V0.5 shift? 

      A similar effect on Gmax, in both oocytes and mouse ltk- fibroblast cells, is shown in Figures 1 and 2. In Figure 2, the shift in latency suggests a shift in V0.5, suggesting the binding of PUFA to Site I.

      (3) The manuscript associates selectivity filter changes with S4 being up or down. It would help to clarify whether there was a change in [K+] in the two KCNQ1 structures used for modeling, as Mandala and MacKinnon (2023) state: "We note that one interesting difference between the two up structures regards the occupancy of K+ ions in the selectivity filter (SI Appendix, Fig. S5 C and D). In the polarized sample, due to the low extravesicular concentration of K+, density is only visible at the first and third positions in the selectivity filter, while density is present at all four positions in the unpolarized sample. Similar differences were observed in our previous study on Eag (20) and are qualitatively consistent with crystal structures of KcsA solved under symmetrical high and low K+ concentrations (45)." 

      Our study states that there are some differences between the two structures with S4 in the up-state and S4 in the down-state, and a reorganization of the pore. As for the change in K+ occupancy between the two structures, we are uncertain, as our knowledge comes only from what is stated in Mandala and MacKinnon (2023). Mandala and MacKinnon did not discuss the selectivity filter in the down-state structure in their paper, and there are no K+ ions in any of their PDB files. So, we do not know how many K+ ions there are in the down state.

      (4) The manuscript states " PUFAs increase Gmax by binding to a crevice in the pore domain" and "we elucidated that Lin-Glycine binds to a crevice between K326 and D301", this seems speculative without any actual binding studies or concrete structural evidence. A quantitative structural modeling analysis of whether changes in the crevice change the theoretical binding of Lin-Glycine might provide a stronger basis for speculation. 

      We have toned down these statements in the Results and Discussion to:

      “Crevice residues affect PUFA ability to increase Gmax”

      And

      Discussion: “We tested the hypothesis that the effect of Lin-Glycine involved conformational changes in the selectivity filter following PUFA binding to two residues K326 and D301 at the pore domain. Those residues delimit a small crevice that seems to change in size in different structures with S4 up or S4 down (Figure 3, D-F).”

      (5) The several figures detailing differences in selectivity filter conformation in the KCNQ1 structures are interesting and relevant in that they identify the movement of residues such as Y315 that, when mutated, ablate Lin-Glycine effect on Gmax. It would help to clarify whether T312 and I313 also move between the two selectivity filter conformations. 

      From the morph of the selectivity filter in the two conformations, it is noticeable that the changes and residue movements involve only residues at the upper part of the selectivity filter (including Y315 and D317). T312 and I313, are in the lower part of the selectivity filter and do not seem to move or rotate from their position between the two conformations of the selectivity filter.

      We now include Supplementary Figures S3 and S4, which show the extent of movement of each residue in the pore region, and a short description of this in the Results section.

      (6) The claim in the abstract that selectivity filter instability "explains the low open probability" seems too general. Lin-Glycine seems to increase the likelihood of conduction by 2.5-fold, but it was not clear whether open probability ceases to be low or whether other mechanisms also keep Po low. 

      We have reworded this sentence to: “Our results suggest that the selectivity filter in KCNQ1 is normally unstable, contributing to the low open probability, and that the PUFA-induced increase in Gmax is caused by a stabilization of the selectivity filter in an open-conductive state.”

      Reviewer #2:

      (1) While all the electrophysiological recordings used KCNQ1/KCNE1 channels, all the structural models they used are KCNQ1 channels (without KCNE1). I know it is because the KCNQ1/KCNE1 complex structure is unavailable. However, according to their previous results, KCNQ1 alone is also upregulated by PUFAs. I am curious about what the single-channel recordings of KCNQ1 alone look like in the presence and absence of PUFAs. 

      We would love to include single-channel recordings of KCNQ1, but they are extremely hard to measure due to the small size and flickering nature of the channel.

      (2) As mentioned above, we do not have the KCNQ1/KCNE1 structure yet have the KCNQ1/KCNE3 structures (Sun and MacKinnon, Cell, 2020). According to the PDBs (6V00 or 6V01), the clevis (K326 and D301) looks covered by KCNE3. Is it true that PUFAs do not upregulate KCNQ1/KCNE3? If true, KCNE1 may not cover the clevis, so the binding mode should differ from the KCNQ1/KCNE3 structures. Please discuss the possible blocking of the clevis by KCNE proteins. 

      We previously presented data consistent with KCNE1 rotating the VSD towards the PD (Wu et al., 2021). This mechanism would leave room for both PUFA and KCNE1, so that PUFA can bind to the crevice (K326 and D301). We therefore think that this rotation will prevent PUFA and KCNE1 from competing for the same space. As for KCNQ1/KCNE3, we currently do not have any evidence about a possible upregulation by PUFA.

      (3) In the cryoEM structure with S4 resting (Figure 3F), the clevis looks too narrow for PUFA to bind. Is there any (either previous or current) evidence supporting that PUFA binding is state-dependent? 

      Because PUFAs first integrate into the bilayer and then diffuse towards their binding site on the channel, it would be hard to test the state dependence of binding. In addition, once PUFAs are in the bilayer, binding/unbinding is quite fast (within the ns range according to our previous MD simulations), whereas opening/closing is very slow (100 ms to s). So, the combination of slow wash-in/washout, fast binding/unbinding, and slow opening/closing would make it very difficult to test the state dependence of binding using fast perfusion or different voltage protocols.

      (4) In the previous report (Liin et al. Cell Reports, 2018), K326 is the most critical site for PUFA binding. Why the K326 mutants are not included in the current study? I also would like to see the single-channel recordings of the K326E mutant, which showed a smaller Gmax. Does the PUFA application reduce the probability of non-empty traces in this mutant? 

      As Liin et al. reported, mutations of K326 reduce the ability of PUFA to increase the Gmax. In this work, we wanted to gain further biophysical information on the mechanism that leads to an increase in Gmax, considering the knowledge we had from work conducted in our lab previously. We therefore focused here on residues downstream of K326 that we think are important for inducing the conformational changes at the selectivity filter. We agree that single channel experiments on K326E would be very interesting but that has to be for a future study.

      Minor points 

      (1) Liin et al. used S209F (Po of 0.4) and I204F (Po of 0.04) mutants. Their single-channel recordings would be a good addition. 

      We thank the reviewer for the suggestion. However, single-channel analyses of S209F and I204F have been reported previously (Eldstrom et al., 2010).

      (2) I would like to see how the Site I mutations (R2Q/Q3R) affect (or do not affect) the single-channel recordings (open probability and latency). 

      Thank you for the excellent suggestion. It would be interesting to assess the behavior of the channel when mutations occur at Site I. However, we think this information would not add further detail to this study, as here we focus our attention on the mechanism of the Gmax increase. Single-channel recordings are extremely difficult to obtain; therefore, we chose to include only mutations at Site II in this study.

      (3) I would like the G-V curves for all the mutations at 0 and 20 uM of Lin-Glycine (Figure 3C and Figures 5A and B). 

      We have now added the G-V curves in Supplementary Figure S7.

      (4) I assume all the PUFAs have a similar effect on the selectivity filter, but a few other examples of PUFAs would be nice to see. 

      We anticipate that PUFAs and analogues with properties similar to Lin-Glycine would increase Gmax by a similar mechanism, because other PUFAs have previously been shown to increase Gmax (Bohannon et al., 2020).

      (5) Although the probabilities of non-empty sweeps are written in the manuscript, bar graph presentations would be a nice addition to Figures 2 and 6. 

      We have added bar graphs of non-empty sweeps for Fig 2 and 6 in.

      (6) Is there no statistical significance for D317E and T309S in Figure 5A? 

      No, the effects for D317E and T309S were not statistically significant.

      (7) There is no reference to Figure 7 in the manuscript. 

      A reference to Figure 7 has been added to the manuscript in the following paragraph.

      “Taken together, our results suggest that the binding of PUFA to Site II increases Gmax by promoting a series of interactions that stabilize the channel pore in the conductive state. For instance, we speculate that in the conductive state, hydrogen bonds between W304-D317 and W305-Y315, which are likely absent in the non-conductive conformation of KCNQ1, are created and that PUFA binding to Site II favors the transition towards the conductive state of the channel (Figure 7)”

      Reviewer #3:

      (1) Clarify the structural figures. Figures 3 D, E, and F - explain what the colors indicate. 

      A more detailed description of Figure 3 has been added to the legend.

      “D, E and F) Structure of crevice between S5 and S6 in KCNQ1 with S4 up (D and E) and S4 down (F). Residues that surround the crevice from S6 shown in blue (K326, T327, S330, V334) and from S5 in red (D301, A300, L303, F270). Remaining KCNQ1 residues shown in purple…, linoleic acid (LIN: gold color)”

      Fig 4. Only the side chains of the residues are shown, making it hard to relate the figure to the familiar K channel selectivity filter. The main chain of the entire selectivity filter should be shown to orient readers to the familiar view of the K channel selectivity filter. In addition, the structures shown are only part of the selectivity filter; it should be specified which part is shown. This will also help the discussion at the bottom of page 10 and the subsequent text. 

      We now provide a new Figure 4 with more details such as the main chain of the whole selectivity filter and surrounding peptide.

      (2) When the structures with S4 up and S4 down are compared, it should be clearly cautioned that the structure of the closed pore with S4 up may differ from the structure of the pore with S4 down. 

      We now additionally state: “Clearly, there will be other differences in the pore domain between structures with activated and resting VSDs, for example the state of the activation gate.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Wu et al. explores the role of the histone reader protein SntB in Aspergillus flavus, claiming it to be a key regulator of development and aflatoxin biosynthesis. While the study incorporates various techniques, including gene deletion, ChIP-seq, and RNA-seq, several concerns and omissions in the paper raise questions about the validity and completeness of the presented findings.

      (1) Omissions of Prior Work:

      The authors fail to acknowledge and integrate prior research by Pfannenstiel et al. (2018) on the sntB gene in A. flavus, which covered phenotypic changes, RNA-seq data, and histone modifications. This omission raises concerns about the transparency and completeness of the current study.

      The absence of reference to studies by Karahoda et al. (2022, 2023) revealing SntB's involvement in the KERS complex in A. flavus and A. nidulans is a major oversight. This raises questions about the specificity of SntB's regulatory functions, as it may be part of a larger complex. The authors should clarify why these studies were omitted and how they ensure that SntB alone, and not the entire KERS complex, is responsible for the observed effects.

      We greatly appreciate the reviewer’s professional question. As the reviewer mentioned, Pfannenstiel et al. (2018) reported functions of the sntB gene in secondary metabolism, development and global histone modifications in A. flavus, and we cited this paper (see reference 20). In their study, the functions of sntB were analyzed using both ΔsntB and sntB overexpression mutants. Deletion of sntB impaired several developmental processes, such as sclerotia formation and heterokaryon compatibility, as well as secondary metabolite synthesis and the ability to colonize host seeds, which is consistent with our results (Figures 1 and 2). In contrast to that study, we also constructed a complementation strain, which further confirmed the function of the sntB gene. Moreover, our main purpose was to identify the downstream regulatory mechanism of SntB, which has been reported to act as a transcription factor and not only as an important epigenetic reader. Please see lines 452-457 and 486-500.

      Regarding the function of the KERS complex in A. nidulans (Karahoda et al., 2022), we had already cited this work (see reference 29). The report on the function of the KERS complex in A. flavus (Karahoda et al., 2023) was published recently, and we apologize for omitting it. In the revised manuscript, we have cited this paper and compared it with our work (see lines 97-98 and reference 30). Based solely on our experiments, we cannot determine whether SntB acts alone or together with other proteins; what we can confirm is that SntB plays a key role in this process. We will address this question in future research.

      (2) Transparency and Accessibility of Data:

      The lack of accessibility and visualization tools for ChIP-seq and RNA-seq data poses a challenge for independent verification and in-depth analysis. The authors should address this issue by providing more accessible data or explaining the limitations of data availability. A critical component missing from the paper is a detailed presentation of ChIP-seq data, specifically demonstrating SntB binding patterns on key promoters. This omission weakens the link between SntB and the mentioned regulatory genes. The authors should include these crucial data visualizations to strengthen their claims.

      To review GEO accession GSE247683, go to https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE247683 and enter the token “ipilouscnruprsl” into the box. The data will be released once our paper is published. SntB binding patterns on key promoters have been added; please see Figures 4D, 4E, 5F and 5G, and Table S9.

      (3) SntB Binding Sites and Consensus Sequence:

      The study mentions several genes upregulated in the sntB mutant without demonstrating SntB binding sites on their promoters. A detailed analysis of SntB binding maps is necessary to establish a direct link between SntB and these regulatory genes.

      Thanks for your suggestion. We have added the binding maps of SntB; please see Figures 5F and 5G and lines 362-364.

      (4) Mechanistic Insight into Peroxisome Biogenesis:

      If SntB indeed regulates peroxisome biogenesis, the absence of markers for peroxisomes and the localization of peroxisomes in the sntB mutant vs. WT strains is a significant gap. Providing evidence for peroxisome regulation is crucial for understanding the proposed mechanism and validating the study's claims.

      Thanks for your suggestion. Catalase is ubiquitously present in aerobic organisms and plays a crucial role in mitigating oxidative stress by scavenging reactive oxygen species (ROS). We therefore measured ROS levels in the sntB mutant and WT strains, as well as in the ∆catC strain (Figure 6H).

      In summary, while the manuscript presents intriguing findings regarding SntB's role in A. flavus, the omissions of prior work, lack of transparency in data accessibility, and insufficient mechanistic insights call for revisions and additional experimental evidence to strengthen the validity and impact of the study. Addressing these concerns will enhance the manuscript's contribution to the field.

      Thanks. We have revised our manuscript according to the valuable comments provided above.

      Additionally, the way the English language is used could be improved.

      Thanks. We asked a native English-speaking assistant to proofread the paper, corrected grammatical errors and typos, and improved the readability and quality of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This work is of great significance in revealing the regulatory mechanisms of pathogenic fungi in toxin production, pathogenicity, and in its prevention and pollution control. Overall, this is generally an excellent manuscript.

      Strengths:

      The data in this manuscript is robust and the experiments conducted are appropriate.

      Weaknesses:

      (1) Using ChIP-seq and RNA sequencing, the authors found that SntB plays key roles in the oxidative stress response of A. flavus. To confirm the role of SntB in oxidative stress, the authors should also measure ROS levels in the ΔsntB and WT strains, in addition to the ΔcatC strain.

      Thanks for your suggestion. We have performed the relevant experiments; the results are shown in Figure 6G and described in lines 185-192 and 395-398.

      (2) Why did the authors only study the function of catC among the 7 genes related to an oxidative response listed in Table S14?

      The functions of some genes in Table S15 (Table S14 in the previous version of our manuscript) have already been studied, such as cat1 [1]. In this study, we chose catC for further validation because it was the most up-regulated gene in the ΔsntB strain. The other genes may also play important roles in SntB-triggered antioxidant pathways that regulate development and aflatoxin biosynthesis in A. flavus, and we will focus on this in future work.

      [1] Zhu Z., Yang M., Bai Y., Ge F., Wang S. Antioxidant-related catalase CTA1 regulates development, aflatoxin biosynthesis, and virulence in pathogenic fungus Aspergillus flavus. Environ Microbiol, 2020, 22(7): 2792-2810.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 52: Change "shad light" to "shed light"

      Thanks. We have revised it. Please see line 50.

      Line 62: Change "has" to "have" to match the plural noun "aflatoxins."

      Original: "Aflatoxins produced by A. flavus has strong toxicity..."

      Suggested: "Aflatoxins produced by A. flavus have strong toxicity..."

      Thanks. We have revised it. Please see line 62.

      Line 79: Consider rephrasing for clarity.

      Original: "...which may result in the modulation of the expression of genes involved in toxin production [15-17]."

      Thanks. We have revised it. Please see lines 77-80.

      Line 105: Add a comma after "host strain."

      Original: "A. flavus Δku70 ΔpyrG was used as a host strain for genetic manipulations."

      Suggested: "A. flavus Δku70 ΔpyrG was used as a host strain, for genetic manipulations."

      Thanks. We have revised it. Please see line 107.

      Line 113, Table 1: Change "form" to "from" in the Source column.

      Original: "Kindly presented form Prof. Chang[1]"

      Suggested: "Kindly presented from Prof. Chang[1]"

      Thanks. We have revised it. Please see Table 1.

      Line 140: Typo - Change "reaches" to "reach."

      Original: "when silkworm larva reaches about 1 g in weight."

      Suggested: "when silkworm larvae reach about 1 g in weight."

      Thanks. We have revised it. Please see line 141.

      Line 158: Typo - Change "pervious" to "previous."

      Original: "Data processing was according pervious study [39]."

      Suggested: "Data processing was according to a previous study [39]."

      Thanks. We have revised it. Please see line 150.

      Line 138: The animal invasion assay using silkworms was conducted according to a previous study.

      Change "according" to "conducted according to" for clarity.

      Thanks. We have revised it. Please see line 139.

      Line 148: Was carried out by APPLIED PROTEIN TECHNOLOGY, Shanghai (www.aptbiotech.com).

      Change "TECHNOLOY" to "TECHNOLOGY."

      Thanks. We have revised it. Please see line 149.

      Line 148: Data processing was conducted according to a previous study [39].

      Change "according to" to "conducted according to" for clarity.

      Thanks. We have revised it. Please see line 139.

      Line 429: Correct the spelling of "Schizzosaccharomyces pombe" to "Schizosaccharomyces pombe" [55].

      Thanks. We have revised it. Please see line 448.

      Reviewer #2 (Recommendations For The Authors):

      (1) The resolution of the words written in Figures 3 and 4 is not clear (or high) enough.

      Thanks. We have revised them. Please see Figures 3 and 4.

      (2) Which protein marker (protein ladder) was used in Figure 4A? The sizes of the relevant proteins should be marked.

      Thanks. We have revised it. Please see Figure 4A and lines 332-333.

      (3) Latin names do not need to be written in full after their first use in the text.

      Thanks. We have revised them throughout the manuscript.

      (4) The complementation strain of sntB was labeled sntB-C in Figure 2B but Com-sntB in other figures. All such inconsistencies should be corrected.

      Thanks. We have revised it. Please see Figure 2B.

      (5) What is the meaning of "1" in Table 1?

      Thanks. The "1" in Table 1 was a citation marker. We have revised it. Please see Table 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript constitutes an important contribution to antimalarial drug discovery, employing diverse systems biology methodologies; with a focus on an improved M1 metalloprotease inhibitor, the study provides convincing evidence of the utility of chemoproteomics in elucidating the preferential targeting of PfA-M1. Additionally, metabolomic analysis effectively documents specific alterations in the final steps of hemoglobin breakdown. These findings underscore the potential of the developed methodology, not only in understanding PfA-M1 targeting but also in its broader applicability to diverse malarial proteins or pathways. Revisions are needed to further enhance overall clarity and detail the scope of these implications.

      We thank the editor and reviewers for recognising the contribution our work makes to understanding the selective targeting of aminopeptidase inhibitors in malaria parasites and the wider impact this multi-omic strategy can have for anti-parasitic drug discovery efforts. The reviewers have provided constructive feedback and raised important points that we have taken on-board to improve our manuscript. In particular, we have revised aspects of the text and figures to enhance clarity, performed additional analysis on the other possible MIPS2673 interacting proteins and more comprehensively analysed the effect of MIPS2673 on parasite morphology. NB: Specific responses to comments in the public reviews are provided within responses to the specific recommendations to authors.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The article "Chemoproteomics validates selective targeting of Plasmodium M1 alanyl aminopeptidase as a cross-species strategy to treat malaria" presents a series of biochemical methods based on proteomics and metabolomics, as a means to:

      (1) validate the specific targeting of a biologically active molecule (MIPS2673) towards a defined (unique) protein target within a parasite and (2) to explore whether, by quantifying the perturbations generated at the level of the parasite metabolome, it is possible to extrapolate which metabolic pathway has been disrupted by this biologically active molecule and whether this may further confirm selective targeting in parasites of the expected (or in-vitro targeted) enzyme (here PfA-M1).

      The inhibitor used in this work by the authors (MIPS2673) is to my knowledge a novel one, although belonging to a chemical series previously explored by the authors, which recently enabled them to discover a specific PfA-M17 inhibitor, MIPS2571 (Edgard et al., 2022, ref 11 of this current work). Indeed, inhibitors specifically targeting either PfA-M1 or PfA-M17 (and not both, as currently done in the past) are scarce today, and highly needed to functionally characterize these two zinc-aminopeptidases. MIPS2673, blocks the development of erythrocytic stages of Plasmodium falciparum with an EC50 of 324 nM, blocks the parasite development at the young trophozoite stage at 5x EC50 (but at ring stages at 10xEC50, figure 1E), and inhibits the enzymatic activity of PfA-M1 (and its ortholog Pv-M1) but not of the related malarial metallo-aminopeptidases (M17 and M18 families) nor the human metalloenzymes from closely related enzymatic families, supporting its selective targeting of PfA-M1 (and Pv-M1).

      All experiments are carried out in vitro (e.g. biochemical studies such as enzymology, proteomics, metabolomics) and on cultured parasites (erythrocyte stages of Plasmodium falciparum and several gametocytes stages obtained in vitro); there are no in vivo manipulations. The work related to Plasmodium vivax, which justifies the "cross-species" indication in the title of the article, is restricted to using a recombinant form of the M1-family aminopeptidase in enzymatic assays. The rest of the work concerns only Plasmodium falciparum. While I found globally that this work is original and brings new data and above all proposes chemical validation approaches that could be used for other target validations under similar limiting conditions (impossibility of KO of the gene), I have some specific questions to address to the authors.

      Strengths and weaknesses:

      - The chemoproteomic approach, that explores the ability of MIPS2673 to more significantly "protect" the putative target (PfA-M1) against thermal degradation or enzymatic attack (by proteinase K), to document its selective targeting towards PfA-M1 (the inhibitor, once associated with its target, is expected to stabilize its structure or prevent the action of end proteases), uses several concentrations of MIPS2673 and provides convincing results. My main criticism is that these tests are carried out with parasite extracts enriched in 30-38 hours old forms, and restricted to the fraction of soluble proteins isolated from these parasitic forms, which still limits the scope of the analysis. It is clear that this methodological approach is a choice that can be argued both biologically (PfA-M1 is well expressed in these stages of the parasite development) and biochemically (it is difficult to do proteomic analyses on insoluble proteins) but I regret that the authors do not discuss these limitations further, notably, I would have expected (from Figure 1E) some targets to be also present at ring stages.

      - The metabolomic approach, by documenting the ability of MIPS2673 to selectively increase the number of non-hydrolyzed dipeptides in treated versus untreated parasites is another argument in favor of the selective targeting of PfA-M1 by MIPS2673, in particular by its broad-spectrum aminopeptidase action preferentially targeting peptides resulting from the degradation of hemoglobin by the parasite. The relative contribution of peptides derived from host hemoglobin versus other parasite proteins is, however, little discussed.

      The work as a whole remains highly interesting, both for the specific topic of PfA-M1's role in parasite biology and for the method, applicable to other malarial drug contexts.

      Reviewer #2 (Public Review):

      In this manuscript, the authors first developed a new small-molecule inhibitor that specifically targets the M1 metalloproteases of both important malaria parasite species, Plasmodium falciparum and P. vivax. This was done by chemical modification of a previously developed molecule that targets PfM1 as well as PfM17 and possibly other Plasmodial metalloproteases. After the successful chemical synthesis, the authors showed that the derived inhibitor, named MIPS2673, has strong antiparasitic activity with an IC50 of 342 nM and is highly specific for M1. With this in mind, the authors carried out two large-scale proteomic analyses to confirm the MIPS2673 interaction with PfM1 in the context of the total P. falciparum protein lysate. This was done first by thermal shift profiling and subsequently by limited proteolysis. While the first demonstrated overall interaction, the latter (limited proteolysis) could map more specifically the site of MIPS2673-PfM1 interaction, presumably the active site. Subsequent metabolomics analysis showed that the cytotoxic inhibitory effect of MIPS2673 leads to the accumulation of short peptides, many of which originate from hemoglobin. Based on that, the authors argue that the MIPS2673 mode of action (MOA) involves inhibition of hemoglobin digestion, which in turn inhibits parasite growth and development.

      Reviewer #3 (Public Review):

      This is a manuscript that attempts to validate Plasmodium M1 alanyl aminopeptidase as a target for antimalarial drug development. The authors provide evidence that MIPS2673 inhibits recombinant enzymes from both Pf and Pv and is selective over other proteases. There is in vitro antimalarial activity. Chemoproteomic experiments demonstrate selective targeting of the PfA-M1 protease.

      This is a continuation of previous work focused on designing inhibitors for aminopeptidases by a subset of these authors. Medicinal chemistry explorations resulted in the synthesis of MIPS2673, which has improved properties including potent inhibition of PfA-M1 and PvA-M1 with selectivity over a closely related peptidase. The compound also demonstrated selectivity over several human aminopeptidases and was not toxic to HEK293 cells at 40 µM. The activity against P. falciparum blood-stage parasites was about 300 nM.

      Thermal stability studies confirmed that PfA-M1 was a binding target, however, there were other proteins consistently identified in the thermal stability studies. This raises the question as to their potential role as additional targets of this inhibitor. The authors dismiss these because they are not metalloproteases, but further analysis is warranted. This is particularly important as the authors were not able to generate mutants using in vitro evolution of resistance strategies. This often indicates that the inhibitor has more than one target.

      The next set of experiments focused on a limited proteolysis approach. Again several proteins were identified as interacting with MIPS2673 including metalloproteases. The authors go on to analyze the LiP-MS data to identify the peptide from PfA-M1 which putatively interacts with MIPS2673. The authors are clearly focused on PfA-M1 as the target, but a further analysis of the other proteins identified by this method would be warranted and would provide evidence to either support or refute the authors' conclusions.

      The final set of experiments was an untargeted metabolomics analysis. They identified 97 peptides as significantly dysregulated after MIPS2673 treatment of infected cells and most of these peptides were derived from one of the hemoglobin chains. The accumulation of peptides was consistent with a block in hemoglobin digestion. This experiment does reveal a potential functional confirmation, but questions remain as to specificity.

      Overall, this is an interesting series of experiments that have identified a putative inhibitor of PfA-M1 and PvA-M1. The work would be significantly strengthened by structure-aided analysis. It is unclear why putative binding sites cannot be analyzed via specific mutagenesis of the recombinant enzyme.

      In the thermal stability and LiP-MS analyses, other proteins were consistently identified in addition to PfA-M1 and yet no additional analysis was undertaken to explore these as potential targets.

      The metabolomics experiments were potentially interesting, but without significant additional work including different lengths of treatment and different stages of the parasite, the conclusions drawn are overstated. Many treatments disrupt hemoglobin digestion - either directly or indirectly and from the data presented here it is premature to conclude that treatment with MIPS2673 directly inhibits hemoglobin digestion.

      Finally, the potency of this compound on parasites grown in vitro is 300 nM; this would need improvement, together with a demonstration of in vivo efficacy in the SCID mouse model, before the compound could be considered a drug candidate.

      Summary:

      Overall, this is an interesting series of experiments that have identified a putative inhibitor of the Plasmodium M1 alanyl aminopeptidases, PfA-M1 and PvA-M1.

      Strengths:

      The main strengths include the synthesis of MIPS2673 which is selectively active against the enzymes and in whole-cell assay.

      Weaknesses:

      The weaknesses include the lack of additional analysis of additional targets identified in the chemoproteomic approaches.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Question 1. Line 737 (and elsewhere). Why are Plasmodium vivax orthologs of PfA-M1 and PfA-M17 called Pv-M1 and Pv-M17 and not PvA-M1 and PvA-M17, where A stands for Aminopeptidase? I would recommend changing the names if possible, although the mention of Pv-M1 and Pv-M17 is now current in the literature (which is kind of regrettable). See also Supplemental Table S1 where PfA-M1 is named Pf-M1.

      Supplemental Table S1 was updated to PfA-M1. Nomenclature for the Plasmodium vivax aminopeptidase orthologs was amended to PvA-M1 and PvA-M17 as suggested by the reviewer.

      Question 2. Figure 1. Observation of parasite culture slide smears in Figure 1E strongly suggests that an important target of MIPS2673 appears to be expressed at the ring stage or very young trophozoites, whereas the authors, in their proteomic and metabolomic analyses, performed studies focused on late trophozoites stages (30-38h post-invasion). This difference in the targeting of Plasmodium stages puzzles me and deserves some explanations from the authors, and is related to my question 3.

      As the reviewer indicates, ring-stage parasite growth appears to be affected at high concentrations (5x and 10x EC50) of MIPS2673. Under these conditions, parasite growth appears to stall during late rings/early trophs at ~16-22 h post invasion when haemoglobin digestion is increasing and when one presumes PfA-M1 (the primary target of MIPS2673) is increasing in both expression and activity (see references 26 and 28 of this manuscript). Thus, whilst it is unsurprising that MIPS2673 has some activity against ring-stage parasites, we focused on the trophozoite stage for our proteomics studies as we showed this to be the stage most susceptible to MIPS2673 (Fig. 1D) and reasoned that we would most likely identify the primary MIPS2673 target, and other interacting proteins, from a complex biological mixture at this stage. The same reasoning underpinned our decision to perform metabolomics on drug-treated trophozoites, as we reasoned we would see a greater functional effect on this stage. Furthermore, performing these experiments on trophozoites rather than rings minimises the interference from the host red blood cell. While we cannot rule out additional targets in rings, repeating all experiments during this parasite stage is beyond the scope of this study.

      Question 3. Figure 2. Although Figure 2 is insightful and somehow self-explanatory, I think it misses two specific pieces of information. First, it is indicated in line 618 (M&M) that parasite material for thermal stability and limited proteolysis studies correspond to synchronized parasites (30-38h post-invasion) but this information is not given in Figure 2. In addition, if I fully understand the experimental protocol of obtaining parasite extracts, they strictly correspond to the soluble protein fraction of the erythrocytic stages of plasmodium at the late trophozoite stage, and not to all parasitic proteins as the scheme of Figure 2 might suggest. I would appreciate it very much if these two points (parasite stages and soluble proteins) were clearly indicated in the scheme as indeed, not the whole parasite blood stage proteome is investigated in the study but just a part of it (~47%, as the authors indeed indicate line 406). Please, edit also the legend of the figure accordingly.

      This is correct, the soluble protein fraction from synchronised trophozoites was used in our proteomics studies. These details have been included in an updated Figure 2 and in the corresponding figure legend.

      Question 4. Thermal stabilization. Figure 3B. Could the authors explain how they calculated or measured "absolute" protein abundances, and how this refers to a number of parasites in initial assays as this is not clear to me. Notably, abundance for PfA-M1 is much higher than for PF3D7_0604300, which are interesting "absolute" values.

      Protein abundance was calculated using the mean peptide quantity of the stripped peptide sequence, with only precursors passing the Q-value threshold (0.01) considered for relative quantification. Within independent experiments, normalisation was based on total protein amount (determined by the BCA assay) rather than the initial number of parasites.

      PfA-M1 is known to be a highly abundant protein and PF3D7_0604300 (as well as the other protein hits identified by thermal stability proteomics) are likely less abundant. It is noted that abundance is also dependent on ionisation efficiency and trypsin digestion efficiency. Therefore, we avoid comparing absolute abundances across proteins and use relative differences across conditions instead.

      NB: the word “absolute” in the text (“absolute fold-change”) refers to the absolute value of the fold-change (i.e. its magnitude, irrespective of whether the change is positive or negative), and not to absolute quantification of proteins. The preceding text in each case clarifies that these are based on “relative peptide abundance”.
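
The quantification described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the input layout (protein, stripped sequence, precursor quantity, Q-value tuples), the function names, the use of "Q <= 0.01" as "passing", and the choice to average per-peptide means into one protein value are all assumptions. Normalisation to total protein (BCA) is taken to have been handled upstream by loading equal protein amounts, so it is not modelled here.

```python
from collections import defaultdict
from math import log2
from statistics import mean

Q_CUTOFF = 0.01  # precursor Q-value threshold quoted in the response


def protein_abundance(precursors, protein):
    """Mean peptide quantity for one protein.

    precursors: iterable of (protein, stripped_sequence, quantity, q_value)
    tuples. This layout is hypothetical, not a specific software export.
    """
    by_peptide = defaultdict(list)
    for prot, stripped_seq, quantity, q_value in precursors:
        # keep only precursors that pass the Q-value threshold (assumed <=)
        if prot == protein and q_value <= Q_CUTOFF:
            by_peptide[stripped_seq].append(quantity)
    if not by_peptide:
        raise ValueError(f"no precursors for {protein} pass Q <= {Q_CUTOFF}")
    # average precursors per stripped sequence, then average over peptides
    return mean(mean(values) for values in by_peptide.values())


def absolute_log2_fold_change(treated, control):
    """Magnitude of the relative change between conditions, sign discarded."""
    return abs(log2(treated / control))


# Made-up numbers for illustration only.
rows = [
    ("PfA-M1", "PEPTIDEK", 100.0, 0.001),
    ("PfA-M1", "PEPTIDEK", 300.0, 0.005),
    ("PfA-M1", "ANOTHERR", 400.0, 0.002),
    ("PfA-M1", "NOISYK", 9999.0, 0.05),  # fails the Q-value cut, ignored
]
print(protein_abundance(rows, "PfA-M1"))  # 300.0
print(absolute_log2_fold_change(100.0, 400.0))  # 2.0
```

Because the fold-change compares the same protein across conditions, differences in ionisation or digestion efficiency between proteins largely cancel, which is why relative rather than cross-protein comparisons are used.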

      Question 5. Figure 5A. How do the authors explain peptides whose abundances are decreasing instead of increasing? Figure 5C. Could the authors provide digital cues (aa numbers or positions) on the ribbon representation of the PfA-M1 sequence? It is difficult to correlate the position of the 3D domains with respect to the primary structure of the protein. Also, the "yellow" supposed to show the "drug ligand" is really not very visible.

      LiP-MS is based on the principle that ligand binding alters the local proteolytic susceptibility of a protein to a non-specific protease (in this case proteinase K, PK). In this sense, in LiP-MS we are not looking at variations in the stability of whole proteins (as is the case with thermal stability proteomics, where proteins detected with significantly higher abundance in treated relative to control samples reflects thermal stabilisation of the target due to ligand binding), but differences in peptide patterns between treated and control samples that reflect a change in the ability of PK to cleave the target. Thus, in the bound state, the ligand prevents proteolysis with PK. This results in decreased abundance of peptides with non-tryptic ends (as PK cannot access the region around where the ligand is bound) and increased abundance of the corresponding fully tryptic peptide, when compared to the free target. This concept is demonstrated in Fig. 4A and is explained in the text (lines 279-282) and Fig. 4 figure legend.
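
The peptide logic of LiP-MS described above can be illustrated with a small sketch that labels each peptide by whether its ends arise from trypsin or from the non-specific proteinase K cut, and records the direction expected when a ligand shields the region. This is not the authors' analysis software; it assumes the classical trypsin rule (cleavage after K or R, but not before P), and the function names and example sequence are hypothetical.

```python
def is_trypsin_site(before, after):
    """Classical trypsin rule (assumed): cut after K or R, not before P."""
    return before in "KR" and after != "P"


def classify_peptide(protein_seq, start, end):
    """Label protein_seq[start:end] (0-based, end exclusive) by its ends."""
    n_tryptic = start == 0 or is_trypsin_site(protein_seq[start - 1], protein_seq[start])
    c_tryptic = end == len(protein_seq) or is_trypsin_site(protein_seq[end - 1], protein_seq[end])
    if n_tryptic and c_tryptic:
        return "fully tryptic"
    if n_tryptic or c_tryptic:
        return "semi-tryptic"  # one end produced by a proteinase K cut
    return "non-tryptic"


# Expected change in treated vs control samples if the ligand blocks
# proteinase K access to the region containing the peptide.
EXPECTED_IF_PROTECTED = {
    "fully tryptic": "increase",
    "semi-tryptic": "decrease",
    "non-tryptic": "decrease",
}

seq = "MAKLLEKPAGRSTVWK"  # hypothetical sequence, not PfA-M1
for start, end in [(3, 11), (3, 9), (5, 10)]:
    label = classify_peptide(seq, start, end)
    print(seq[start:end], label, EXPECTED_IF_PROTECTED[label])
```

Note that "LLEKPAGR" remains fully tryptic despite the internal K, because that K is followed by P under the assumed rule.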

      To avoid cluttering the figure, we have not added amino acid positions to the PfA-M1 structure in Fig. 5; instead, amino acid positions for all peptides are provided in Supplementary File 3. We have also changed the colour of the ligand in Fig. 5C to blue and increased the transparency of the binding and centre of mass neighbourhoods.

      Question 6. Gametocyte assays. Line 824 states that several compounds were used as positive controls for anti-gametocyte activity (chloroquine, artesunate, pyronaridine, pyrimethamine, dihydroartemisinin, and methylene blue) and line 821 states that the biological effects are measured against puromycin. This is not very clear to me, could the authors comment on this?

      This wording has been clarified in the methods to reflect that 5 µM puromycin was used as the positive control to calculate percent viability, whereas the other antimalarials were run in parallel as reference compounds with known anti-gametocyte activity (line 862).

      Question 7. Metabolomics. Metabolomic assays were done on parasites at 28h pi, incubated for 1h with 3x EC50 of MIPS2673. You mention applying the drug on 2 x 10^8 infected red blood cells (line 838) but you do not explain how you isolate these infected red blood cells from non-infected red blood cells. Could you please specify this?

      Metabolomics studies were performed such that cultures at 2% haematocrit and 6% trophozoite-stage parasitaemia (representing 2 × 10⁸ cells in total, rather than 2 × 10⁸ infected cells) were treated with compound or vehicle, and metabolites were extracted after 1 h. This methodological detail has been clarified in the methods (line 875).
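      The difference between total and infected cells is simple arithmetic. The sketch assumes the usual definition of parasitaemia as the percentage of all red blood cells that are infected. On that basis, 2 × 10⁸ cells at 6% parasitaemia contain about 1.2 × 10⁷ infected cells.

```python
def infected_cell_count(total_cells: float, parasitaemia_percent: float) -> float:
    """Number of infected cells, assuming parasitaemia is the % of all RBCs infected."""
    if not 0.0 <= parasitaemia_percent <= 100.0:
        raise ValueError("parasitaemia must be a percentage between 0 and 100")
    return total_cells * parasitaemia_percent / 100.0


if __name__ == "__main__":
    print(f"{infected_cell_count(2e8, 6.0):.2e}")  # 1.20e+07
```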

      Question 8. Figure 3B. Does this diagram come from the experimental 3D structure created by the authors (8SLO) or from molecular modeling? Please specify in the legend (line 1305).

      The diagram showing the binding mode of MIPS2673 bound to PfA-M1 comes from the experimentally determined 3D structure (PDB ID: 8SLO). This has now been stated in the figure legend. Note that the structural diagram refers to Fig. 1B (not Fig. 3B as indicated by the reviewer). The experimentally determined PfA-M1 structure with MIPS2673 bound (PDB ID: 8SLO) was also used to map LiP peptides and estimate the MIPS2673 binding site in Fig. 5, which is also now reflected in the appropriate section of the text (line 308) and Fig. 5 legend.

      Question 9. Line 745. Why is the µM concentration not indicated for this H-Leu-NHMec substrate, when it is indicated for the other substrates mentioned in the rest of the paragraph (H-Ala-NHMec, 20 μM, etc.)? Also, the pH at which the various enzymatic assays were done is missing from this section (Enzyme assays).

      All enzyme assays were performed at pH 8.0. The concentration of H-Leu-NHMec varied depending on the enzyme assayed, as follows: 20 µM for PfA-M1, 40 µM for PvA-M1 and 100 µM for ERAP1 and ERAP2. This information is now clearly stated in the methods section (lines 782 and 787) and as a footnote for Supplemental Table S1.

      Question 10. Line 830, please define FBS.

      Fetal bovine serum (FBS) has been added where appropriate (line 867).

      Question 11. The authors mention in the title the targeting of several plasmodium species, but the only experimental study on the Plasmodium vivax species concerns the use of the recombinant enzyme Pv-M1. Authors also mention "multi-stage targets", but ultimately only look at erythrocyte stages and three different gametocyte stages.

      We have now removed the words “cross-species” and “multi-stage” from the manuscript title and abstract so as not to overstate these findings. We have also added the word “potential” in the manuscript text to clarify that selective M1 inhibition could offer a potential multi-stage and cross-species strategy for malaria.

      Question 12. Supplemental Table S1. I would suggest replacing "Percent inhibition by MIPS2673 of PfA-M1 and Pv-M1 aminopeptidases compared to selected human M1 homologues" with "Percent inhibition by MIPS2673 of PfA-M1 and Pv-M1 aminopeptidase activities compared to selected human M1 homologues".

      Done.

      Question 13. Supplemental Table S3. Here you indicate IC50 while in text and Figure 1 you quote EC50. Why this difference?

      This has now been changed to EC50 in Supplemental Table S3.

      Reviewer #2 (Recommendations For The Authors):

      Amendments that I would recommend in order to improve the presentation include all four parts of the study:

      (1) In vitro antiparasitic activity of MIPS2673.

      The authors showed that MIPS2673 inhibits parasite growth with IC50 of 324nM measured by a standard drug sensitivity assay, Fig 1C. This is all well and good, but it would be helpful to include at least one if not more other compounds such as antimalaria drugs and/or their earlier inhibitors (e.g. inhibitor 1) for comparisons. This is typically done to show that the assay in this manuscript is fully compatible with previous studies. It will also give a better view of how the selective inhibition of PfM1 kills the parasite, specifically.

      Alongside MIPS2673, we also analysed the potency of the known antimalarial artesunate, which was found to have an EC50 of 4 nM. This value agrees with the expected potency of artesunate and indicates our MIPS2673 value of 324 nM is indeed compatible with previous studies. We have now reported the artesunate EC50 value for reference (lines 197-198 and Fig. S1).

      Next, the authors proceeded to investigate the stage-specific effect of MIPS2673, but this time using a survival assay instead of proper IC50 estimations (Figure 1). I wonder why? Drug survival assays typically have very limited information content, and measuring proper IC50 values in stage-specific wash-off assays would be much more informative.

      We performed single-concentration stage specificity assays to determine the parasite asexual stage at which MIPS2673 is most active. This involved washing off the compound after a 24 h exposure in rings or trophozoites and determining parasite viability in the next asexual lifecycle. A full dose-response curve would allow an EC50 value to be generated against each parasite stage. However, this information is unlikely to change the interpretation that MIPS2673 is more active against trophozoite stages than against rings.

      Finally, in Figure 1E, the authors present the fact that the MIPS2673 arrests the parasite development. This is done by presenting a single (presumably representative) cell per time point. This is in my view highly insufficient. I recommend this figure be supplemented by parasite stage counts or other more comprehensive data representation. Also, the authors mention that while there is a growth arrest, hemoglobin is still being made. From the cell images, I can not see anything that supports this statement.

      We thank the reviewer for this constructive comment. They are correct that these are representative parasite images at the respective time points. To address the reviewer’s concerns, we have now provided cell counts from each treatment condition (Fig. 1E) at selected time points. These show parasite stalling at the ring-to-trophozoite transition under drug treatment. On reflection, we agree that it is difficult to determine the presence of haemozoin from our images and have removed this statement.

      (2) Protein thermal shift profiling. In the next step, the authors proceed to carry out cellular thermal shift profiling to show that PfM1 indeed interacts with MIPS2673, this time in the context of the total protein lysates from P. falciparum. This section of the study is in my view quite solid and indeed it is nice to see that the inhibitor causes a thermal shift of PfM1 which further supports what was already expected: interaction.

      I have no problem with this study in terms of the technical outcome but I would urge the authors to tone down the interpretation of these results in two ways.

      Four other proteins were found to be shifted by the inhibitor which also indicates interactions. Calling it simply "off-target" interactions might not represent the truth. The authors should explore and in some way comment that interactions with these proteins could contribute to the MIPS2673 MOA. I do not suggest conducting any more studies but simply acknowledge this situation. Identifying more than one target is indeed very common in CETSA studies and it would be helpful to acknowledge this here as well.

      We agree that identifying binding proteins in addition to the “expected” target is commonplace, and is indeed one of the benefits of this unbiased, proteome-wide approach. In the results and discussion, we have now amended our language to refer to these additional hits as MIPS2673-interacting proteins. In our original manuscript we dedicated a paragraph in the discussion to these additional interacting proteins and the likelihood of them being targets that contribute to antimalarial activity.

      Of these four additional interacting proteins, only the putative AP2 domain transcription factor (PF3D7_1239200) is predicted to be essential for blood stage growth. It is therefore the only one of the four likely to contribute to antimalarial activity. These points are explicitly stated in the discussion (lines 530-550). The remaining three proteins are two non-essential P. falciparum proteins of unknown function (PF3D7_1026000 and PF3D7_0604300) that are poorly described in the literature, and a human protein (RAB39A).

      Notably, all of the other interacting proteins identified in our thermal stability dataset were detected in our LiP-MS experiment but were not identified as interacting proteins by this method. Further analysis of these other thermal stability proteomics hits in our LiP-MS dataset (see responses to Reviewer #3) identified at most one significant LiP peptide from each of these proteins across our LiP-MS datasets. This indicates they are likely to be false positive hits. Caveats around identifying protein targets by different deconvolution methods are also now addressed (lines 545-550).

      At some point, the authors argue that causing shifts of only four/five proteins including PfM1 shows that MIPS2673 does not interact with other (off) targets. Here one must be careful not to present the lack of shifts in the CETSA as proof of no interaction. There are many reasons why thermal shifts may not be observed, including the physical properties of the individual proteins, detection limits, etc. Again, I suggest adjusting these statements accordingly.

      We thank the reviewer for raising this important point and have now included additional discussion around this comment (lines 545-550).

      Finally, I am not convinced that Figure 2 presents anything more than the overall experimental scheme, with not much new information. Many such schemes were published previously in the original publication of thermal profiling. I would suggest omitting it from the main text and moving it into the supplementary methods.

      We agree that similar schemes have been published previously, especially for thermal proteome profiling, and acknowledge the reviewer’s suggestion of moving this figure to the supplemental material. However, we have kept Fig. 2 in the main text as this scheme also incorporates a LiP-MS workflow for malaria drug target deconvolution (the first to do so) and also to satisfy the additional details requested for this figure by Reviewer #1 (question 3).

      (3) Identification of MIPS2673 target proteins using LiP-MS. In the next step, the authors carried out the limited proteolysis analysis, with the rationale that peptides near the inhibitor binding site will be more resistant to proteolysis. The authors did a very good job of showing this for the PfM1-MIPS2673 interaction. This part is very impressive from a technological perspective, and I congratulate the authors on this achievement. I imagine these types of studies require very precise optimization and execution.

      Here, however, I struggle with the meaning of this experiment for the overall flow of the manuscript. It seems that the binding pocket of MIPS2673 is less known since the inhibitor was designed for it. In fact, the authors mentioned that the crystal structure of PfM1 is available. From this perspective, the LiP-MS study represents more of a technical proof of concept for future drug target analysis, but contributes little to the already quite well-established PfM1-MIPS2673 interaction. Perhaps this could be presented in this way in the text.

      We thank the reviewer for this comment and they are correct that we solved the crystal structure of PfA-M1 bound to MIPS2673. We wish to highlight that the primary reason for performing the LiP-MS study was as an independent and complementary target deconvolution method to narrow down the shortlist of targets identified with thermal stability proteomics, and validate with high confidence that PfA-M1 is indeed the primary target of MIPS2673 in parasites. The use of a complementary approach based on a different biophysical principle (proteolytic susceptibility vs thermal stability) would also allow us to identify MIPS2673 interacting proteins that may not be detectable by thermal stability proteomics, for example targets that do not alter their thermal stability upon ligand binding. The text in the results and discussion has been amended to clarify these points (lines 266-268 and 545-550).

      Furthermore, we agree that correctly predicting the MIPS2673 binding site on PfA-M1 using our LiP-MS peptide data is a technical proof of concept. Indeed, we wished to highlight the potential utility of LiP-MS for identifying both the protein targets of drugs and predicting their binding site, which is not possible with many other target deconvolution approaches. This point has been updated in the text (lines 303-304, 459-461).

      (4) Metabolomic profiling of MIPS2673 inhibition showed a massive accumulation of short peptides. This clearly indicates that the inhibitor blocks some proteolytic activity on short peptides, presumably the products of upstream proteolytic activities. Here the authors argue that, because many of these detected short (di-/tri-) peptides could be mapped onto the hemoglobin protein sequence, this must be their origin. Although this might be the case, the authors cannot exclude the possibility that at least some of these come from other sources (e.g. Plasmodium proteins). It would be quite helpful to comment on this possibility as well. In particular, the authors mention that PfM1 is mainly localized in the cytoplasm, while most if not all hemoglobin digestion occurs in the digestive vacuole...?

      Indeed, we agree that PfA-M1 is likely processing both Hb and non-Hb peptides, and we do not definitively conclude that all dysregulated peptides must be derived from haemoglobin. A subset of dysregulated peptides cannot be mapped to haemoglobin and must have an alternative source, such as other host proteins or turnover of parasite proteins. We have amended the discussion to better reflect these possible alternative peptide sources (lines 480-482).

      The peptides detected in the metabolomics study (2-5 amino acids) are too short to be definitively assigned to any specific parasite or RBC protein. Nevertheless, our analysis strongly indicates that the majority, but not all, of the dysregulated peptides are more likely to originate from haemoglobin than from other human or parasite proteins. This conclusion is based on sequence mapping. The mapping was aided by acquiring MS/MS data for a subset of dysregulated peptides, from which we derived accurate sequences (as opposed to residue composition inferred from total peptide mass) to link these peptides more directly to haemoglobin. We further quantified the sequence similarity of dysregulated peptides to all detectable proteins in the P. falciparum-infected erythrocyte proteome (~4700 proteins). This showed that these peptides are statistically more similar to haemoglobin than to other host or parasite proteins.
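      A minimal sketch of the exact-match sequence mapping step is shown below. It is not the authors' pipeline: the similarity scoring across ~4700 proteins is not reproduced. The sequence is only an N-terminal fragment of human haemoglobin alpha (UniProt P69905, initiator Met included) and should be checked before reuse. Treating Leu and Ile as equivalent is an assumption, because the two are isobaric and cannot be told apart by peptide mass alone.

```python
# N-terminal fragment of human haemoglobin alpha (UniProt P69905), for illustration only.
HBA_FRAGMENT = "MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF"


def map_peptide(peptide: str, protein: str) -> list[int]:
    """Return all 0-based start positions of peptide in protein (overlaps allowed).
    Leu and Ile are isobaric, so both are collapsed to 'L' before matching."""
    if not peptide:
        raise ValueError("empty peptide")
    pep, prot = peptide.replace("I", "L"), protein.replace("I", "L")
    positions, start = [], prot.find(pep)
    while start != -1:
        positions.append(start)
        start = prot.find(pep, start + 1)
    return positions


if __name__ == "__main__":
    for pep in ["VLS", "GA", "ISP", "WWW"]:
        print(pep, map_peptide(pep, HBA_FRAGMENT))
```

      Because di- and tripeptides are so short, a single peptide can match several positions or proteins by chance, as "GA" does here. This is why mapping alone cannot assign origin and a proteome-wide similarity comparison is needed.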

      The apparent disconnect between PfA-M1 localisation (cytosol) and the predominant site of haemoglobin digestion (digestive vacuole, DV) is explained by the fact that peptides originating from digestion of haemoglobin in the DV are required to be transported into the cytoplasm for further cleavage by peptidases, including PfA-M1. This point has now been clarified in the discussion (lines 473-474).

      Reviewer #3 (Recommendations For The Authors):

      (1) Thermal stability studies confirmed that PfA-M1 was a binding target, however, there were other proteins consistently identified in the thermal stability studies. This raises the question as to their potential role as additional targets of this inhibitor. The authors dismiss these because they are not metalloproteases, but further analysis is warranted. This is particularly important as the authors were not able to generate mutants using in vitro evolution of resistance strategies. This often indicates that the inhibitor has more than one target.

      We thank the reviewer for this comment. The possibility of other targets contributing to MIPS2673 activity was also raised by Reviewer #2 (question 2) and is addressed above. Further to our response to Reviewer #2, we agree that the inability to generate resistant parasites in vitro could indicate that inhibition of multiple essential parasite proteins (including PfA-M1) contributes to MIPS2673 activity, and we do not rule out this possibility. It may also indicate that the target has a very high barrier to resistance and cannot tolerate resistance-causing mutations because they are deleterious to function. Indeed, previous attempts to mutate PfA-M1 (references 12 and 50), and our own attempts to generate MIPS2673-resistant parasites in vitro (unpublished), were unsuccessful. Importantly, of the hits reproducibly identified using thermal stability proteomics, only PfA-M1 and a putative AP2 domain transcription factor (PF3D7_1239200) are predicted to be essential for blood stage growth. We have explicitly stated that PF3D7_1239200 could also contribute to activity (lines 533 and 537).

      As we identified multiple hits with thermal stability proteomics we employed the complementary LiP-MS method to further investigate the target landscape of MIPS2673. PfA-M1 was the only protein reproducibly identified as the target through this approach. Importantly, the five proteins identified as hits by thermal stability proteomics were also detected in our LiP-MS datasets, but only PfA-M1 was identified as a target by both target deconvolution methods, strongly indicating it is the primary target of MIPS2673 in parasites. An important caveat is that we profiled the soluble proteome (we did not include detergents necessary for extracting membrane proteins as they may interfere with these stability assays) and other factors (e.g. the biophysical properties of the protein) will impact on whether ligand induced stabilisation events are detected. We have added additional text in the discussion around the above points (lines 545-550).

      While we do not definitively rule out other MIPS2673 interacting proteins existing in parasites (that possibly also contribute to activity), our metabolomics studies indicated no functional impact by MIPS2673 outside of elevated levels of short peptides. This is indicative of aminopeptidase inhibition and the profile of peptide accumulation was distinct from a known PfA-M17 inhibitor, and other antimalarials, further pointing to selective inhibition of the PfA-M1 enzyme by MIPS2673 being responsible for antimalarial activity.

      (2) The next set of experiments focused on a limited proteolysis approach. Again several proteins were identified as interacting with MIPS2673 including metalloproteases. The authors go on to analyze the LiP-MS data to identify the peptide from PfA-M1 which putatively interacts with MIPS2673. The authors are clearly focused on PfA-M1 as the target, but a further analysis of the other proteins identified by this method would be warranted and would provide evidence to either support or refute the authors' conclusions.

      As PfA-M1 was the only protein reproducibly identified as an interacting protein across both LiP-MS experiments (and by thermal stability proteomics), we focused our analysis on this protein. However, we agree that further analysis of the other putative interacting proteins would be valuable. We have therefore performed additional analysis (see new Fig. S4) on two groups of proteins: the other interacting proteins identified by thermal stability proteomics, and the other interacting proteins identified in LiP-MS experiment one. No proteins apart from PfA-M1 were identified as hits in the second LiP-MS experiment (lines 314-318, 495-505, 740-762 and Fig. S4).

      Using the common peptides detected across both LiP-MS experiments, we mapped significant LiP peptides to the structures of the other putative MIPS2673-interacting proteins (where a structure was available and significant LiP peptides were detected). We then measured their minimum distance to expected binding sites. When the same criteria for a significant LiP peptide used in our PfA-M1 analysis are applied, only one significant LiP peptide is identified from these other putative interacting proteins (YSPSFMSFK from PfADA). We therefore used less stringent criteria to define significant LiP peptides for these other proteins (see methods and Fig. S4 legend), so that there were peptides to map to the structures. With the exception of PfA-M17, this analysis showed that significant LiP peptides for these other proteins are not significantly closer to binding sites than all other detected peptides. This supports our assertion that these other proteins are likely to be false positives or not functionally relevant MIPS2673-interacting proteins.

      Although significant peptides from PfA-M17 were closer to the binding site, our thermal stability and metabolomics data, combined with our previous work on the PfA-M17 enzyme, argue against this being a functionally relevant target (see lines 362-374 and 486-529 for a more detailed discussion). Another possible explanation for this result is that peptide substrates accumulating due to primary inhibition of PfA-M1 interact with PfA-M17. This could lead to structural changes around the PfA-M17 active site that are detected by LiP-MS.
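      The minimum-distance measurement described above can be sketched as follows. The coordinates are toy values. In practice they would be read from the relevant structure (for PfA-M1, PDB 8SLO), and which atoms define the "expected binding site" is an assumption left to the analyst. Comparing medians is only a summary. The statistical test used to judge "significantly closer" is not specified here and is not reproduced.

```python
import math
from collections.abc import Iterable
from statistics import median

Point = tuple[float, float, float]  # x, y, z coordinates in Å


def min_distance(peptide_atoms: Iterable[Point], site_atoms: Iterable[Point]) -> float:
    """Smallest Euclidean distance between any peptide atom and any binding-site atom."""
    site = list(site_atoms)  # materialise so it can be reused for every peptide atom
    return min(math.dist(a, b) for a in peptide_atoms for b in site)


def compare_medians(distances: dict[str, float], significant: set[str]) -> tuple[float, float]:
    """Median minimum distance of significant LiP peptides vs. all detected peptides."""
    sig = [d for pep, d in distances.items() if pep in significant]
    return median(sig), median(distances.values())


if __name__ == "__main__":
    site = [(0.0, 0.0, 0.0)]  # toy binding-site atom
    peptides = {  # toy peptide atom coordinates
        "PEP_A": [(3.0, 4.0, 0.0), (6.0, 8.0, 0.0)],
        "PEP_B": [(0.0, 0.0, 10.0)],
        "PEP_C": [(0.0, 20.0, 0.0)],
    }
    distances = {name: min_distance(atoms, site) for name, atoms in peptides.items()}
    print(distances)                              # PEP_A 5.0, PEP_B 10.0, PEP_C 20.0
    print(compare_medians(distances, {"PEP_A"}))  # (5.0, 10.0)
```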

      (3) The final set of experiments was an untargeted metabolomics analysis. They identified 97 peptides as significantly dysregulated after MIPS2673 treatment of infected cells and most of these peptides were derived from one of the hemoglobin chains. The accumulation of peptides was consistent with a block in hemoglobin digestion. This experiment does reveal a potential functional confirmation, but questions remain as to specificity.

      As indicated, the accumulation of short peptides identified by metabolomics suggests MIPS2673 perturbs aminopeptidase function. Many of these peptides (but not all) likely map to haemoglobin and are more haemoglobin-like than other proteins in the infected red blood cell proteome. An effect on a subset of non-haemoglobin peptides is also apparent and we have added this to our discussion (also refer to our response to question 4 from Reviewer #2). A direct comparison to our previous metabolomics analysis of a specific PfA-M17 inhibitor (MIPS2571, reference 11) revealed MIPS2673 induces a unique metabolomic profile. The extent of peptide accumulation differed and a subset of short basic peptides (containing Lys or Arg) were elevated only by MIPS2673, consistent with the broad substrate preference of PfA-M1. Importantly, the metabolomics profile induced by MIPS2673 is the opposite of many other antimalarials, which cause depletion of haemoglobin peptides. Taken together, the profile of short peptide accumulation induced by MIPS2673 is consistent with specific inhibition of PfA-M1.
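      The "short basic peptides" subset can be defined by a simple composition rule. The sketch below follows the definition given above (containing Lys or Arg; His is deliberately not counted), and the peptide list is hypothetical.

```python
BASIC_RESIDUES = frozenset("KR")  # Lys and Arg, following the definition in the text


def basic_peptides(peptides: list[str]) -> list[str]:
    """Keep peptides (one-letter codes) that contain at least one Lys or Arg."""
    return [p for p in peptides if BASIC_RESIDUES & set(p.upper())]


def basic_fraction(peptides: list[str]) -> float:
    """Fraction of the peptide list classed as basic."""
    if not peptides:
        raise ValueError("empty peptide list")
    return len(basic_peptides(peptides)) / len(peptides)


if __name__ == "__main__":
    elevated = ["LK", "AV", "RGD", "PF"]  # hypothetical dysregulated peptides
    print(basic_peptides(elevated))  # ['LK', 'RGD']
    print(basic_fraction(elevated))  # 0.5
```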

      (4) Overall, this is an interesting series of experiments that have identified a putative inhibitor of PfA-M1 and PvA-M1. The work would be significantly strengthened by structure-aided analysis. It is unclear why putative binding sites cannot be analyzed via specific mutagenesis of the recombinant enzyme.

      Contrary to this comment we solved the crystal structure of PfA-M1 bound to MIPS2673, determining its binding mechanism to the enzyme. This was further supported through proteomics-based structural analysis by LiP-MS. Undertaking site specific mutagenesis would be interesting to further probe the binding dynamics of MIPS2673 to the M1 protein. However, we believe it is beyond the scope of this study and would not change our conclusion that MIPS2673 binds to PfA-M1, which we have shown using multiple unbiased proteomics-based methods, enzyme assays and X-ray crystallography.

      (5) In the thermal stability and LiP-MS analysis, other proteins were consistently identified in addition to PfA-M1, and yet no additional analysis was undertaken to explore these as potential targets.

      As addressed in our previous responses, across independent thermal stability proteomics experiments we consistently identified five interacting proteins, including the expected target PfA-M1. In contrast, only PfA-M1 was reproducible across independent LiP-MS experiments. Several plausible putative targets (including aminopeptidases and metalloproteins) were identified in one of our LiP-MS experiments. However, they appear to be false discoveries and not responsible for the antiparasitic activity of MIPS2673. Peptide-level stabilisation was not consistent across independent LiP-MS experiments, and an interaction is refuted by our thermal stability, metabolomics and recombinant enzyme inhibition data. We have now performed further analysis of these other putative interacting proteins, which also argues against them being likely interacting proteins (see also response to question 2). We have also added to our existing discussion of possible MIPS2673 targets and the likelihood of these proteins contributing to antimalarial activity (lines 486-550).

      (6) The metabolomics experiments were potentially interesting, but without significant additional work including different lengths of treatment and different stages of the parasite, the conclusions drawn are overstated. Many treatments disrupt hemoglobin digestion - either directly or indirectly and from the data presented here it is premature to conclude that treatment with MIPS2673 directly inhibits hemoglobin digestion.

      Our metabolomics studies were performed using typical experimental conditions for investigating the antimalarial mechanisms of compounds by metabolomics (see references 11, 39, 40 and 55-57). We used a short 1 h incubation at 3x EC50. This allowed us to profile the primary parasite pathways affected by MIPS2673 and avoid a nonspecific death phenotype associated with longer incubations. As addressed in our response to Reviewer #1 (question 2), we focused on trophozoite-infected red blood cells, as this is the stage most susceptible to MIPS2673 and the stage at which one would expect the greatest functional impact. An expanded kinetic metabolomics analysis may reveal secondary mechanisms involved in MIPS2673 activity, and we have now acknowledged this in the manuscript (lines 515-516). However, although secondary mechanisms may become apparent at longer incubations, it also becomes difficult to separate drug-specific responses from nonspecific death effects. We believe any additional information provided by an expanded metabolomics analysis is unlikely to outweigh the significant extra financial cost associated with this type of experiment.

      It is correct that many antimalarial compounds appear to disrupt haemoglobin digestion when analysed by metabolomics. However, as indicated in our manuscript (lines 369-373) and previous responses, the profile of elevated haemoglobin peptides induced by MIPS2673 is substantially different to the profile caused by other antimalarials. For example, artemisinins and mefloquine cause haemoglobin peptide depletion (references 55-57) and chloroquine results in increased levels of a different subset of non-haemoglobin peptides (see Creek et al. 2016). While there is some overlap in profile with a selective M17 inhibitor (our previous work, reference 11), the level of enrichment of these peptides is different and MIPS2673 also induces accumulation of a distinct set of basic peptides consistent with the substrate preference of the PfA-M1 enzyme. As we show that MIPS2673 does not inhibit other parasite aminopeptidases, a likely explanation for the profile overlap is that the build-up of substrates that cannot be processed by PfA-M1 leads to secondary dysregulation of other aminopeptidases. Our analyses (sequence mapping, MS/MS analysis and sequence similarities to all infected red blood cell proteins) strongly indicate that the majority of elevated peptides (but not all) originate from haemoglobin. Combined with our proteomics and recombinant enzyme data indicating direct engagement of PfA-M1, and with previous literature indicating the enzyme functions to cleave amino acids from haemoglobin-derived peptides, our data indicates MIPS2673 likely directly perturbs the haemoglobin digestion pathway through PfA-M1 inhibition.

      (7) Finally, the potency of this compound on parasites grown in vitro is 300 nM - this would need improvements in potency and demonstration of in vivo efficacy in the SCID mouse model to consider this a candidate for a drug.

      We do not propose MIPS2673 as an antimalarial candidate. The experiments presented here were centred on target validation rather than identification of an antimalarial lead, which may be the focus of future studies. To avoid this confusion, we have amended the manuscript title and language throughout to clarify this point.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Khan et al. investigated the functional redundancy of the non-canonical L-cysteine synthases of M. tuberculosis, CysM and CysK2, focussing on their role in mitigating the effects of host-derived stress. They found that while deletion mutants of the two synthases (Rv∆cysM, Rv∆cysK2) have similar transcriptomes under standard conditions, their transcriptional responses to oxidative stress are distinct. Deleting the synthases also differentially affected the pools of L-cysteine-derived metabolites. They show that the mutants (Rv∆cysM, Rv∆cysK2) have impaired survival in peritoneal macrophages and in a mouse model of infection. Importantly, they show that the survival of the mutants increases when the host is defective in producing reactive oxygen and nitrogen species, linking the phenotype to a defect in combating host-derived stress. Finally, they show that compounds inhibiting L-cysteine synthases reduce the intracellular survival of M. tuberculosis.

      Strengths:

      (1) The distinct transcriptome of the Rv∆cysM and Rv∆cysK2 mutants in the presence of oxidative stress provides solid evidence that these mutants are distinct in their response to oxidative stress, and suggests that they are not functionally redundant.

      (2) The use of macrophages from phox-/- and INF-/- mice and an iNOS inhibitor for the intracellular survival assays provides solid evidence that the survival defect seen for the Rv∆cysM and Rv∆cysK2 mutants is related to their reduced ability to combat host-derived oxidative and nitrosative stress. This is further supported by the infection studies in phox-/- and INF-/- mice.

      Weaknesses:

      (1) There are several previous studies looking at the transcriptional response of M. tuberculosis to host-derived stress, however, the authors do not discuss initial RNA-seq data in the context of these studies. Furthermore, while several of the genes in sulfur assimilation and L-cysteine biosynthetic pathway genes are upregulated by more than one stress condition, the data does not support the statement that it is the "most commonly upregulated pathway in Mtb exposed to multiple host-like stresses".

      We have made changes to the manuscript in line with the reviewer’s suggestion.

      “Thus, RNA-Seq data suggest that genes involved in the sulfur assimilation and L-cysteine biosynthetic pathway are upregulated during various host-like stresses in Mtb (Figure S2). Given the importance of sulphur metabolism genes for in vivo survival of Mtb [1, 2], it is not surprising that these genes are dynamically regulated by diverse environmental cues. Microarray studies have shown upregulation of genes encoding the sulphate transporter upon exposure to hydrogen peroxide and nutrient starvation [3-7]. Similarly, ATP sulfurylase and APS kinase are induced during macrophage infection and by nutrient depletion. These genes coordinate the first few steps of the sulphur assimilation pathway. Their induction indicates a probable increase in the biosynthesis of sulphate-containing metabolites that may be crucial against host-inflicted stresses. Furthermore, genes involved in the synthesis of reduced sulphur moieties (cysH, sirA and cysM) are also induced by hydrogen peroxide and nutrient starvation. Sulfur metabolism has been postulated to be important in the transition to latency. This hypothesis is based on transcriptional upregulation of cysD, cysNC, cysK2, and cysM upon exposure to hypoxia. Multiple transcriptional profiling studies have reported upregulation of the moeZ, mec, cysO and cysM genes when cells were subjected to oxidative and hypoxic stress [1, 6-11]. This further suggests an increase in the biosynthesis of reduced metabolites such as cysteine and methionine, and of sulfur-containing cell wall glycolipids, upon exposure to oxidative stress [12].” We have modified the sentence to “significantly upregulated pathway in Mtb exposed to multiple host-like stresses”.

      (2) For the quantification of the metabolites, it isn't clear how the abundance was calculated (e.g., were standards for each metabolite used? How was abundance normalised between samples?), and this information should be included to strengthen the data.

      Thank you for picking this up. We have extended our description of the metabolomics methods. It now reads: “Due to the tendency of M. tuberculosis to form clumps, which significantly skews any cell-number estimate, we normalized samples to protein/peptide concentration using the BCA assay kit (Thermo). Therefore, our LC-MS data are expressed as ion counts/mg protein, or as ratios of that value for the same metabolite. This is a standard way of expressing ion abundance data and has been used previously [13, 14].”
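      As an illustration only, the protein normalization described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names and the ion-count and protein values are hypothetical, and ratios are taken only between samples of the same metabolite, since different metabolites ionize differently.

```python
"""Minimal sketch: express LC-MS ion counts per mg protein and as a ratio to WT.

Assumption (hypothetical, for illustration only): the ion counts and BCA
protein amounts below are invented and are not data from this study.
"""


def normalize_to_protein(ion_counts: float, protein_mg: float) -> float:
    """Return ion counts per mg protein (BCA-normalized abundance)."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return ion_counts / protein_mg


def ratio_to_reference(sample: float, reference: float) -> float:
    """Ratio of normalized abundances for the SAME metabolite in two samples."""
    return sample / reference


if __name__ == "__main__":
    # Hypothetical values for one metabolite measured in two strains.
    wt = normalize_to_protein(ion_counts=2.0e6, protein_mg=0.50)
    mutant = normalize_to_protein(ion_counts=3.0e6, protein_mg=0.50)
    print(f"WT: {wt:.2e} counts/mg, mutant: {mutant:.2e} counts/mg")
    print(f"mutant/WT ratio: {ratio_to_reference(mutant, wt):.2f}")  # 1.50
```

      Keeping every ratio within a single metabolite is the key design choice here; comparing raw ion counts across different compounds would not be meaningful.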

      Furthermore, labelling with L-methionine was performed to determine the rate of synthesis of the L-cysteine-derived metabolites. L-cysteine is produced from L-methionine via the transsulfuration pathway, which is independent of CysM and CysK2. It is therefore difficult to interpret this experiment, as the impact of deleting CysM and CysK2 on the transsulfuration pathway is likely indirect.

      The reviewer may have misunderstood the experiment and the results presented. Labelling was not performed with L-methionine. We used ³⁴S derived from SO₄²⁻ to monitor the reductive assimilation of sulfur and its transit from S²⁻ to L-methionine, passing through cysteine. We specified in the Materials and Methods that we used sodium sulfate-34S (Merck 718882) as our labelled source of sulfur. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. Therefore, we are not measuring transsulfuration but rather the direct synthesis of L-methionine via cysteine, and consequently we are indeed assessing the importance of cysK2 and cysM in this process. We have now stated in the Results section (page 9) that we employed Na₂³⁴SO₄ for labelling, so that other readers will not think we are measuring transsulfuration.
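      For readers less familiar with stable-isotope tracing, one simple way to summarise ³⁴S incorporation is sketched below. It is a hypothetical illustration, not the authors' analysis: it assumes a metabolite with a single sulfur atom, so that ³⁴S incorporation appears as an M+2 isotopologue, and it applies no correction for natural ³⁴S abundance or other M+2 contributions. The function name and counts are invented.

```python
"""Minimal sketch: uncorrected fraction of a single-sulfur metabolite carrying 34S.

Assumptions (hypothetical): one sulfur atom per molecule, so 34S shows up as
an M+2 isotopologue; natural-abundance and other M+2 contributions
(e.g. two 13C atoms, 18O) are NOT corrected for here.
"""


def fraction_34s(m0_counts: float, m2_counts: float) -> float:
    """Return M+2 / (M+0 + M+2) as an uncorrected labelled fraction."""
    total = m0_counts + m2_counts
    if total <= 0:
        raise ValueError("no signal for this metabolite")
    return m2_counts / total


if __name__ == "__main__":
    # Invented ion counts for illustration only.
    print(round(fraction_34s(m0_counts=7.5e5, m2_counts=2.5e5), 2))  # 0.25
```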

      (3) The ability of L-cysteine to rescue the survival defect of the Rv∆cysM and Rv∆cysK2 mutants in macrophages is interpreted as exogenous L-cysteine being able to compensate for reduced intracellular levels. However, there is no evidence that L-cysteine is being taken up by the mutants and an alternate explanation is that L-cysteine functions as an antioxidant within cells i.e., it reduces intracellular ROS.

      The concentration of L-cysteine used in the peritoneal macrophage survival rescue experiments was titrated so that it conferred no survival advantage on wild-type Rv. At this concentration, we therefore believe that reduction of intracellular ROS by cysteine does not play a major role, since there is no significant difference in the survival of the wild-type Rv strain. Had cysteine reduced intracellular ROS, we would expect increased survival of Rv due to diminished oxidative stress.

      Furthermore, L-cysteine addition mitigates the CHP-induced survival defect in vitro [15] and nullifies the effect of cysteine synthase inhibitors in vitro [16], suggesting that cysteine or cystine can be transported into Mtb. This has also been shown previously for the AosR mutant strain [15] and for CysH [2], and over 70% of exogenously added [35S]cysteine is taken up by a growing culture of Mtb [17].

      The authors sought to investigate the functional redundancy of the non-canonical L-cysteine synthases CysM and CysK2. While their distinct transcriptional response to oxidative stress suggests distinct physiological roles, the study did not explore these differences and therefore provides only preliminary insight into the underlying reasons for this observation. In the context of drug development, this work suggests that while L-cysteine synthase inhibitors do not have high potency for killing intracellular M. tuberculosis, they have the potential to decrease the pathogen's survival in the presence of host-derived stress.

      Reviewer #2 (Public Review):

      Summary:

      The paper examines the role L-cysteine metabolism plays in the biology of Mycobacterium tuberculosis. The authors have preliminary data showing that Mycobacterium tuberculosis has two unique pathways to synthesize cysteine. The data showing new compounds that act synergistically with INH is very interesting.

      Strengths:

      RNAseq data is interesting and important.

      Weaknesses:

      The paper would be strengthened if the authors were to add further detail to their genetic manipulations.

      The authors provide evidence that they have successfully made a cysK2 mutant by recombineering. This data looks promising, but I do not see evidence for the cysM deletion. It is also important to state what sort of complementation was done (multicopy plasmid, integration proficient vector, or repair of the deletion). Since these mutants are the basis for most of the additional studies, these details are essential. It is important to include complementation in mouse studies as unexpected loss of PDIM could have occurred.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section.  

      Reviewer #3 (Public Review):

      In this work, the authors conduct transcriptional profiling experiments with Mtb under various different stress conditions (oxidative, nitrosative, low pH, starvation, and SDS). The Mtb transcriptional responses to these stress conditions are not particularly new, having been reported extensively in the literature over the past ~20 years in various forms. A common theme from the current work is that L-cysteine synthesis genes are seemingly up-regulated by many stresses. Thus, the authors focused on deleting two of the three L-cysteine synthesis genes (cysM and cysK2) in Mtb to better understand the roles of these genes in Mtb physiology.

      The cysM and cysK2 mutants display fitness defects in various media (Sautons media, starvation, oxidative and nitrosative stress) noted by CFU reductions. Transcriptional profiling studies with the cysM and cysK2 mutants revealed that divergent gene signatures are generated in each of these strains under oxidative stress, suggesting that cysM and cysK2 have non-redundant roles in Mtb's oxidative stress response which likely reflects the different substrates used by these enzymes, CysO-L-cysteine and O-phospho-L-serine, respectively. Note that these studies lack genetic complementation and are thus not rigorously controlled for the engineered deletion mutations.

      The authors quantify the levels of sulfur-containing metabolites (methionine, ergothioneine, mycothiol, mycothione) produced by the mutants following exposure to oxidative stress. Both the cysM and cysK2 mutants produce more methionine, ergothioneine, and mycothione relative to WT under oxidative stress. Both mutants produce less mycothiol relative to WT under the same condition. These studies lack genetic complementation and thus do not rigorously control for the engineered mutations.

      Next, the mutants were evaluated in infection models to reveal fitness defects associated with oxidative and nitrosative stress in the cysM and cysK2 mutants. In LPS/IFNg-activated peritoneal macrophages, the cysM and cysK2 mutants display marked fitness defects, which can be rescued with exogenous cysteine added to the cell culture media. Peritoneal macrophages lacking the NADPH oxidase (Phox) or IFNg fail to produce fitness phenotypes in the cysM and cysK2 mutants, suggesting that oxidative stress is responsible for the phenotypes. Similarly, chemical inhibition of iNOS partly abrogated the fitness defect of the cysM and cysK2 mutants. Similar studies were conducted in mice lacking IFNg and Phox, establishing that the cysM and cysK2 mutants have fitness defects in vivo that are dependent on oxidative and nitrosative stress.

      Lastly, the authors use small-molecule compounds to inhibit cysteine synthases. It is demonstrated that the compounds inhibit Mtb growth in 7H9 ADC media. No evidence is provided to demonstrate that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in whole Mtb cells. Additionally, it is wrongly stated in the discussion that "combinations of L-cys synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial load inside the host". This statement suggests that the INH + cysteine synthase inhibitor combinations reduce Mtb loads within a host in an infection assay. No data are presented to support this statement.

      We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in whole Mtb cells. However, the inhibitors used in this study have been previously profiled in vitro (https://www.sciencedirect.com/science/article/abs/pii/S0960894X17308405?via%3Dihub). We have modified the sentence to read “a combination of L-cysteine synthase inhibitors with front-line TB drugs like INH, significantly reduced the bacterial survival in vitro”.

      References

      (1) Hatzios, S.K. and C.R. Bertozzi, The regulation of sulfur metabolism in Mycobacterium tuberculosis. PLoS Pathog, 2011. 7(7): p. e1002036.

      (2) Senaratne, R.H., et al., 5'-Adenosinephosphosulphate reductase (CysH) protects Mycobacterium tuberculosis against free radicals during chronic infection phase in mice. Mol Microbiol, 2006. 59(6): p. 1744-53.

      (3) Betts, J.C., et al., Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol, 2002. 43(3): p. 717-31.

      (4) Hampshire, T., et al., Stationary phase gene expression of Mycobacterium tuberculosis following a progressive nutrient depletion: a model for persistent organisms? Tuberculosis (Edinb), 2004. 84(3-4): p. 228-38.

      (5) Schnappinger, D., et al., Transcriptional Adaptation of Mycobacterium tuberculosis within Macrophages: Insights into the Phagosomal Environment. J Exp Med, 2003. 198(5): p. 693-704.

      (6) Voskuil, M.I., et al., The response of Mycobacterium tuberculosis to reactive oxygen and nitrogen species. Front Microbiol, 2011. 2: p. 105.

      (7) Voskuil, M.I., K.C. Visconti, and G.K. Schoolnik, Mycobacterium tuberculosis gene expression during adaptation to stationary phase and low-oxygen dormancy. Tuberculosis (Edinb), 2004. 84(3-4): p. 218-27.

      (8) Brunner, K., et al., Profiling of in vitro activities of urea-based inhibitors against cysteine synthases from Mycobacterium tuberculosis. Bioorg Med Chem Lett, 2017. 27(19): p. 4582-4587.

      (9) Manganelli, R., et al., Role of the extracytoplasmic-function sigma factor sigma(H) in Mycobacterium tuberculosis global gene expression. Mol Microbiol, 2002. 45(2): p. 365-74.

      (10) Burns, K.E., et al., Reconstitution of a new cysteine biosynthetic pathway in Mycobacterium tuberculosis. J Am Chem Soc, 2005. 127(33): p. 11602-3.

      (11) Manganelli, R., et al., The Mycobacterium tuberculosis ECF sigma factor sigmaE: role in global gene expression and survival in macrophages. Mol Microbiol, 2001. 41(2): p. 423-37.

      (12) Tyagi, P., et al., Mycobacterium tuberculosis has diminished capacity to counteract redox stress induced by elevated levels of endogenous superoxide. Free Radic Biol Med, 2015. 84: p. 344-354.

      (13) de Carvalho, L.P., et al., Metabolomics of Mycobacterium tuberculosis reveals compartmentalized co-catabolism of carbon substrates. Chem Biol, 2010. 17(10): p. 1122-31.

      (14) Agapova, A., et al., Flexible nitrogen utilisation by the metabolic generalist pathogen Mycobacterium tuberculosis. Elife, 2019. 8.

      (15) Khan, M.Z., et al., Redox homeostasis in Mycobacterium tuberculosis is modulated by a novel actinomycete-specific transcription factor. EMBO J, 2021. 40(14): p. e106111.

      (16) Brunner, K., et al., Inhibitors of the Cysteine Synthase CysM with Antibacterial Potency against Dormant Mycobacterium tuberculosis. J Med Chem, 2016. 59(14): p. 6848-59.

      (17) Wheeler, P.R., et al., Functional demonstration of reverse transsulfuration in the Mycobacterium tuberculosis complex reveals that methionine is the preferred sulfur source for pathogenic Mycobacteria. J Biol Chem, 2005. 280(9): p. 8069-78.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Figure S1 it would be useful to include the reverse transsulfuration pathway given that it contributes to the L-cysteine pool, and that L-methionine was used for metabolite labelling experiments.

      We agree with the reviewer’s suggestion and have included reverse transsulfuration in Fig. S1. Please note that labelling was not performed with L-methionine. We used ³⁴S derived from SO₄²⁻ to monitor the reductive assimilation of sulfur and its transit from S²⁻ to L-methionine, passing through cysteine. We specified in the Materials and Methods that we used sodium sulfate-34S (Merck 718882) as our labelled source of sulfur. This method was first employed in M. tuberculosis by the Bertozzi group to identify sulfolipids in mycobacteria. Therefore, we are not measuring transsulfuration but rather the direct synthesis of L-methionine via cysteine, and consequently, we are indeed assessing the importance of cysK2 and cysM in this process. We have now stated in the Results section (page 9) that we employed Na₂³⁴SO₄ for labelling, so that other readers will not think we are measuring transsulfuration.

      Author response image 1.

      (2) In Figure S2 it is unclear why the control is included in this figure given that the stress conditions were compared to the control. What is the control being compared to here?

      The heat maps of the controls have been included to show relative gene expression in each of the independent replicates. The normalized counts for the differentially expressed genes are plotted. To better understand the RNA-seq results, we plotted the fold change of differentially expressed genes under the different stress conditions (new Figure S3 and Table S2). This allowed us to examine the expression profile of genes in all the stress conditions simultaneously, regardless of whether they were identified as differentially expressed. The data revealed that specific clusters of genes are up- and downregulated under oxidative, SDS, and starvation conditions. In comparison, the differences observed under the pH 5.5 and nitrosative conditions were limited (Figure S3 & Table S2).

      (3) In Figure S3 it would be more informative to show fold-enrichment than gene counts in (b) to (f).

      In our opinion, gene counts are more informative when plotting GO enrichments, as the number of genes in each GO category can vary drastically. The significance values are already calculated from the fold enrichment of a category relative to the background, so the p-adj values plotted on the x-axis can serve as a proxy for fold enrichment. Hence, instead of plotting two related variables, plotting the total number of genes belonging to a category is usually more helpful for the reader in understanding the “scale” at which a category is affected.

      (4) Figure 1c: standard Sauton's is a defined medium and is not nutrient-limiting; the authors should clarify the composition of the medium that they used here.

      The composition of the Sauton's medium used in the study is 0.5 g/L MgSO4·7H2O, 2 g/L citric acid, 1 g/L L-asparagine, 0.3 g/L KCl·H2O, 0.2% glycerol, 0.64 g/L FeCl3, 100 μM NH4Cl and 0.7 g/L K2HPO4·3H2O. We have modified the sentence in line with the reviewer’s suggestion.

      (5) The authors claim that the distinct transcriptomes for the two mutants indicate that "CysM and CysK2 distinctly modulate 324 and 1104 genes". The effect is likely due to distinct downstream consequences of the deletions, rather than direct regulation by the synthases. This section should be reworded for clarity.

      We have modified the sentence in line with the reviewer’s suggestion.

      (6) In Figure 3 it would be useful to express mycothione levels as a percentage of the total mycothiol pool to give an indication of the extent to which the thiol is being oxidised.

      While we appreciate the reviewer’s suggestion, we cannot calculate ratios of ion counts (IC) for two different compounds, as they ionize differently: 100 ion counts of one is NOT equivalent to 100 ion counts of the other.

      (7) Figure 6 is difficult to interpret as the concentrations used in the INH + inhibitor wells are not clear. It would be useful to indicate the concentrations of each compound added next to the wells in the figure.

      We have modified the figure and legends in line with the reviewer’s suggestion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Document the cysM deletion.

      The details of CysM knockout generation have been previously published ([15]; Appendix Figure S4), and complementation strain details are provided in the methods section. 

      (2) The oxidative stress CHP is not defined in the figure legend.

      We have modified the legend in line with the reviewer’s suggestion.

      (3) Can we see the structures of the compounds?

      Please refer to Fig. 6a for the structures of the compounds.

      (4) Fix the genetics and the paper is very interesting.

      I might be missing something. The authors do provide promising complementation data for several of the stresses. Provide evidence for the cysM deletion and complementation and the data will be very compelling. The focus of the paper is important for our understanding of the biology of Mycobacterium tuberculosis.

      Thank you for appreciating our study. The details of CysM knockout and complementation strain generation have been previously published ([15]; Appendix Figure S4 & Methods). Details of the CysK2 mutant and complementation strain are included in the present manuscript (Figure 1b & Methods).

      Reviewer #3 (Recommendations For The Authors):

      The transcriptional profiling studies do not rigorously control for the engineered mutations using genetic complementation.

      The complementation strains used in all in vitro, ex vivo and in vivo experiments show that the phenotypes associated with the knockouts are gene-specific. We chose not to include complementation strains in the RNA sequencing experiments because of the large number of samples to be handled and the associated costs.

      Figure 3. These data are not rigorously controlled without genetic complementation, explain why some data in Figure 3 was generated at 24 hr and other data was generated at 48 hr, remove subbars in 3g. Please provide more clarification on Fig 3e-g because the normalization in these panels makes it appear as if there is little- or no-difference in the levels of 34S incorporation into the thiol metabolites.

      The complementation strains used in all in vitro, ex vivo, and in vivo experiments show that the phenotypes associated with the knockouts are gene-specific. We chose not to include complementation strains in the Figure 3 experiments because of the large number of samples to be handled and the associated costs.

      The time points in this experiment were chosen based on an initial pilot experiment. It is apparent that a longer duration is required to see the phenotypes associated with labelling than those associated with pool size. The differences observed are statistically significant.

      Surfactant and SDS stress are used interchangeably in the text, legends, and figures. Please be consistent here.

      We have modified the text in line with the reviewer’s suggestion.

      Consider re-wording the 1st paragraph on page 5 to better clarify how Trp, Lys, and His interact with the host immune cells.

      We have modified the text in line with the reviewer’s suggestion.

      Cite the literature associated with the sulfur import system in Mtb on page 3 in the 2nd paragraph.

      We have modified the text in line with the reviewer’s suggestion.

      The manuscript nicely describes the construction of a cysK2 mutant. It is unclear how the cysM mutant was generated. Please clarify, cite, or add the cysM mutant construction to this manuscript.

      The details of CysM knockout and complementation strain generation have been previously published ([15]; Appendix Figure S4 & Methods). We have included the citation in the Methods section of the current manuscript.

      Provide evidence that the small molecules used in Fig 6 are on target and inhibit the cysteine biosynthetic enzymes in whole bacteria. It is unclear how a MIC can be determined with these compounds in 7H9 ADC when deletion mutants grow just fine in this media. Is this because the compounds inhibit multiple cysteine synthesis enzymes and/or enzymatic targets in other pathways? To me, the data suggests that the compounds are hitting multiple enzymes in whole Mtb cells. Does cysteine supplementation reverse the inhibitory profiles with the compounds in Figure 6?

      As mentioned in the text, all the compounds were ineffective in killing Mtb, likely because L-cysteine synthases are not essential under regular growth conditions. Hence, the MICs of the cysteine synthase inhibitors were very high, C1 (0.6 mg/ml), C2 (0.6 mg/ml) and C3 (0.15 mg/ml), as opposed to 0.06 µg/ml for the standard drug isoniazid. We agree with the reviewer that the experiments do not conclusively prove that these compounds specifically inhibit the cysteine synthases via "on-target inhibition" in Mtb cells. The inhibitors used in this study have been previously profiled in vitro [8]. However, we cannot rule out that these compounds also have some off-target effects.
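      For scale, converting the MICs quoted above to the same units (1 mg/ml = 1000 µg/ml) gives the fold difference relative to isoniazid. This is simple arithmetic on the stated values, not an additional result:

```latex
\frac{\mathrm{MIC}_{\mathrm{C1}}}{\mathrm{MIC}_{\mathrm{INH}}}
  = \frac{\mathrm{MIC}_{\mathrm{C2}}}{\mathrm{MIC}_{\mathrm{INH}}}
  = \frac{600~\mu\mathrm{g/ml}}{0.06~\mu\mathrm{g/ml}} = 10^{4},
\qquad
\frac{\mathrm{MIC}_{\mathrm{C3}}}{\mathrm{MIC}_{\mathrm{INH}}}
  = \frac{150~\mu\mathrm{g/ml}}{0.06~\mu\mathrm{g/ml}} = 2.5 \times 10^{3}
```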

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study advances our understanding of the allosteric regulation of anaerobic ribonucleotide reductases (RNRs) by nucleotides, providing valuable new structural insight into class III RNRs containing ATP cones. The cryo-EM structural characterization of the system is solid, but some open questions remain about the interpretation of activity/binding assays and the newly incorporated HDX-MS results. The work will be of interest to biochemists and structural biologists working on ribonucleotide reductases and other allosterically regulated enzymes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study is to understand the allosteric mechanism of overall activity regulation in an anaerobic ribonucleotide reductase (RNR) that contains an ATP-cone domain. Through cryo-EM structural analysis of various nucleotide-bound states of the RNR, the mechanism of dATP inhibition is found to involve order-disorder transitions in the active site. These effects appear to prevent binding of substrate and a radical transfer needed to initiate the reaction.

      Strengths of the manuscript include the comprehensive nature of the work - including both numerous structures of different forms of the RNR and detailed characterization of enzyme activity to establish the parameters of dATP inhibition. The manuscript has been improved in a revision by performing additional experiments to help corroborate certain aspects of the study. But these new experiments do not address all of the open questions about the structural basis for mechanism. Additionally, some questions about the strength of biochemical data and fit of binding or kinetic curves to data that were raised by other referees still remain. Some experimental observations are not consistent with the proposed model. For example, why does dATP enhance Gly radical formation when the proposed mechanism of dATP inhibition involves disorder in the Gly radical domain?

      The work is impactful because it reports initial observations about a potentially new mode of allosteric inhibition in this enzyme class. It also sets the stage for future work to understand the molecular basis for this phenomenon in more detail.

      We express our gratitude to the reviewer for dedicating time to review our work and for the overall favorable assessment. We agree that the question of exactly how much the glycyl radical domain becomes more mobile without losing the glycyl radical entirely is an unresolved one but we also think that our work sets a solid basis for future experiments by us and others.

      Reviewer #3 (Public Review):

      The manuscript by Bimai et al. describes a structural and functional characterization of an anaerobic ribonucleotide reductase (RNR) enzyme from the human microbe P. copri. More specifically, the authors aimed to characterize how (d)ATP modulates nucleotide reduction in this anaerobic RNR, using a combination of enzyme kinetics, binding thermodynamics, and cryo-EM structure determination, complemented by hydrogen-deuterium exchange (HDX). One of the principal findings of this paper is the ordering of an NxN 'flap' in the presence of ATP that promotes RNR catalysis and the disordering (or increased protein dynamics) of both this flap and the glycyl radical domain (GRD) when the inhibitory effector, dATP, binds. The latter is correlated with a loss of substrate binding, which is the likely mechanism for dATP inhibition. It is important to note that the GRD is remote (>30 Å) from the binding site of the dATP molecule, suggesting long-range communication of the structural (dis)ordering. The authors also present evidence for a shift in oligomerization in the presence of dATP. The work does provide new insights into the subtle differences of nucleotide modulation (allostery) of RNR in a class III system through long-range interactions.

      The strengths of the work are the impressive, in-depth structural analysis of the various regulated forms of PcRNR by (d)ATP using cryo-EM. The authors present seven different models in total, with striking differences in oligomerization and (dis)ordering of select structural features, including the GRD that is integral to catalysis. The authors present several, complementary biochemical experiments (ITC, MST, EPR, kinetics) aimed at resolving the binding and regulatory mechanism of the enzyme by various nucleotides. The authors present a good breadth of the literature in which the focus of allosteric regulation of RNRs has been on the aerobic orthologues.

      The addition of hydrogen-deuterium exchange mass spectrometry (HDX-MS) complements the results originating from cryo-EM data. Most notable is the observation of enhanced exchange (albeit quite subtle) in the GRD domain in the presence of dATP, which matches the loss of structural information in this region in the cryo-EM data. The most pronounced and compelling HDX results are seen in the form of dATP-induced protection of peptides immediately adjacent to the β-hairpin at the s-site, where dATP is expected to bind based on cryo-EM. It is clear that the presence of dATP increases the rigidity of this region.

      We are happy that both reviewers find the HDX-MS experiments to be a valuable addition to the existing data.

      Weaknesses:

      The discussion of the change in peptide mobility in the N-terminal region is complicated by the presence of bimodal mass spectral features, and this may prevent detailed interpretation of the data, especially for a select peptide region that shows opposite trends upon nucleotide association.

      Further, the HDX data in the NxN flap is unchanged upon nucleotide binding (ATP, dATP, or CTP), despite changes observed in the cryo-EM data.

      We are grateful to the reviewer for the comprehensive feedback on the HDX-MS part and for identifying areas for improvement. The HDX analysis was, of course, undertaken with the intention of identifying differences in the disorder of the NxN flap and GRD region. From an HDX perspective, both regions were found to be highly susceptible to HDX regardless of state/ligand, due to surface accessibility and/or very fast dynamics. However, this does not mean that there is no difference in the degree of order of these regions upon ligand addition, only that, within the limited HDX-MS time span of 30-3000 seconds, we could not conclusively demonstrate increased disorder. We have rephrased the discussion text to reflect this fact.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On page 5 (and throughout the manuscript) there are some inconsistencies in how dissociation constants for effectors and inhibitors are described - for example, D in KD is sometimes subscripted and sometimes not.

      Thank you for noticing these remaining errors. We hope that we have fixed all of them now.

      Reviewer #3 (Recommendations For The Authors):

      The authors addressed many of the initial concerns raised. The addition of the HDX-MS data in this revision is a welcomed contribution to the work and complements the cryo-EM data. In select cases, the data may be over-interpreted. This reviewer suggests that the authors revise the text in this section so that it is more consistent with the presented data.

      Specific points:

      (1) The bimodal mass spectral features in the N-terminal domain complicate the data interpretation. Specifically, for peptides in the 81-99 region, the fast-exchanging feature shows protection in the presence of (d)ATP/CTP, but the opposite trend is observed for the slow-exchanging species. It is therefore advisable to avoid absolute statements about the HDX results in this region, as the data are complicated.

      As stated by the reviewer, it is not possible to deduce from the presented HDX data whether this is a result of a 50% loaded dimer or of the oligomerization state of the protein. We have remedied this by removing mentions of a difference in bimodality between dATP and ATP. We have also addressed this in the text by stating that the main reason is most likely the different oligomerization states present in solution. Nevertheless, the HDX data clearly show that the N-terminal region and the 81-99 region are very interesting, and it was somewhat disappointing that, owing to the dynamics of oligomerization, it was not possible to SEC-purify pure dimer or tetramer samples for HDX-MS in order to deconvolute the cause.

      (2) Related to #1, the authors assign the bimodal HDX behavior to EX1 mechanism, but this is not necessarily (and unlikely) true based on the limited time points. The authors also state that it originates from the heterogeneity of the sample: "a mixture of states" which could reflect the mixture of oligomerization states. The authors should be careful assigning EX1 mechanism unless there are compelling results to support it.

      We apologize for the unfortunate phrasing. It was not our intention to imply that the bimodality is due to true EX1 kinetics. See the above answer. The mention of EX1 has been removed from the discussion text.

      (3) The deuterium uptake for peptide 118-126 is very small (~1Da) compared to the length of the peptide. The change in deuterium uptake (<0.25Da) from dATP is very small; the authors should proceed with caution when presenting interpretations of such small differences.

      We agree with the reviewer that extra caution should be taken when dealing with such a small difference. However, the 118-126 peptide has been significance-tested in both HDExaminer and Deuteros 2.0, and we observed this difference in more than one run. The difference in uptake is small but reaches significance at the longer labelling times. Its proximity to the NxN flap makes it interesting in the context of an allosteric conformational change, i.e., the dynamics of the NxN flap might be too fast, so that we can only see secondary effects. We would like to keep the data in Figure 10 for reasons of transparency. In essence, this is similar to the bimodality mentioned above: we cannot fully explain the observation but present the data as observed.

      (4) On p. 22, the authors should consider revising the following statement: "confirming dATP binding to the s-site." Even though the HDX data are most compelling for the protection of peptides 178-204 and 330-348 that are adjacent to the beta-hairpin at the s-site, these data cannot "confirm" a binding site for a small molecule, such as dATP.

      We thank the reviewer for pointing out that this statement can be misleading, and we agree that the binding site of a small molecule cannot be confirmed on the basis of HDX data alone. The sentence has been reformulated to clarify that the binding site was confirmed based on the combined evidence of the HDX data and the previously presented biochemical and structural data on the s-site.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Valk and Engert et al. examined the potential relations between three different mental training modules, hippocampal structure and functional connectivity, and cortisol levels (stress) over a 9-month period. They found that among the three types of mental training: Presence (attention and introspective awareness), Affect (socio-emotional - compassion and prosocial motivation), and Perspective (socio-cognitive - metacognition and perspective taking) modules; Affect training most robustly related to changes in hippocampal structure and function - specifically, CA1-3 subfields of the hippocampus. Moreover, change in intrinsic functional connectivity related to changes in diurnal cortisol release and long-term cortisol exposure. These changes are proposed to result from a combination of factors, which is supported by multivariate analyses showing that changes across subfields and training content relate to cortisol changes.

      The authors demonstrate that mindfulness training programs are a potential avenue for stress interventions that impact hippocampal structure and cortisol, providing a promising approach to improve health. The data contribute to the literature on plasticity of hippocampal subfields during adulthood, the impact of mental training interventions on the brain, and the link between CA1-3 and both short- and long-term stress changes.

      The authors thoughtfully approached the study of hippocampal subfields, utilizing a method designed for T1w images that outperformed Freesurfer 5.3 and that produced comparable results to an earlier version of ASHS. The authors note the limitations of their approaches and provide detailed information on the data used and analyses conducted. The results provide a strong basis from which future studies can expand using computational approaches or more fine-grained investigations of the impact of mindfulness training on cortisol levels and the hippocampus.

      We thank the Reviewer for the positive re-evaluation and summary of our findings and work. We made the additional changes suggested and hope these clarify any remaining open points.

      I have a few additional suggestions. Clarifying the language around the multivariate results and the impact across subfields and training modules would be helpful. 

      We are happy to provide further clarifications with respect to the multivariate results and the impact of training on subfields.

      The multivariate analyses served as a final step to explore potential connections between training modules and hippocampal subfields beyond the link between CA1-3 and the Affect Module. These additional analyses were suggested by the Reviewers, and we agreed that taking a broader view of how different parts of the hippocampus relate to overall changes can provide valuable insights into the relationship between mental training, cortisol fluctuations, and changes in CA1-3 subfields.

      We employed a multivariate partial least squares method, which creates latent variables capturing the directions in predictor space that account for the most variance in the observed changes. Initially, we investigated whether there was a general connection between CA1-3 subfields and cortisol changes, regardless of which training module produced these effects. Our findings confirmed a consistent relationship across all three training modules, indicating a strong association between cortisol changes, particularly in markers such as AUC and slope change, and alterations in CA1-3 structure and functional connectivity. We then explored a model incorporating changes across all hippocampal subfields and stress markers across the different modules. In the right hemisphere, changes in CA1-3 volume were more strongly associated with stress markers than changes in other subfields; this association was less pronounced in the left hemisphere.

      Our multivariate approach captured fluctuations across subfields and modules beyond group-level associations, allowing a more nuanced interpretation. While the univariate analysis of module-specific volume changes and associations within the Affect Module offers a straightforward interpretation, as these coincide with increases in CA1-3 volume, the multivariate analysis also uses a data-driven approach to account for individual-level changes not observed at the group level. Overall, these findings are in line with the group-level observations, yet provide nuance regarding specificity.

      We have clarified these considerations further in the manuscript:

      Abstract:

      “Notably, using a multivariate approach, we found that other subfields that did not show group-level changes also contributed to changes in cortisol levels.”

      Results:

      “We employed a multivariate partial least squares method, which aims to identify the directions in the predictor space that account for the most variance in changes observed, by creating latent variables. Initially, we investigated whether there was a general connection between CA1-3 subfields and cortisol changes, regardless of which training module produced these effects.”

      Discussion:

      “Finally, through conducting multivariate analysis, we once more noticed associations between changes in CA1-3 volume and functional adaptability and alterations in stress levels, particularly prominent within the Affect Module. Integrating all subfields into a unified model highlighted a distinct significance of CA1-3, although for the left hemisphere, we observed a more diverse range of contributions across subfields. In summary, we establish a connection between a socio-emotional behavioral intervention, shifts in hippocampal subfield structure and function, and decreases in cortisol levels among healthy adults.

      Although the univariate examination of changes specific to modules in volume and connections within the Affect Module presents how changes in cortisol align with group-level rises in CA1-3 volume, the multivariate analysis extended this observation through considering individual-level alterations not discernible at the group level through a data-driven method. These results generally corresponded with observations at the group level but offer additional insights into specificity, and hint at system-level alterations.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This useful study tests the hypothesis that Mycobacterium tuberculosis infection increases glycolysis in monocytes, which alters their capacity to migrate to lymph nodes as monocyte-derived dendritic cells. The authors conclude that infected monocytes are metabolically pre-conditioned to differentiate, with reduced expression of Hif1a and a glycolytically exhaustive phenotype, resulting in low migratory and immunologic potential. However, the evidence is incomplete as the use of live and dead mycobacteria still limits the ability to draw firm conclusions. The study will be of interest to microbiologists and infectious disease scientists.

      In response to the general eLife assessment, we would like to emphasize that the study did not deal with “infected monocytes” per se, but rather with monocytes purified from patients with active TB. We show that monocytes purified from these TB patients (versus healthy controls) differentiate into DCs with different migratory capacities. In addition, to address the reviewer's comments, in this new version of our manuscript we have added a characterization of the migration capacity of Mtb-infected DCs to the many assays with viable bacteria already shown in the previous revised version.

      All in all, we believe that our study has significantly improved thanks to the feedback provided by the editor and reviewer panel during the different revision processes. We sincerely hope that this version of our manuscript is deemed fit for publication in this prestigious journal.

      Public Reviews:

      Reviewer #3 (Public Review):

      In the revised manuscript by Maio et al., the authors examined the bioenergetic mechanisms involved in the delayed migration of DCs during Mtb infection. The authors performed a series of in vitro infection experiments including bioenergetic experiments using the Agilent Seahorse XF, and glucose uptake and lactate production experiments. Also, data from SCENITH is included in the revised manuscript as well as some clinical data. This is a well written manuscript and addresses an important question in the TB field. A remaining weakness is the use of dead (irradiated) Mtb in several of the new experiments and claims where iMtb data were used to support live Mtb data. Another notable weakness lies in the authors' insistence on asserting that lactate is the ultimate product of glycolysis, rather than acknowledging a large body of historical data in support of pyruvate's role in the process. This raises a perplexing issue highlighted by the authors: if Mtb indeed upregulates glycolysis, one would expect that inhibiting glycolysis would effectively control TB. However, the reality contradicts this expectation. Lastly, the examination of the bioenergetics of cells isolated from TB patients undergoing drug therapy, rather than studying them at their baseline state is a weakness.

      We thank the reviewer for this insightful assessment of and feedback on our study. Regarding the use of iMtb data to support data obtained with live Mtb, we have specified in each figure legend of the new version of the manuscript whether iMtb or Mtb was used. Furthermore, we have confirmed the involvement of TLR2 ligation in the up-regulation of HIF-1α triggered by viable Mtb (new Fig S2E). We also conducted migration assays using (live) Mtb-infected dendritic cells (DCs) treated with either oxamate or PX-478 to validate that the HIF-1α/glycolysis axis is indeed essential for DC migration (new Fig 5D).

      We respectfully acknowledge the reviewer's statement regarding the potential relationship between glycolysis and the control of TB. However, we find it necessary to elaborate on our stance, as our data offer a nuanced perspective. Our research indicates that DCs exhibit upregulated glycolysis following stimulation or infection by Mtb. This metabolic shift is crucial for facilitating cell migration to the draining lymph nodes, an essential step in mounting an effective immune response. Yet, it remains uncertain whether this glycolytic induction reaches a threshold conducive to generating a protective immune response, a matter that our findings do not definitively address. This aspect is carefully discussed in the manuscript, lines 380-385.

      Moreover, analyses of samples from chronic TB patients suggest that the outcome of inhibiting glycolysis may vary depending on factors such as the infection stage, the targeted cell type (e.g., monocytes, DCs), and the affected compartment (systemic versus local). This variability aligns with the concept of "too much, too little" exemplified by the dual roles of IFNγ (PMID: 28646367) and TNFα (PMID: 19275693) in TB, emphasizing the need to maintain an inflammatory equilibrium. In the context of the HIF1α/glycolysis axis, it appears to be a matter of timing: a case of "too early" activation of glycolysis in precursors, which could upset the delicate balance necessary for an effective immune response. We have added these comments in the discussion (pages 19-20, lines 468-485).

      In summary, while acknowledging the reviewer's perspective, we believe that a comprehensive understanding of the interplay between Mtb infection and glycolysis in myeloid cells requires further consideration of various contextual conditions, urging caution against oversimplified interpretations.

      Regarding the patients' information: as the reviewer pointed out, under the inclusion criteria for patient samples in the protocol approved by the Institutional Ethics Committee, we recruit patients who have received less than 15 days of treatment (for drug-sensitive TB, total treatment lasts at least 6 months). We do not have access to patient samples before treatment begins, as starting therapy is the most urgent priority. Following the reviewer's suggestion, we investigated whether the glycolytic activity of monocytes correlated with the time since antibiotic treatment onset within this 15-day period. We observed no significant impact during the initial 15 days of treatment (see expanded reply below). However, after 2 months of treatment, the glycolytic profile of CD16+ monocytes had returned to baseline levels in our analysis. This suggests that although glycolytic activity normalizes with antibiotic therapy, heightened basal glycolysis remains noticeable during the first two weeks of treatment (the time limit for meeting the inclusion criteria of our study cohort).

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      (1) In the revised manuscript, the authors addressed concerns related to using irradiated Mtb, a positive development. However, the study predominantly employs 1:1 or 2:1 MOI, representing a low infection model, with no observed statistical distinction between the two MOIs (Fig-1). To enhance the study, inclusion of a higher MOI (e.g., 5:1 or 10:1) would have been more informative. This becomes crucial as prior research on human macrophages indicates that Mtb infection typically hampers glycolysis, a finding inconsistent with the present study.

      As the reviewer notes, important work has documented MOI-dependent inhibition of glycolysis in M. tuberculosis-infected macrophages (PMID 30444490). In that study, hMDMs infected at an MOI of 1 showed increased extracellular acidification and glycolytic parameters, as opposed to macrophages infected at a higher MOI, or THP-1 cells infected at the same MOI. In light of these findings, we attempted to extend our study with Mo-DCs to higher MOIs, but this induced so much cell death that we could not obtain reliable metabolic measurements and functional assays from these cultures. Consistent with this, other authors reported that more than 40% of Mo-DCs die within 24 hours of infection with H37Rv at an MOI of 10 (PMID 22024399, Fig 2B). We acknowledge that more comprehensive, focused in vivo studies would be needed to assess the overall impact of infection. We foresee that in natural infection, DCs with different levels of infection will coexist: some with a low bacillary load that may be able to trigger glycolysis and migrate, and others that are highly infected and more likely to die. We are therefore unable to provide a full explanation for the delay in the onset of the adaptive response, an aspect that requires further investigation. From our perspective, the main contribution of our work concerns the later stage of infection, once chronic infection is established, when precursors already seem to have a limited capacity to generate DCs with good migratory performance, even when confronted with a low bacillary load.

      To better clarify the scope and limitations of the work, we added these comments to the discussion (see discussion, lines 405-408).

      (2) The study emphasizes that Mtb infection enhances glycolysis in Mo-DCs (Fig-1 and Fig-2). Despite the authors advocating lactate as the end product (citing three reviews/opinions), the historical literature supported by detailed experimentation convincingly favors pyruvate. While the authors' attempt to support an alternate glycolytic paradigm is understandable, it is simply not necessary. This is further supported by the authors' claim that oxamate is an inhibitor of glycolysis (abstract and main text). Oxamate is a pyruvate analogue that directly inhibits the conversion of pyruvate into lactate by lactate dehydrogenase. Simply put, if oxamate was an inhibitor of glycolysis then the cells would have died.

      Taking into account the reviewer's suggestions, we have changed the text accordingly and now refer to oxamate as an LDH inhibitor, including in the abstract.

      (3) In Fig-2, clarify the term "bystander DCs." Explain why these MtbRFP- DCs exhibit distinct behavior compared to uninfected DCs, especially considering their similarity to Mtb-infected ones.

      To clarify these results, as the reviewer correctly suggested, we have added a sentence to the results section stating that bystander DCs are cells that are not directly associated with Mtb (MtbRFP- DCs) but are nearby and exposed to the same environment (page 7, lines 145-148). In other words, bystander cells are exposed to the same secretome and soluble factors as infected cells. Our data indicate that bystander DCs upregulate glycolysis just as infected DCs do, which suggests that soluble mediators induced during infection can trigger glycolysis even in uninfected cells.

      These results are in line with the observation that bacteria lacking infectious capacity (such as the irradiated Mtb) also trigger glycolysis in DCs (Fig 1), likely via TLR2 receptors that are potentially activated by the release of mycobacterial antigens or bacterial debris present in the microenvironment (Fig 3). We incorporated this interpretation in the discussion of the manuscript (lines 403-408).

      (4) Notably, the authors conducted SCENITH on both iMtb and viable Mtb (Fig-2). However, OCR, PER, and Mito- & Glyco-ATP were solely measured in Mo-DCs stimulated by iMtb. Given the distinct glycolytic responses between iMtb and viable Mtb, it is crucial to assess these parameters in Mo-DCs treated with viable Mtb. Moreover, it is unclear how the relative ATP in Fig-2F was calculated, as both Mito-ATP and Glyco-ATP are significantly higher in iMtb-treated Mo-DCs (Fig-2E). Also, figure 2 contains panels with no labeling, which is confusing.

      We appreciate the reviewer's suggestion that additional measurements would enrich the bioenergetic profile of DCs during infection. However, owing to biosafety considerations and budgetary limitations, we are currently unable to measure OCR, PER, and Mito- and Glyco-ATP with live Mtb, as these assessments require live cell cultures within BSL3 containment. Regrettably, our BSL3 facility is not equipped with a Seahorse instrument; few facilities in the world have made this kind of investment in BSL3 equipment. For this key reason, we employed SCENITH for our BSL3-based experiments.

      Concerning how the relative ATP was calculated, we show below the raw Mito-ATP and Glyco-ATP data and the calculation of their relative contributions.

      Author response table 1.

      (5) In Figures 3, 4, & 5, the consistent use of only iMtb was observed. Previous concerns about this approach were raised in the review, with the authors asserting that the use of viable Mtb was beyond the manuscript's scope. However, this claim is inaccurate. Both the authors' findings and literature elsewhere emphasize notable differences not only in host-cell metabolism but also in immune responses when treated with viable Mtb compared to dead or iMtb. Therefore, it is recommended to incorporate viable Mtb in experiments where only iMtb was utilized. Also, in the abstract (3rd sentence), do the authors refer to live or irradiated Mtb? It is imperative to clearly indicate this distinction, as the subsequent conclusions are based only on one of these two scenarios, not both. The contradictory mitochondrial mass results (figure 1; live and dead Mtb showed opposite mitochondrial mass results) clearly illustrate the profound difference live (versus dead) Mtb cells can have on an experiment.

      We thank the reviewer for raising this concern. For Figure 3, the involvement of TLR2 ligation in lactate release was also confirmed with live Mtb (Figure S2D). In this version, we have also confirmed the involvement of TLR2 ligation in the up-regulation of HIF-1α triggered by live Mtb (new Fig S2E). As for Figure 4, we agree that assays with live Mtb would add complementary information. Indeed, we hope to investigate the impact of the glycolysis/HIF-1α axis on the adaptive immune response in the future, and we believe that employing live bacteria and considering their active immune evasion strategies will be crucial. However, this is not the focus of the current manuscript and is beyond its scope.

      We also agree with the reviewer that confirming the migratory behavior of DCs following Mtb infection is a crucial aspect of the study. To address this pertinent request, we performed new migration assays using Mtb-infected DCs treated with oxamate or PX-478 to test the role of the HIF-1α/glycolysis axis; the results demonstrate that this axis is essential for DC migration in the context of Mtb infection (new Fig 5D). Having observed the same inhibitory effect of HIF-1α and LDH inhibition on migration in both Mtb-infected and iMtb-stimulated DCs, we consider that the sentence in the abstract referred to by the reviewer now applies to both contexts (page 2, lines 34-36). We hope the reviewer agrees.

      (6) The discussion and the graphical abstract elucidating the distinctions in glycolysis between CD16+ monocytes of HS and TB patients and iMtb-treated Mo-DCs are currently confusing and require clarification. According to the abstract, monocytes from TB patients exhibit heightened glycolysis, resulting in diminished HIF-1α activity and migratory capacity of Mo-DCs. This prompts a question: if exacerbated glycolysis in monocytes is associated with adverse outcomes, wouldn't it be logical to consider suppressing glycolysis? If so, how can inhibiting glycolysis, a favored metabolic pathway for pro-inflammatory responses, be beneficial for TB therapy?

      We understand the reviewer’s concern about this apparent paradox. As previously mentioned in response to the public review provided by the reviewer, inhibiting glycolysis may yield varying outcomes depending on the stage of infection, as well as the cellular target (e.g., monocytes, DCs) or compartment (systemic versus local). It is imperative to delve deeper into the potential role of the HIF1α/glycolysis axis at the systemic level within the context of chronic inflammation, contrasting with its role in a local setting during the acute phase of infection.

      A comprehensive understanding of the interplay between Mtb infection and glycolysis in myeloid cells requires further consideration of various contextual conditions, urging caution against oversimplified interpretations. For instance, one of the objectives of host-directed therapies (HDTs) is to mitigate host-response inflammatory toxicity, which can impede treatment efficacy (doi: 10.3389/fimmu.2021.645485). In this regard, traditional anti-inflammatory drugs such as non-steroidal anti-inflammatory drugs (NSAIDs) and corticosteroids have been explored as adjunct therapies due to their immunomodulatory properties. Additionally, compounds like vitamin D, phenylbutyrate (PBA), metformin, and thalidomide, among others, have been investigated in the context of TB infections (doi:10.3389/fimmu.2017.00772), highlighting the diverse range of strategies aimed at enhancing TB treatment. These efforts extend beyond bolstering antimicrobial activity to encompass minimizing inflammation and mitigating tissue damage.

      (7) I am not convinced that BubbleMap made any significant contribution to the manuscript perhaps because it is poorly described in the figure legends/main text (I am unable to determine what data set is significant or not).

      We agree with the reviewer’s comment. To clarify the information gleaned from these analyses, we have added guidelines for interpreting bubble color, bubble size and statistical significance to the legend of Figure 7. We hope these changes convey the contribution of the BubbleMap analysis to this study: it shows a significant enrichment of interferon response gene expression in the monocyte compartment of patients with active TB compared with control counterparts. Notably, this enrichment does not extend to genes associated with the OXPHOS hallmark.

      (8) The use of cells/monocytes from TB patients is a concern in addition to the incomplete demographic table. In the case of the latter, absolute numbers including percentages should be included. Importantly, it appears that cells from TB patients were used, that received anti-TB drug therapy (regimen not stated) up to two weeks post diagnosis and not at baseline. This is important as recent studies have shown that anti-TB drugs modulates the bioenergetics of host cells. Lastly, what were the precise TB symptoms the authors referred to in figure 7C?

      We have updated the demographic table to include absolute numbers. We concur with the reviewer's viewpoint, particularly in light of recent findings on the impact of anti-TB drug treatment on cell metabolism (doi: 10.1128/AAC.00932-21). That study also underscores the complexity of such effects, which vary considerably with factors such as cell type, drug concentration, and combination therapy.

      Despite this variability, our analysis of monocytes from TB patients who had received different antibiotic combinations over short time frames (less than 15 days) reveals a marked increase in glycolysis in CD16+ monocytes compared with healthy counterparts. We did not observe a correlation between monocyte glycolytic capacity and the time since antibiotic treatment onset within this 15-day window (see below, Author response image 1). These findings suggest that the antibiotic regimen does not significantly affect monocyte glycolytic capacity during the first 15 days. However, we did observe an effect of antibiotic treatment when comparing patients before and after 2 months of treatment. Enrichment analysis of monocyte subsets before and after 2 months of treatment (GEO accession number: GSE185372) showed that the CD14dim CD16+ and CD14+ CD16+ populations had higher glycolytic activity before treatment, which decreased after treatment (Author response image 2).

      Author response image 1.

      Correlation analysis between the baseline glycolytic capacity and the time since treatment onset for each monocyte subset (CD14+CD16-, CD14+CD16+ and CD14dimCD16+, N = 11). Linear regression lines are shown. Spearman’s rank test. The data are represented as scatter plots with each circle representing a single individual.

      Author response image 2.

      Gene enrichment analysis of glycolytic genes for pairwise comparisons of each monocyte subset (CD14+CD16-, CD14+CD16+ and CD14dimCD16+) from patients with active TB before treatment vs. after 2 months of treatment. Comparisons with a p-value below 0.05 and an FDR below 0.25 are considered significantly different.

      Overall, our results indicate that while drug treatment does affect cell bioenergetics, this effect is not prominent within the first 15 days of treatment. CD16+ monocytes maintain high basal glycolytic activity that normalizes after treatment, contrasting with the CD16- population (even under the same circulating antibiotic doses). This highlights the intricate interplay between anti-TB drugs and cellular metabolism, underscoring the need for further research to understand the underlying mechanisms and therapeutic implications.

      Finally, the term “symptoms evolution” refers to the period during which a patient experiences cough and phlegm for more than 2-3 weeks, with or without sputum that may or may not be bloody, accompanied by symptoms of constitutional illness (e.g., loss of appetite, weight loss, night sweats, general malaise). As requested, this definition has been included in the Methods section (pages 28-29, lines 705-709).

      Minor:

      (1) Incorporate the abbreviation for tuberculosis "(TB)" in the first line of the abstract and similarly introduce the abbreviation for Mycobacterium tuberculosis when it is first mentioned in the abstract.

      Thank you, we have amended it accordingly.

      (2) As the majority of experiments are in vitro, the authors should specify the number of times each experiment was conducted for every figure.

      We have included this information in each figure legend (see N for each panel). Since most of our approaches are conducted in vitro using primary cell cultures (specifically, human monocyte-derived DCs), we used samples from four to ten independent donors, rather than technical replicates, to account for donor-to-donor variability.

      (3) Rename Fig-2. Ensure consistent labeling for the metabolic dependency of uninfected, Mtb-infected, and the Bystander panel, aligning with the format used in panels A & B. Similarly, replace '-' with 'uninfected'.

      We have modified the figure following most of the reviewer’s suggestions. However, we decided to keep the nomenclature “-” to denote a control condition, which can be unstimulated (panels A-B, fig 2) or uninfected cells (panels C-D, fig 2) depending on the experimental design.

      (4) Discussion: It is unclear what the authors mean by 'some sort of exhausted glycolytic capacity'.

      We have slightly modified the phrase.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This manuscript reports important in vitro biochemical and in planta experiments to study the receptor activation mechanism of plant membrane receptor kinase complexes with non-catalytic intracellular kinase domains. Several lines of evidence convincingly show that one such putative pseudokinase, the immune receptor EFR achieves an active conformation following phosphorylation by a co-receptor kinase, and then in turn activates the co-receptor kinase allosterically to enable it to phosphorylate down-stream signaling components. This manuscript will be of interest to scientists focusing on cell signalling and allosteric regulation.

      We wish to clarify that EFR itself is not a pseudokinase. We showed in previous work (Bender et al., 2021; https://doi.org/10.1073/pnas.2108242118) that EFR has catalytic activity in vitro. This catalytic activity is, however, not required for elf18-induced immune signaling in planta.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      The authors use an elegant but somewhat artificial heterodimerisation approach to activate the isolated cytoplasmic domains of different receptor kinases (RKs) including the receptor kinase BRI1 and EFR. The developmental RK BRI1 is known to be activated by the co-receptor BAK1. Active BRI1 is then able to phosphorylate downstream substrates. The immune receptor EFR is also an active protein kinase also activated by the co-receptor BAK1. EFR however appears to have little or no kinase activity but seems to use an allosteric mechanism to in turn enable BAK1 to phosphorylate the substrate kinase BIK1. EFR tyrosine phosphorylation by BAK1 appears to trigger a conformational change in EFR, activating the receptor. Likewise, kinase activating mutations can cause similar conformational transitions in EFR and also in BAK1 in vitro and in planta.

      We wish to clarify that we make no strong link between tyrosine phosphorylation and the conformational change leading to activation of the complex. Rather, the HDX-MS data demonstrate the structural importance of Tyr836 for the activation mechanism. At present, we do not know how phosphorylation of the residue would affect the activation process.

      Strengths:

      I particularly liked the HDX experiments coupled with mutational analysis (Fig. 2) and the design and testing of the kinase-activating mutations (Fig. 3), as they provide novel mechanistic insights into the activation mechanisms of EFR and of BAK1. These findings are nicely extended by the large-scale identification of EFR-related RKs from different species with potentially similar activation mechanisms (Fig. 5).

      Weaknesses:

      In my opinion, there are currently two major issues with the present manuscript. (1) The authors have previously reported that the EFR kinase activity is dispensable for immune signaling (https://pubmed.ncbi.nlm.nih.gov/34531323/), but the wild-type EFR receptor still leads to much better phosphorylation of the BIK1 substrate than the kinase-inactive D849N mutant protein (Fig. 1). (2) How the active-like conformation of EFR in turn activates BAK1 is poorly characterized, yet this appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question, but these HDX studies were not carried out due to technical limitations.

      Overall this is an interesting study that aims to advance our understanding of the activation mechanisms of different plant receptor kinases with important functions in plant immunity.

      Reviewer #2 (Public Review):

      Summary:

      Transmembrane signaling in plants is crucial for homeostasis. In this study, the authors set out to understand to what extent catalytic activity in the EFR tyrosine kinase is required in order to transmit a signal. This work was driven by mounting data that suggest many eukaryotic kinases do not rely on catalysis for signal transduction, relying instead on conformational switching to relay information. The crucial findings reported here involve the realisation that a kinase-inactive EFR can still activate (i.e., lead to downstream phosphorylation of) its partner protein BAK1. Using a convincing set of biochemical, mass spectrometric (HD-exchange) and in vivo assays, the team suggest a model in which EFR is likely phosphorylated in the canonical activation segment (where two Ser residues are present), which is sufficient to generate a conformation that can activate BAK1 through dimerisation. A model is put forward involving C-helix positioning in BAK1, and the model is extended to other 'non-RD' Arabidopsis kinases that likely do not require kinase activity for signaling.

      We prefer not to describe EFR as a tyrosine kinase. It may be the case that EFR can function under certain conditions as a dual-specificity protein kinase, but this has never been demonstrated experimentally. We therefore describe EFR as a Ser/Thr protein kinase, since it is known that the isolated cytoplasmic domain can phosphorylate on Ser and Thr residues (Wang et al., 2014; https://doi.org/10.1016/j.jprot.2014.06.009).

      Strengths:

      The work uses logical and well-controlled approaches throughout, and is clear and convincing in most areas, linking data from IPs, kinase assays (including clear 32P-based biochemistry), HDX-MS data (from non-phosphorylated EFR), structural biology, oxidative burst data and infectivity assays. Repetitions and statistical analysis all appear appropriate.

      Overall, the work builds a convincing story and the discussion does a clear job of explaining the potential impact of these findings (and perhaps an explanation of why so many Arabidopsis kinases are 'pseudokinases', including XPS1 and XIIa6, where this is shown explicitly).

      Weaknesses:

      No major weaknesses are noted from reviewing the data and the paper follows a logical course built on solid foundations; the use of Tables to explain various experimental data pertinent to the reported studies is appreciated.

      (1) The use of a, b, c, d in Figures 2C, 3C, etc. is confusing to this referee; this is now addressed in the latest version.

      (2) The debate about kinase v pseudokinases is well over a decade old. For non-experts, the kinase alignments/issues raised are in PMID: 23863165 and might prove useful if cited.

      We have cited the suggested reference in the second paragraph of the discussion.

      (3) Early on in the paper, the concept of kinases and pseudokinases related to R-spine (and extended R-spine) stability and regulation really needs to be more adequately introduced to explain what comes next; e.g. some of the key work in this area for RAF and Tyr kinases where mutual F-helix Phe amino acid changes are evaluated (conceptually similar to this study of the E-helix Tyr to Phe changes in EFR) should be cited (PMID: 17095602, 24567368 and 26925779).

      As an alternative, we have amended the text in several places to focus on conformational toggling between active/inactive states rather than R-spine stability. We think that this keeps the message of our manuscript focused. We hope that the reviewer finds this acceptable.

      (4) In my version, some of the experimental text is also currently in the wrong order (and no page numbers, so hard for me to state exactly where in the manuscript); However, I am certain that Figure 2C is mentioned in the text when the data are actually shown in Figure 3C for the EFR-SSAA protein.

      Indeed, some references to Figure 2 in the text were incorrect. We have corrected these. References in the text to Figure 3 and the data reported therein are correct.

      (5) Tyr 156 in PKA is not shown in Supplement 1, 2A as suggested in the text; for readers, it will be important to show the alignment of the Tyr residue in other kinases; this has been updated in the second version. Although it is clearly challenging to generate phosphorylated EFR (seemingly through Codon-expansion here?), it appears unlikely that a phosphorylated EFR protein, even semi-pure, couldn't have been assayed to test the idea that the phosphorylation drives/supports downstream signaling. What about a DD or EE mutation, as commonly used (perhaps over-used) in MEK-type studies?

      Our aim with codon expansion was to generate recombinant protein carrying high-stoichiometry phosphorylation at sites which we have previously documented to be required for downstream signaling (Macho et al., 2014; Bender et al., 2021). We additionally demonstrated previously that a DD mutant of the activation loop sites in EFR does not fully complement the efr-1 mutant (Bender et al., 2021), suggesting that the Asp mutations are not good phospho-mimics in this context. We therefore did not generate DD or EE mutations for in vitro studies.

      Impact:

      The work is an important new step in the huge amount of follow-up work needed to examine how kinases and pseudokinases 'talk' to each other in (especially) the plant kingdom, where significant genetic expansions have occurred. The broader impact is that we might understand better how to manipulate signaling for the benefit of plants and mankind; as the authors suggest, their study is a natural progression both of their own work, and the kingdom-wide study of the Kannan group.

      Reviewer #3 (Public Review):

      The study presents strong evidence for allosteric activation of plant receptor kinases, which enhances our understanding of the non-catalytic mechanisms employed by this large family of receptors.

      Plant receptor kinases (RKs) play a critical role in transducing extracellular signals. The activation of RKs involves homo- or heterodimerization of the RKs, and it is believed that mutual phosphorylation of their intracellular kinase domains initiates downstream signaling. However, this model faces a challenge in cases where the kinase domain exhibits pseudokinase characteristics. In their recent study, Mühlenbeck et al. reveal the non-catalytic activation mechanisms of the EFR-BAK1 complex in plant receptor kinase signaling. Specifically, they aimed to determine that the EFR kinase domain activates BAK1 not through its kinase activity, but rather by utilizing a "conformational toggle" mechanism to enter an active-like state, enabling allosteric trans-activation of BAK1. The study sought to elucidate the structural elements and mutations of EFR that affect this conformational switch, as well as explore the implications for immune signaling in plants. To investigate the activation mechanisms of the EFR-BAK1 complex, the research team employed a combination of mutational analysis, structural studies, and hydrogen-deuterium exchange mass spectrometry (HDX-MS) analysis. For instance, through HDX-MS analysis, Mühlenbeck et al. discovered that the EFR (Y836F) mutation impairs the accessibility of the active-like conformation. On the other hand, they identified the EFR (F761H) mutation as a potent intragenic suppressor capable of stabilizing the active-like conformation, highlighting the pivotal role of allosteric regulation in BAK1 kinase activation. The data obtained from this methodology strengthen their major conclusion. Moreover, the researchers propose that the allosteric activation mechanism may extend beyond the EFR-BAK1 complex, as it may also be partially conserved in the Arabidopsis LRR-RK XIIa kinases. This suggests a broader role for non-catalytic mechanisms in plant RK signaling.

      The allosteric activation mechanism was demonstrated for receptor tyrosine kinases (RTKs) many years ago. A similar mechanism has been suggested for the activation of plant RKs, but experimental evidence for this conclusion is lacking. Data in this study represent a significant advancement in our understanding of non-catalytic mechanisms in plant RK signaling. By shedding light on the allosteric regulation of BAK1, the study provides a new paradigm for future research in this area.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have considered points 1-5 raised in my initial review and the revised manuscript contains a more balanced discussion and limitation section. No additional experiments have been performed to substantiate the envisioned allosteric activation mechanism of the co-receptor kinase BAK1 by the receptor EFR. I rewrote the public statement accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Thanks for responding to my comments.

      Reviewer #3 (Recommendations For The Authors):

      The revised manuscript has fully addressed my previous concerns and is now suitable for publication in eLife.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Key Considerations:

      There seem to be two inconsistencies related to some results depicted in Figures 1, 2, 3 and 5.

      Firstly, Figure 1 shows the effect of CLas infection (CLas+) compared with the control (CLas-), with increases in TAG, glycogen, lipid droplet size, oviposition period, and fecundity. In Figures 2, 3, and 5, the authors establish the involvement of DcAKH, DcAKHR, and miR-34 in this process by showing that the effects of CLas+ are lost when the function of these three factors is prevented. However, while Figure 1 shows increased TAG and lipid droplet size in CLas+ insects, Figures 2, 3, and 5 do not show a significant elevation in TAG when comparing CLas- and CLas+ insects.

      Secondly, in addition to the absence of the statistical difference in TAG and lipid droplet size that was observed in Figure 1, Figures 2, 3, and 5 show an increase in TAG and lipid droplet size after dsDcAKH (Figure 2), dsDcAKHR (Figure 3), and agomiR-34 (Figure 5) treatments. Considering that AKH, AKHR, and miR-34 are important factors in the CLas-induced increase in TAG and lipid droplet size, one might expect a reduction in TAG and lipid droplet size when these factors are silenced in CLas+ insects, contrary to the observed results.

      Thanks for your excellent suggestion. Lipid droplets are cellular organelles responsible for storing lipids within cells, playing a crucial role in fat metabolism and energy homeostasis. The formation and breakdown of lipid droplets involve a complex interplay of genes and enzymes, including DGAT (for synthesis) and ATGL and HSL (for breakdown). In CLas-negative D. citri, there is a delicate balance between the synthesis and breakdown of lipid droplets. The enlargement of lipid droplets following CLas infection may result from a synthesis rate that substantially exceeds the breakdown rate, as more energy is required during early ovarian development. The hormone AKH, a key player in fat metabolism, primarily stimulates fat breakdown. Therefore, when DcAKH and DcAKHR are silenced without affecting fat synthesis, fat breakdown is not enhanced; instead, lipid droplets accumulate and become enlarged. This suggests that CLas infection affects both the breakdown and synthesis of lipid droplets, whereas AKH and AKHR primarily affect breakdown, leading to similar outcomes. However, the underlying physiological mechanisms warrant further in-depth exploration.

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 25: change "In addition" to "Additionally".

      Thanks for your wonderful suggestion. We have changed “In addition” to “Additionally” in our revised manuscript (Line 26).

      (2) Lines 60-72: Have there been any previous reports on the interaction between host AKH hormones and microorganisms in insects or animals? If yes, please add more background.

      Thanks for your wonderful suggestion. We have added background on the interactions between host AKH hormones and microorganisms in insects (Lines 74-81).

      (3) Lines 82-95: add the following reference about the miR-275 of Diaphorina citri in the background. Nian, X., Luo, Y., He, X., Wu, S., Li, J., Wang, D., Holford, P., Beattie, G. A. C., Cen, Y., Zhang, S., & He, Y. (2024). Infection with 'Candidatus Liberibacter asiaticus' improves the fecundity of Diaphorina citri aiding its proliferation: A win-win strategy. Molecular Ecology, 33, e17214.

      Thanks for your wonderful suggestion. We have added the sentence “in D. citri-CLas interaction, CLas hijacks the JH signaling pathway and host miR-275 that targets the vitellogenin receptor (DcVgR) to improve D. citri fecundity, while simultaneously increasing the replication of CLas itself, suggesting a mutualistic interaction in D. citri ovaries with CLas” in our revised manuscript (Lines 97-100).

      (4) In the figures of Nile red staining, the numerical value of the scale bar should be added.

      Thanks for your wonderful suggestion. We have added the numerical values of the scale bars for Nile red staining in Figures 1C, 2E, 3E, and 5C.

      (5) In Figures 2G-H, 3G-H, 5E-F, the presentation of data should be consistent with Figure 1D-E.

      Thanks for your wonderful suggestion. We have changed Figure 1D-E in our revised manuscript.

      (6) In the discussion part, more information should be added about miR-275 and DcVgR from the above reference.

      Thanks for your wonderful suggestion. We have added the information “In D. citri-CLas interaction, CLas operates host hormone signaling and miRNA to mediate the mutualistic interaction between D. citri fecundity and its replication” in Lines 350-353.

      (7) For the primer specific, please add the melting curves for qPCR primers of DcAKH, DcAKHR, Dcβ-ACT, U6, and miR-34 in the supplementary material.

      Thanks for your wonderful suggestion. We have added the melting curves for qPCR primers of DcAKH, DcAKHR, Dcβ-ACT, U6 and miR-34 in the supplementary material of Figure S6.

      (8) Line 476: Dcβ-ACT was indicated as a gene and should be Italic.

      Thanks for your wonderful suggestion. We have changed “DcβACT” to “Dcβ-ACT” in our revised manuscript (Line 491).

      (9) Reference style should be consistent and correct. Like [5], [10], [37], [47].

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) In order to better engage readers, I suggest emphasizing the "enhanced fecundity" in the title. A suggestion for the revised title is: Adipokinetic hormone signaling mediates the enhanced fecundity of Diaphorina citri infected by 'Candidatus Liberibacter asiaticus'.

      Thanks for your wonderful suggestion. We have changed the title to “Adipokinetic hormone signaling mediates the enhanced fecundity of Diaphorina citri infected by 'Candidatus Liberibacter asiaticus'” in our revised manuscript.

      (2) For the abstract, in lines 14-15, please change the first sentence to "Diaphorina citri serves as the primary vector for 'Candidatus Liberibacter asiaticus' (CLas), the bacterium associated with the severe Asian form of huanglongbing." In line 18, delete "present". In line 19, change "increased" to "increasing". In line 21, change "triacylglycerol accumulation" to "the accumulation of triacylglycerol". In line 33, change "in D. citri ovaries with CLas" to "between CLas and D. citri ovaries".

      Thanks for your wonderful suggestion. We have revised the abstract as suggested: we changed “Diaphorina citri is the primary vector of the bacterium, ‘Candidatus Liberibacter asiaticus’ (CLas) associated with the severe Asian form of huanglongbing” to “Diaphorina citri serves as the primary vector for 'Candidatus Liberibacter asiaticus' (CLas), the bacterium associated with the severe Asian form of huanglongbing” in Lines 15-16; deleted "present" in Line 19; changed "increased" to "increasing" in Line 20; changed "triacylglycerol accumulation" to "the accumulation of triacylglycerol" in Line 22; and changed "in D. citri ovaries with CLas" to "between CLas and D. citri ovaries" in Line 34.

      (3) In lines 57-59, change "How D. citri maintains a balance between lipid metabolism and increased fecundity after infection with CLas is not known." to "However, the mechanism of how D. citri maintains a balance between lipid metabolism and increased fecundity after infection with CLas remains unknown.".

      Thanks for your wonderful suggestion. We have changed "How D. citri maintains a balance between lipid metabolism and increased fecundity after infection with CLas is not known" to "However, the mechanism of how D. citri maintains a balance between lipid metabolism and increased fecundity after infection with CLas remains unknown" in our revised manuscript (Lines 58-60).

      (4) In Figure 1, "n.s" should be changed to "n.s.", "n.s." should be added in 13 DAE of Figure 1A, and the specific numerical value of the scale bar should be indicated on Figures 1C, 2E, 3E, and 5C.

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript.

      (5) In all the figure legends, the "**P < 0.01,***P < 0.001" should be changed to "**p < 0.01,***p < 0.001".

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript.

      (6) In Figures 1D-E, the preoviposition period and oviposition period were presented using a box diagram, but in other figures (including Figure 2G-H, Figure 3G-H, Figure 5E-F) these were shown using a column chart. Please keep the method of presentation consistent.

      Thanks for your wonderful suggestion. We have revised Figure 1D-E in our revised manuscript.

      (7) For discussion, in line 333, change "Increasing numbers" to "An increasing number". In line 334, change "vertically transmitted" to "transmitted vertically".

      Thanks for your wonderful suggestion. We have changed "Increasing numbers" to "An increasing number" in Line 345 and "vertically transmitted" to "transmitted vertically" in Line 346 of our revised manuscript.

      (8) In lines 338-342, change "There are few studies on the mechanisms underlying vector-bacteria interactions. However, Singh and Linksvayer (2020) [38] found that Wolbachia-infected colonies of Monomorium pharaonis had increased colony-level growth, accelerated colony reproduction, and shortened colony life cycles compared to those that were uninfected." to "Although there is limited research on the mechanisms underlying vector-bacteria interactions, Singh and Linksvayer (2020) [38] found that Wolbachia-infected colonies of Monomorium pharaonis exhibited increased colony-level growth, accelerated colony reproduction, and shortened colony life cycles compared to uninfected colonies.".

      Thanks for your wonderful suggestion. We have revised it in our revised manuscript (Lines 350-355).

      (9) In line 370, delete "present". In lines 386-387, change "More and more miRNAs have been reported to be involved in the metabolic processes of insects including reproduction." to "There is increasing evidence implicating miRNAs in the metabolic processes of insects, particularly in relation to reproduction.".

      Thanks for your wonderful suggestion. We have revised these points in our revised manuscript: we deleted "present" in Line 383 and changed "More and more miRNAs have been reported to be involved in the metabolic processes of insects including reproduction" to "There is increasing evidence implicating miRNAs in the metabolic processes of insects, particularly in relation to reproduction" in Lines 399-400.

      (10) In line 423, change "After infection with CLas, D. citri are more fecund than their uninfected counterparts." to "Upon infection with CLas, D. citri exhibits enhanced fecundity compared to uninfected individuals.". In lines 424-425 and 439-440, change "the more offspring of D. citri, the more CLas in the field" to "the increased offspring of D. citri contributes to a higher presence of CLas in the field.". In line 429, change "information" to "insights".

      Thanks for your wonderful suggestion. We have revised these points in our revised manuscript: we changed "After infection with CLas, D. citri are more fecund than their uninfected counterparts" to "Upon infection with CLas, D. citri exhibits enhanced fecundity compared to uninfected individuals" in Lines 436-437; changed "the more offspring of D. citri, the more CLas in the field" to "the increased offspring of D. citri contributes to a higher presence of CLas in the field" in Lines 438-439; and changed "information" to "insights" in Line 443.

      (11) In lines 446-447, change "The CLas-infected lemon plants and psyllids were monitored to detect CLas infection monthly using the quantitative polymerase chain reaction (qPCR)" to "Monthly monitoring of the CLas infection in both the lemon plants and psyllids was conducted using quantitative polymerase chain reaction (qPCR)".

      Thanks for your wonderful suggestion. We have revised it in our revised manuscript (Line 460-461).

      (12) In lines 452-458, how did the authors identify homologous sequences of AKH and AKHR for phylogenetic tree analysis and alignment of the amino acid sequences? From NCBI or other databases? The methodological details should be added.

      Thanks for your wonderful suggestion. We have added the methodological details in our revised manuscript (Line 469-470).

      (13) In line 476, Dcβ-ACT should be italic.

      Thanks for your wonderful suggestion. We have changed “DcβACT” to italic in our revised manuscript (Line 491).

      (14) In line 538, the manufacturer should be provided for Nile Red.

      Thanks for your wonderful suggestion. We have provided the manufacturer of Nile Red in our revised manuscript (Line 553).

      (15) Does miR-34 have any other target genes? If so, do they have any function in the improved fecundity of D. citri after infection by CLas?

      Thanks for your insightful suggestion. In addition to DcAKHR, we predicted three other genes with miR-34 binding sites in their 3’UTRs: Innexin, T-box transcription factor TBX1, and fatty acid synthase. However, the mRNA expression levels of all three genes did not differ between CLas-negative and CLas-positive females. Therefore, we believe that these genes are not implicated in the fecundity improvement.

      (16) The reference format should be unified. Please revise references 10, 28, 43, 47, and 53.

      Thanks for your wonderful suggestion. We have revised them in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their feedback on our manuscript. Taking the advice of the reviewers, we have streamlined the text and formatted the figures to conform to the format instructions. We believe that the revised manuscript has been improved. 

      Point-by-point responses are presented below.

      Reviewer #1:

      (1) There is an over-interpretation regarding the results in Figure 6A. There is no difference between isoHD1 iMac control and HD1 Mut iMac.

      We thank the reviewer for his/her feedback on our manuscript. We have revised the wording on Page 11, line 294 of the manuscript to reflect this important point.

      Reviewer #2:

      (2) The authors have not elucidated the significance of the increased CSF1 dosage in Figure 2F, aside from its effect on cell viability, lacking a thorough discussion of this result.

      We have addressed the significance of our CSF1 dosage data together with a newly added observation from the RNA-seq data: an immature myeloid marker is upregulated and a mature macrophage marker is downregulated in mutant iMacs (Page 5, line 163). We elaborate further in the Discussion that this may result in the generation of immature iMacs even after maturation (Page 14, line 356).

      (3) Additionally, while transcriptomic and metabolic alterations related to the mutation were demonstrated in iMac models, similar investigations in iMicros are absent, necessitating further experiments to validate the findings across cell models.

      We thank the reviewer for this feedback but feel that this is beyond the scope of the current study; we will consider incorporating it into subsequent experiments.

      (4) The conclusion drawn regarding cytokine levels lacks robust support from the data, particularly considering the varied responses observed in different mutant lines. Further analysis of the secretome (e.g. via ELISA) could provide additional insights.

      We thank the reviewer for this feedback but feel that this is beyond the scope of the current study; we will consider incorporating it into subsequent experiments.

      (5) Moreover, the characterization of iMicros is incomplete, with limited protein-level analysis (e.g. validate RNA-seq via flow cytometry).

      We thank the reviewer for this feedback but feel that this is beyond the scope of the current study; we will consider incorporating it into subsequent experiments.

      (6) Additionally, the claim of microglial-like morphology lacks adequate evidence, as the provided image is insufficient for such an assessment.

      We have added confocal images depicting microglial-like morphology in our co-culture system within Supp Fig 3C.

      (7) RNA-seq experiments should be represented better; it is not possible to read the legends or gene names in the figures. Maybe the data sets can be combined into one PCA and one overall analysis, e.g. via WGCNA-like analyses? This would make it easier for the reader to compare the two cell lines side by side.

      We have since enhanced the quality of the respective RNA-seq figures with enlarged data points and gene names for better clarity.

      (8) Statistical test information is missing.

      We are sorry for leaving this out and have added the statistical test information within Page 15 of the methods section.

      (9) Finally, inconsistent terminology usage throughout the paper may confuse readers (iMac versus iMicros).

      We have streamlined the terminology used within Page 10, line 265 and 267, of the manuscript for better consistency.

      (10) Fig. 1D: which cell line is displayed here?

      Mut HD1 iPSC is displayed here. We have also revised the figure legend of Fig 1D within Page 1, line 8 to include this information.

      (11) Fig. 1E: Karyotype of which cell line is shown?

      We have included karyotype of both IsoHD1 and IsoHD2 iPSC in Fig 1E, and also revised the legend within Page 1, line 11, to reflect this change.

      (12) Supp. Fig. 1: scale bar information missing.

      We thank the reviewer for pointing this out and have revised the legend on Page 1, line 17, to include scale bar information.

      (13) Fig. 5: legend for A is missing.

      We thank the reviewer for pointing this out and have revised the legend on Page 2, line 91, to include panel (A).

      (14) Supp. Fig. 3A says 30 days, but only 23 days are shown.

      We apologize for this inadvertent typo and have since corrected the number of days (31 days) in the figure (Supp Fig 3A) and legend (Page 3, lines 110 and 113) to match the manuscript.

      (15) Supp. Fig. 3C: scale bar length is incorrect.

      We rechecked and are confident that the scale bar is of the correct length. The images displaying the respective fluorescent channels are proportionately reduced with respect to the main figure (now Supp. Fig. 3D), and thus are of the same size (200 µm).

      (16) Fig. 6: legend for D, E is missing.

      We have revised the figure legend within Page 3, line 128, 130 and 131, to address said missing legends.

      (17) Stem cells also express Sox2. How does Sox2 expression lead to the conclusion of an optimally generated organoid?

      We thank the reviewer for pointing this out. Sox2 has been defined as a core intrinsic factor regulating pluripotency (Avilion et al., 2003; Zhang et al., 2014), as well as a lineage specifier regulating ectodermal differentiation, which is crucial in controlling neural initiation and differentiation from iPSCs (Zhao et al., 2004; Thomson et al., 2011; Wang et al., 2014). Additionally, Sox2 is highly expressed in proliferating neural progenitor cells, as documented in previous iterations of cerebral organoid generation protocols (Lancaster et al., 2013; Qian et al., 2018). Nevertheless, “optimally” may be too strong in this context, so we have toned down the phrasing.

      (18) HD1 and HD2 react differently (e.g. in IL-1B production), but the text is written often as if both cell lines react in the same way.

      We thank the reviewer for pointing this out and have since clarified this within Page 4, line 366-368, of the manuscript.

      (19) Precise information on medium missing (e.g. no Pen/Strep?).

      We thank the reviewer for pointing this out. iPSC colonies were cultured without Pen/Strep. Additionally, we have described the medium composition for our iMac cultures in more detail on Page 4, line 106, of the Materials and Methods and in Supp. Table 4.

      (20) How was ReleSR used exactly?

      We have included the usage of ReleSR within Page 2, line 41 of Materials and Methods.

      (21) What kind of microscopes/objectives were used for imaging?

      We have added the respective microscope details for bright-field, phase-contrast and cytospin related experiments within Page 3, line 73, and Page 14, line 360, of Materials and Methods.

      (22) For the dissociation of organoids: what kind of pipette was used, and at which temperature were organoids incubated?

      We have specified the pipette used for organoid dissociation and the incubation temperature for organoid culture in Materials and Methods (Page 9, lines 243-245).

      (23) How was the RNA-seq analysis done? Which packages? Which versions?

      We now provide the requested information in the Materials and Methods section.

    1. Author response:

      Response letter

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) It is a nice study but lacks some functional data required to determine how useful these alleles will be in practice, especially in comparison with the figure line that stimulated their creation.

      We are grateful for this comment. Regarding the usefulness of these alleles, Figure 3 shows that specific and efficient genetic manipulation of one cell subpopulation can be achieved by crossing the DreER mouse strain with the roxCre mouse strain. In addition, Figure 6 shows that R26-loxCre-tdT can effectively ensure Cre-loxP recombination of some gene alleles for genetic manipulation. The expression of the tdT protein matches that of the Cre protein (Alb roxCre-tdT and R26-loxCre-tdT, Figures 2 and 5), which ensures the accuracy of the tracing experiments. We expect that further functional data will come from future studies using the mouse lines described in this manuscript.

      (2) The data in Figure 5 show strong activity at the Confetti locus, but the design of the newly reported R26-loxCre line lacks a WPRE sequence that was included in the iSure-Cre line to drive very robust protein expression.

      Thank you for raising this point. In the knock-in strategy for R26-loxCre-tdT mice, a WPRE sequence is in fact placed downstream of the loxCre-P2A-tdT sequence.

      (3) The most valuable experiment for such a new tool would be a head-to-head comparison with iSure (or the latest iSure version from the Benedito lab) using the same CreER and target floxed allele. At the very least, a comparison of Cre protein expression between the two lines using identical CreER activators is needed.

      Following the reviewer’s suggestion, we will compare iSuRe-Cre with R26-loxCre-tdT using Alb-CreER and the R26-Confetti target in the revised manuscript.

      (4) Why did the authors not use the same driver to compare mCre 1, 4, 7, and 10? The study in Figure 2 uses Alb-roxCre for 1 and 7 and Cdh5-roxCre for 4 and 10, with clearly different levels of activity driven by the two alleles in vivo. Thus whether mCre1 is really better than mCre4 or 10 is not clear.

      Thank you for raising this concern. After identifying four robust versions of mCre in our screen, we generated these four roxCre knock-in mouse lines. We could not predict which mCre would be the most robust in vivo; it was possible that only one or two versions would work efficiently. For example, if Alb-mCre1 performed comparably to Cdh5-mCre10, both could be used for targeting genes in different cell types, broadening the potential utility of these mice.

      (5) Technical details are lacking. The authors provide little specific information regarding the precise way that the new alleles were generated, i.e. exactly what nucleotide sites were used and what the sequence of the introduced transgenes is. Such valuable information must be gleaned from schematic diagrams that are insufficient to fully explain the approach.

      Thank you for your careful suggestions.

      We will provide schematic figures as well as the nucleotide sequences used to generate the mouse lines in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) The scenario where the lines would demonstrate their full potential compared to existing models has not been tested.

      We are grateful for this suggestion. We will compare iSuRe-Cre with R26-loxCre-tdT using Alb-CreER and the R26-Confetti target in the revised manuscript.

      (2) The challenge lies in performing such experiments, as low doses of tamoxifen needed for inducing mosaic gene deletion may not be sufficient to efficiently recombine multiple alleles in individual cells while at the same time accurately reporting gene deletion. Therefore, a demonstration of the efficient deletion of multiple floxed alleles in a mosaic fashion would be a valuable addition.

      Thank you for your constructive comments. Mosaic analysis combining sparse labeling with efficient gene deletion, using the roxCre and loxCre strategies, is a direction we intend to pursue. We will include a discussion of this strategy in the revised manuscript.

      (3) When combined with the confetti line, the reporter cassette will continue flipping, potentially leading to misleading lineage tracing results.

      Thank you for your professional comments. Indeed, the Confetti reporter used in this study can continue flipping, which could lead to misleading lineage tracing results. Our use of R26-Confetti was intended to demonstrate the robustness of mCre for recombination. Some multicolor reporter mouse lines that do not flip have been published, for example R26-Confetti2 (10.1038/s41588-019-0346-6) and Rainbow (10.1161/CIRCULATIONAHA.120.045750). These reporters could be used for tracing Cre-expressing cells without concerns about flipping of the reporter cassettes.

      (4) Constitutive expression of Cre is also associated with toxicity, as discussed by the authors in the introduction.

      Thank you for your professional comments. The toxicity of constitutive Cre expression and the toxicity associated with tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) are well known in the field. The present study does not resolve the toxicity of constitutive Cre expression; many widely used mouse lines with constitutive Cre expression driven by different promoters carry a similar risk. One possible solution would be a new strategy that removes Cre after it has been expressed.

      Reviewer #3 (Public Review):

      (1) Although leakiness is rather minor according to the original publication, and the senior author of the study wrote in a review a few years ago that there is no leakiness (https://doi.org/10.1016/j.jbc.2021.100509).

      Thank you so much for your careful check. In that review (https://doi.org/10.1016/j.jbc.2021.100509), the comments on iSuRe-Cre were written from a reader's perspective, and all summary statements were based on the original publication (10.1038/s41467-019-10239-4). We have now tested iSuRe-Cre in our own hands and detected some leakiness in the heart and muscle, but hardly any in other tissues, as shown in the following figure.

      Author response image 1.

      Leakiness in the Alb-CreER;iSuRe-Cre mouse line. Images are representative of 5 mice. Scale bars (white), 100 µm.

      (2) I would have preferred to see a study, which uses the wonderful new tools to address a major biological question, rather than a primarily technical report, which describes the ongoing efforts to further improve Cre and Dre recombinase-mediated recombination.

      We appreciate this valuable comment. The roxCre and loxCre mice described in this study provide more effective methods for inducible genetic manipulation in studies of gene function. We hope that our new genetic tools will help address major biological questions in different biomedical fields in the future.

      (3) Very high levels of Cre expression may cause toxic effects as previously reported for the hearts of Myh6-Cre mice. Thus, it seems sensible to test for unspecific toxic effects, which may be done by bulk RNA-seq analysis, cell viability, and cell proliferation assays. It should also be analyzed whether the combination of R26-roxCre-tdT with the Tnni3-Dre allele causes cardiac dysfunction, although such dysfunctions should be apparent from potential changes in gene expression.

      We apologize for mistakenly writing R26-roxCre-tdT instead of R26-loxCre-tdT in our manuscript; we have not generated an R26-roxCre-tdT mouse line. We also thank the reviewer for raising concerns about the toxicity of high Cre expression. The toxicity of constitutive Cre expression and of tamoxifen treatment in CreER mouse lines (10.1038/s44161-022-00125-6) is well known in the field. The present study does not resolve the toxicity of constitutive Cre expression; many widely used mouse lines with constitutive Cre expression driven by different promoters carry a similar risk. One possible solution would be a new strategy that removes Cre after it has been expressed.

      (4) Is there any leakiness when the inducible DreER allele is introduced but no tamoxifen treatment is applied? This should be documented. The same also applies to loxCre mice.

      In this study, we generated new mouse tool lines: Alb roxCre1-tdT, Cdh5 roxCre4-tdT, Alb roxCre7-GFP, Cdh5 roxCre10-GFP and R26-loxCre-tdT. As shown in Supplementary Figure 1, Supplementary Figure 2 and Figure 4D, none of these lines is leaky. Therefore, any leakiness observed when an inducible DreER or CreER allele is introduced would derive from the DreER or CreER allele itself. We will add the relevant experimental data in the revision.

      (5) It would be very helpful to include a dose-response curve for determining the minimum dosage required in Alb-CreER; R26-loxCre-tdT; Ctnnb1flox/flox mice for efficient recombination.

      Thank you for your suggestion. We understand the reviewer’s concern and will perform a dose-response analysis in the revised work.

      (6) In the liver panel of Figure 4F, tdT signals do not seem to colocalize with the VE-cad signals, which is odd. Is there any compelling explanation?

      Because the submission website limits file size, image compression made some signals unclear. Zoomed-out images are shown below. The staining in Figure 4F will be optimized, and high-resolution images will be provided in the revision.

      Author response image 2.

      (7) The authors claim that "virtually all tdT+ endothelial cells simultaneously expressed YFP/mCFP" (right panel of Figure 5D). Well, it seems that the abundance of tdT is much lower compared to YFP/mCFP. If the recombination of R26-Confetti was mainly triggered by R26-loxCre-tdT, the expression of tdT and YFP/mCFP should be comparable. This should be clarified.

      Thank you so much for your careful check. We re-examined these signals carefully and did not find a much lower tdT signal. Because the submission website limits file size, image compression made some signals unclear; we attach clear high-resolution images here. The following figure shows how we separated the tdT signal and compared it with the YFP/mCFP signal.

      Author response image 3.

      (8) In several cases, the authors seem to have mixed up "R26-roxCre-tdT" with "R26-loxCre-tdT". There are errors in lines #251 and #256, and in the passage from line #278 to #301. In lines #297 and #300 it should probably read "Alb-CreER;R26-loxCre-tdT;Ctnnb1flox/flox" rather than "Alb-CreER;R26-tdT2;Ctnnb1flox/flox".

      We are grateful for these careful observations. We have corrected these typos accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The work by Combrisson and colleagues investigates the degree to which reward and punishment learning signals overlap in the human brain using intracranial EEG recordings. The authors used information theory approaches to show that local field potential signals in the anterior insula and three subregions of the prefrontal cortex encode both reward and punishment prediction errors, albeit to different degrees. Specifically, the authors found that all four regions have electrodes that can selectively encode either the reward or the punishment prediction errors. Additionally, the authors analyzed the neural dynamics across pairs of brain regions and found that the anterior insula to dorsolateral prefrontal cortex neural interactions were specific for punishment prediction errors, whereas the ventromedial prefrontal cortex to lateral orbitofrontal cortex interactions were specific to reward prediction errors. This work contributes to the ongoing efforts in both systems neuroscience and learning theory by demonstrating how two differing behavioral signals can be differentiated to a greater extent by analyzing neural interactions between regions as opposed to studying neural signals within one region.

      Strengths:

      The experimental paradigm incorporates both a reward and punishment component that enables investigating both types of learning in the same group of subjects allowing direct comparisons.

      The use of intracranial EEG signals provides much-needed insight into the timing of when reward and punishment prediction error signals emerge in the studied brain regions.

      Information theory methods provide important insight into the interregional dynamics associated with reward and punishment learning and allow the authors to show that reward and punishment learning can be better dissociated based on interregional dynamics than on local activity alone.

      We thank the reviewer for this accurate summary. Please find below our answers to the weaknesses raised by the reviewer.

      Weaknesses:

      The analysis presented in the manuscript focuses solely on gamma band activity. The presence and potential relevance of other frequency bands is not discussed. It is possible that slow oscillations, which are thought to be important for coordinating neural activity across brain regions could provide additional insight.

      We thank the reviewer for pointing out this omission in the first version of the manuscript. We have now made this point clearer in the Methods sections entitled “iEEG data analysis” and “Estimate of single-trial gamma-band activity”:

      “Here, we focused solely on broadband gamma for three main reasons. First, it has been shown that the gamma band activity correlates with both spiking activity and the BOLD fMRI signals (Lachaux et al., 2007; Mukamel et al., 2004; Niessing et al., 2005; Nir et al., 2007), and it is commonly used in MEG and iEEG studies to map task-related brain regions (Brovelli et al., 2005; Crone et al., 2006; Vidal et al., 2006; Ball et al., 2008; Jerbi et al., 2009; Darvas et al., 2010; Lachaux et al., 2012; Cheyne and Ferrari, 2013; Ko et al., 2013). Therefore, focusing on the gamma band facilitates linking our results with the fMRI and spiking literatures on probabilistic learning. Second, single-trial and time-resolved high-gamma activity can be exploited for the analysis of cortico-cortical interactions in humans using MEG and iEEG techniques (Brovelli et al., 2015; 2017; Combrisson et al., 2022). Finally, while previous analyses of the current dataset (Gueguen et al., 2021) reported an encoding of PE signals at different frequency bands, the power in lower frequency bands was shown to carry redundant information compared to the gamma band power.”

      The data is averaged across all electrodes which could introduce biases if some subjects had many more electrodes than others. Controlling for this variation in electrode number across subjects would ensure that the results are not driven by a small subset of subjects with more electrodes.

      We thank the reviewer for raising this important issue. We would like to point out that neither the gamma activity nor the measures of connectivity were averaged across bipolar recordings within an area. Instead, we used a statistical approach proposed in a previous paper that combines non-parametric permutations with measures of information (Combrisson et al., 2022). As explained in the “Statistical analysis” section, mutual information (MI) is estimated between PE signals and single-trial modulations in gamma activity separately for each contact (or for each pair of contacts). Then, a one-sample t-test is computed across all recordings of all subjects to form the group-level effect size. We address the issue of electrode number in our answer below.
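      For readers less familiar with this two-level scheme, the Python sketch below illustrates its structure: one MI value per contact, then a one-sample t-test across all contacts pooled over subjects. It is a simplified illustration, not the pipeline of Combrisson et al. (2022): MI is computed here with a plain parametric Gaussian estimate from the Pearson correlation, and the group-level reference value `mu0` (for instance, the mean MI obtained under permutation) is a hypothetical input. All function names are illustrative.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi_bits(x, y):
    """MI in bits between two scalar variables, assuming they are jointly Gaussian."""
    r = pearson_r(x, y)
    return -0.5 * math.log2(1.0 - r * r)


def one_sample_t(values, mu0=0.0):
    """One-sample t statistic of `values` against the reference value mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return (statistics.fmean(values) - mu0) / se


def group_effect(recordings, mu0):
    """recordings: one (pe, gamma) pair of single-trial lists per bipolar contact,
    pooled across all subjects. MI is computed per contact, then tested across contacts."""
    mi_per_contact = [gaussian_mi_bits(pe, gamma) for pe, gamma in recordings]
    return one_sample_t(mi_per_contact, mu0)
```

      Because every contact contributes one MI value, a subject with many contacts weighs more in the t-test than a subject with few, which is why the per-subject counts discussed below are informative.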

      The potential variation in reward versus punishment learning across subjects is not included in the manuscript. While the time course of reward versus punishment prediction errors is symmetrical at the group level, it is possible that some subjects show faster learning for one versus the other type which can bias the group average. Subject level behavioral data along with subject level electrode numbers would provide more convincing evidence that the observed effects are not arising from these potential confounds.

      We thank the reviewer for the two points raised. We performed additional analyses at the single-participant level to address the issues raised by the reviewer. We should note, however, that these results are descriptive and cannot be generalized to account for population-level effects. As suggested by the reviewer, we prepared two new figures. The first supplementary figure summarizes the number of participants that had iEEG contacts per brain region and pair of brain regions (Fig. S1A in the Appendix). It can be seen that the number of participants sampled in different brain regions is relatively constant (left panel) and the number of participants with pairs of contacts across brain regions is relatively homogeneous, ranging from 7 to 11 (right panel). Fig. S1B shows the number of bipolar derivations per subject and per brain region.

      Author response image 1.

      Single-subject anatomical distribution. (A) Number of unique subjects per brain region and per pair of brain regions. (B) Number of bipolar derivations per subject and per brain region.

      The second supplementary figure describes the estimated prediction error for rewarding and punishing trials for each subject (Fig. S2). The single-subject error bars represent the 95% confidence interval estimated using a bootstrap approach across the different pairs of stimuli presented during the three to six sessions. As the reviewer anticipated, there is indeed variation across subjects, but we observe that RPE and PPE are relatively symmetrical, even at the subject level, and tend toward zero around trial number 10. These results therefore corroborate the patterns observed at the group level.

      Author response image 2.

      Single-subject estimation of prediction errors. Single-subject trial-wise reward PE (RPE - blue) and punishment PE (PPE - red), ± 95% confidence interval.
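      A percentile bootstrap of this kind can be sketched as follows, for one subject and one trial position, resampling the PE values across stimulus pairs and sessions with replacement. The number of resamples, the seed and the input values are illustrative assumptions; the exact resampling settings behind Fig. S2 are not specified here.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    `values` are, e.g., the PEs of one subject at one trial position, one value
    per stimulus pair and session (hypothetical input layout).
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and keep the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = boot_means[int(alpha / 2 * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical PE values for one subject at one trial position.
pe_values = [0.62, 0.48, 0.71, 0.35, 0.55, 0.66, 0.41, 0.58]
low, high = bootstrap_ci(pe_values)  # 95% interval around the mean PE
```

      Repeating the call for each trial position yields the shaded band of a single-subject curve; with few stimulus pairs per subject, the interval is necessarily wide.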

      Finally, to assess the variability of local encoding of prediction errors across participants, we quantified the proportion of subjects having at least one significant bipolar derivation encoding either the RPE or PPE (Fig. S4). As expected, this proportion varied across regions. The lowest proportions were found in the ventromedial prefrontal cortex (vmPFC) for encoding the PPE and in the lateral orbitofrontal cortex (lOFC) for encoding the RPE, with approximately 30% of subjects showing the effect. Conversely, we found highly reproducible encoding in the anterior insula (aINS) and dorsolateral prefrontal cortex (dlPFC), with a maximum of 100% of the 9 subjects having at least one bipolar derivation encoding the PPE in the dlPFC.

      Author response image 3.
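      The per-region proportion reported in Fig. S4 amounts to a simple count. The sketch below assumes a hypothetical input layout (a mapping from subject ID to one significance flag per bipolar derivation in the region) and excludes subjects without any derivation in that region from the denominator, which we assume matches how unique subjects per region are counted in Fig. S1A.

```python
def proportion_of_subjects(significant_by_subject):
    """Fraction of subjects with at least one significant bipolar derivation in a region.

    significant_by_subject: dict mapping subject ID -> list of booleans, one per
    bipolar derivation in the region (True = significant RPE or PPE encoding).
    Subjects with no derivation in the region are not counted.
    """
    sampled = [flags for flags in significant_by_subject.values() if flags]
    if not sampled:
        raise ValueError("no subject has a bipolar derivation in this region")
    return sum(any(flags) for flags in sampled) / len(sampled)


# Hypothetical example: 3 of the 4 subjects are sampled, 2 of them show an effect.
example = {"s1": [False, True], "s2": [False, False], "s3": [True], "s4": []}
share = proportion_of_subjects(example)  # 2 / 3
```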

      Taken together, we acknowledge a certain variability per region and per condition. Nevertheless, the results presented in the supplementary figures suggest that the main results do not arise from a minority of subjects.

      We would like to point out that in order to assess across-subject variability, a much larger number of participants would have been needed, given the low signal-to-noise ratios observed at the single-participant level. We thus prefer to add these results as supplementary material in the Appendix, rather than in the main text.

      It is unclear if the findings in Figures 3 and 4 truly reflect the differential interregional dynamics in reward versus punishment learning or if these results arise as a statistical byproduct of the reward vs punishment bias observed within each region. For instance, the authors show that information transfer from anterior insula to dorsolateral prefrontal cortex is specific to punishment prediction error. However, both anterior insula and dorsolateral prefrontal cortex have higher prevalence of punishment prediction error selective electrodes to begin with. Therefore the findings in Fig 3 may simply be reflecting the prevalence of punishment specificity in these two regions above and beyond a punishment specific neural interaction between the two regions. Either mathematical or analytical evidence that assesses if the interaction effect is simply reflecting the local dynamics would be important to make this result convincing.

      This is an important point that we partly addressed in the manuscript. More precisely, we investigated whether the synergistic effects observed between the dlPFC and vmPFC encoding global PEs (Fig. 5) could be explained by their respective local specificity. Since we reported larger proportions of recordings encoding the PPE in the dlPFC and the RPE in the vmPFC (Fig. 2B), we checked whether the synergy between the dlPFC and vmPFC could be mainly due to complementary roles, where the dlPFC carries information about the PPE only and the vmPFC about the RPE only. To address this point, we selected PPE-specific bipolar derivations from the dlPFC and RPE-specific ones from the vmPFC and, as the reviewer predicted, we found synergistic II between the two regions, probably owing mainly to their respective specificities. In addition, we estimated the II between non-selective bipolar derivations (i.e., recordings with significant encoding of both RPE and PPE) and also observed synergistic interactions (Fig. 5C and Fig. S9). Taken together, local specificity certainly plays a role, but it is not the only factor defining the type of interactions.

      Concerning the interaction information results (II, Fig. 3), several lines of evidence suggest that local specificity alone cannot account for the II effects. For example, local specificity for PPE is observed across all four areas (Fig. 2A), and the percentage of bipolar derivations displaying an effect is large (10% or more) for three brain regions (aINS, dlPFC and lOFC). If local specificity were the main driving cause, we would have observed significant redundancy between all pairs of brain regions. However, the interaction between the aINS and lOFC displayed no significant redundant effect (Fig. 3B). Another example is the result observed in the lOFC: approximately 30% of bipolar derivations display a selectivity for PPE (Fig. 2B, third panel from the left), but do not show clear signs of redundant encoding at the level of within-area interactions (Fig. 3A, bottom-left panel). Similarly, local encoding of RPE is observed across all four brain regions (Fig. 2A), and the percentage of bipolar derivations displaying an effect is large (10% or more) for three brain regions (aINS, dlPFC and vmPFC). Nevertheless, significant between-region interactions were observed only between the lOFC and vmPFC (Fig. 3B, bottom-right panel).

      To further support this reasoning, we performed a simulation showing that it is possible to observe synergistic interactions between two regions with the same specificity. As an example, consider one region locally encoding the RPE on early trials and a second region encoding the RPE on late trials. Combining the two with the II would lead to synergistic interactions, because each carries information that is not carried by the other. To illustrate this point, we simulated the data of two regions (x and y). To simulate redundant interactions (first row), each region receives a copy of the PE (one-to-all); for synergy (second row), x and y receive the early and late PE trials, respectively (all-to-one). This toy example illustrates that local specificity is not the only factor determining the type of interactions. We added the following result to the Appendix.

      Author response image 4.

      Local specificity does not fully determine the type of interactions. Within-area local encoding of PE using the mutual information (MI, in bits) for regions X and Y, and between-area interaction information (II, in bits) leading to (A) redundant interactions and (B) synergistic interactions about the PE.
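      The logic of this toy simulation can be reproduced with a fully enumerated discrete version, sketched below in Python. Several choices are our assumptions and not the settings of the simulation in the Appendix: the PE is binary, each trial is labeled early or late with equal probability, and the region that does not receive the PE on a given trial outputs one bit of uniform noise. The II is computed as I(X,Y;S) - I(X;S) - I(Y;S), with negative values read as redundancy and positive values as synergy, which we assume matches the sign convention used in the manuscript.

```python
import math
from collections import Counter
from itertools import product


def entropy(samples):
    """Shannon entropy in bits of a list of equally weighted discrete outcomes."""
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in Counter(samples).values())


def mutual_info(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def interaction_info(x, y, s):
    """II = I(X,Y;S) - I(X;S) - I(Y;S): negative = redundancy, positive = synergy."""
    return mutual_info(list(zip(x, y)), s) - mutual_info(x, s) - mutual_info(y, s)


def simulate(mode):
    """Enumerate every (PE, early/late, noise) combination once (8 equiprobable trials)."""
    x, y, s = [], [], []
    for pe, early, noise in product((0, 1), (True, False), (0, 1)):
        if mode == "redundant":
            xv, yv = pe, pe  # one-to-all: both regions receive a copy of the PE
        elif early:
            xv, yv = pe, noise  # x carries the PE on early trials, y is noise
        else:
            xv, yv = noise, pe  # y carries the PE on late trials, x is noise
        x.append(xv)
        y.append(yv)
        s.append(pe)
    return x, y, s


ii_redundant = interaction_info(*simulate("redundant"))  # -1.0 bit
ii_synergy = interaction_info(*simulate("synergy"))      # about +0.123 bits
```

      In the synergistic case each region alone carries about 0.19 bits about the PE, while the pair carries 0.5 bits, so the whole exceeds the sum of its parts even though both regions encode the same PE.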

      Regarding the information transfer results (Fig. 4), similar arguments hold and suggest that prevalence is not the main factor explaining the transfer entropy arising between the anterior insula (aINS) and dorsolateral prefrontal cortex (dlPFC). Indeed, the lOFC has a strong local specificity for PPE, but the transfer entropy between the lOFC and the aINS (or dlPFC), shown in Fig. S7, does not differ significantly between PPE and RPE.

      Indeed, such a transfer can only be found when there is a delay between the gamma activity of the two regions. In this example, the transfer entropy quantifies the amount of information shared between the past activity of the aINS and the present activity of the dlPFC, conditioned on the past activity of the dlPFC. The conditioning ensures that the present activity of the dlPFC is not simply explained by its own past. Consequently, if both regions exhibited different prevalences for reward and punishment but without delay (i.e., with the same timing), the transfer entropy would be null because of the conditioning. In fact, between 10 and 20% of bipolar recordings show a selectivity for the reward PE (corresponding to 40-60% of subjects, Fig. S4). However, the transfer entropy estimated from the aINS to the dlPFC across rewarding trials is flat and clearly non-significant. If the transfer entropy were a byproduct of the local specificity, we would observe an increase, which is not the case here.
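      The role of the delay and of the conditioning can be checked on toy signals. The Python sketch below uses a linear-Gaussian estimator of transfer entropy, TE(x -> y) = I(x[t-1]; y[t] | y[t-1]), with a single one-sample lag; the estimator, the lag, the AR coefficient and the noise levels are illustrative assumptions, not the settings used on the gamma-band data. In case A, y copies x with a one-sample delay. In case B, the two signals share information at zero lag: y is autocorrelated and x is a noisy copy of y at the same time point. The past of x then predicts the present of y only through the past of y, so conditioning on the past of y removes it and TE is close to zero.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def gaussian_mi_bits(x, y):
    """I(X;Y) in bits for scalar, jointly Gaussian variables."""
    r = corr(x, y)
    return -0.5 * math.log2(1 - r * r)


def gaussian_cmi_bits(x, y, z):
    """I(X;Y|Z) in bits for scalar, jointly Gaussian variables (partial correlation)."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    partial = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return -0.5 * math.log2(1 - partial ** 2)


def transfer_entropy(source, target, lag=1):
    """TE(source -> target) = I(source[t-lag]; target[t] | target[t-lag])."""
    return gaussian_cmi_bits(source[:-lag], target[lag:], target[:-lag])


rng = random.Random(1)
n = 20000

# Case A: the target copies the source with a one-sample delay.
x_a = [rng.gauss(0, 1) for _ in range(n)]
y_a = [rng.gauss(0, 1)] + [x_a[t - 1] + rng.gauss(0, 1) for t in range(1, n)]

# Case B: zero-lag sharing. The target is autocorrelated (AR(1)) and the
# source is a noisy copy of it at the same time point.
y_b = [rng.gauss(0, 1)]
for t in range(1, n):
    y_b.append(0.8 * y_b[-1] + rng.gauss(0, 1))
x_b = [v + rng.gauss(0, 1) for v in y_b]

te_a = transfer_entropy(x_a, y_a)                   # about 0.5 bits
lagged_mi_b = gaussian_mi_bits(x_b[:-1], y_b[1:])   # about 0.46 bits, no conditioning
te_b = transfer_entropy(x_b, y_b)                   # close to 0 bits
```

      The lagged MI in case B shows that, without conditioning on the target's own past, a zero-lag relation combined with autocorrelation could be mistaken for directed transfer.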

      Reviewer #2:

      Summary:

      Reward and punishment learning have long been seen as emerging from separate networks of frontal and subcortical areas, often studied separately. Nevertheless, both systems are complementary, and distributed representations of rewards and punishments have been repeatedly observed within multiple areas. This raised the unsolved question of the possible mechanisms by which both systems might interact, which this manuscript addresses. The authors skillfully leveraged intracranial recordings in epileptic patients performing a probabilistic learning task combined with model-based information theoretical analyses of gamma activities to reveal that information about reward and punishment was not only distributed across multiple prefrontal and insular regions, but that each system showed specific redundant interactions. The reward subsystem was characterized by redundant interactions between orbitofrontal and ventromedial prefrontal cortex, while the punishment subsystem relied on insular and dorsolateral redundant interactions. Finally, the authors revealed a way by which the two systems might interact, through synergistic interaction between ventromedial and dorsolateral prefrontal cortex.

      Strengths:

      Here, the authors performed an excellent reanalysis of a unique dataset using innovative approaches, pushing our understanding of the interactions at play between prefrontal and insular cortex regions during learning. Importantly, the description of the methods and results is truly made accessible, making it an excellent resource for the community.

      This manuscript goes beyond what is classically performed using intracranial EEG dataset, by not only reporting where a given information, like reward and punishment prediction errors, is represented but also by characterizing the functional interactions that might underlie such representations. The authors highlight the distributed nature of frontal cortex representations and propose new ways by which the information specifically flows between nodes. This work is well placed to unify our understanding of the complementarity and specificity of the reward and punishment learning systems.

      We thank the reviewer for the positive feedback. Please find below our answers to the weaknesses raised by the reviewer.

      Weaknesses:

      The conclusions of this paper are mostly supported by the data, but whether the findings are entirely generalizable would require further information/analyses.

      First, the authors found that prediction errors very quickly converge toward 0 (less than 10 trials) while subjects performed the task for sets of 96 trials. Considering all trials, and therefore having a non-uniform distribution of prediction errors, could potentially bias the various estimates the authors are extracting. Separating trials between learning (at the start of a set) and exploiting periods could prove that the observed functional interactions are specific to the learning stages, which would strengthen the results.

      We thank the reviewer for this question. We would like to note that the probabilistic nature of the learning task does not allow a strict distinction between the exploration and exploitation phases. Indeed, the probability of obtaining the less rewarding outcome was 25% (i.e., a 0€ gain in the reward learning condition and a -1€ loss in the punishment learning condition). Thus, participants tended to explore even during the last set of trials in each session. This is evident from the average learning curves shown in Fig. 1B of Gueguen et al. (2021). Learning curves show rates of correct choice (75% chance of 1€ gain) in the reward condition (blue curves) and incorrect choice (75% chance of 1€ loss) in the punishment condition (red curves).

      Regarding the evolution of PEs, as suggested by Reviewer #1, we added a new figure showing the single-subject estimates of the R/PPE (Fig. S2). Here, the confidence interval is obtained across all pairs of stimuli presented during the different sessions. We recovered the general trend of the R/PPE converging toward zero around 10 trials. Although both average reward and punishment prediction errors converge toward zero in approximately 10 trials, single-participant curves display large variability, including at the end of each session. As a reminder, the 96 trials represent the total number of trials in one session across the four pairs, so each pair of stimuli was presented on only 24 trials.

      Author response image 5.

      Single-subject estimation of prediction errors. Single-subject trial-wise reward PE (RPE - blue) and punishment PE (PPE - red), ± 95% confidence interval.

      However, the convergence of the R/PPE results from averaging across the pairs of stimuli. In the figure below, we superimposed the estimated R/PPE, per pair of stimuli, for each subject. It becomes very clear that high PE values can be reached, even in late trials. Therefore, we believe that splitting trials into early and late sets on the basis of PE convergence is far from trivial.

      Author response image 6.

      Single-subject estimation of prediction errors per pair of stimuli. Single-subject trial-wise reward PE (RPE - blue) and punishment PE (PPE - red).

      Consequently, nonzero RPE and PPE occur throughout the whole session, and separating trials between learning (at the start of a set) and exploitation periods, as suggested by the reviewer, does not allow a strict dissociation between learning and no learning. Nevertheless, we tested the analysis proposed by the reviewer at the local level. We split the 24 trials of each pair of stimuli into early, middle and late trials (8 trials each). We then reproduced Fig. 2 by computing the mutual information between the gamma activity and the R/PPE for subsets of trials: early (first row) and late trials (second row). We recovered significant encoding of both R/PPE in the aINS, dlPFC and lOFC in both early and late trials. The vmPFC also showed significant encoding of both during early trials. The only difference emerged in the late trials of the vmPFC, where we found a strong encoding of the RPE only. It should also be noted that, since we are sub-selecting trials here, the statistical analyses are performed using only a third of the trials.

      Taken together, the high PE values reached even in late trials and the fact that most findings are reproduced with only a third of the trials do not justify splitting trials into early and late sets here. Crucially, this latest analysis confirms that the neural correlates of learning that we observed reflect PE signals rather than early versus late trials in the session.

      Author response image 7.

      MI between gamma activity and R/PPE using early and late trials. Time courses of MI estimated between the gamma power and both RPE (blue) and PPE (red) using either early or late trials (first and second row, respectively). Horizontal thick lines represent significant clusters of information (p<0.05, cluster-based correction, non-parametric randomization across epochs).
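      The per-pair split used for this control analysis (24 trials per pair, divided into early, middle and late blocks of 8) can be sketched as below. This is a minimal illustration rather than the authors' code: the function name `split_trials_by_pair` is hypothetical, and it assumes that trials are stored in presentation order together with an identifier of the stimulus pair shown on each trial.

```python
import numpy as np

def split_trials_by_pair(pair_ids, n_bins=3):
    """Label each trial 0 (early), 1 (middle) or 2 (late) according to its
    rank among the trials of its own stimulus pair.

    pair_ids: stimulus-pair identifier of each trial, in presentation order
              (hypothetical input format).
    """
    pair_ids = np.asarray(pair_ids)
    labels = np.empty(len(pair_ids), dtype=int)
    for pair in np.unique(pair_ids):
        # positions of this pair's trials, still in presentation order
        idx = np.flatnonzero(pair_ids == pair)
        # array_split also copes with counts not divisible by n_bins
        for b, chunk in enumerate(np.array_split(idx, n_bins)):
            labels[chunk] = b
    return labels

# One hypothetical session: 4 interleaved pairs x 24 trials = 96 trials
pair_ids = np.tile(np.arange(4), 24)
labels = split_trials_by_pair(pair_ids)
early_trials = np.flatnonzero(labels == 0)  # 8 trials per pair, 32 in total
late_trials = np.flatnonzero(labels == 2)
```

      MI would then be computed separately on the `early_trials` and `late_trials` subsets, each holding a third of the session.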

      Importantly, it is unclear whether the results described are a common feature observed across subjects or the results of a minority of them. The authors should report and assess the reliability of each result across subjects. For example, the authors found RPE-specific interactions between vmPFC and lOFC, even though less than 10% of sites represent RPE or both RPE/PPE in lOFC. It is questionable whether such a low proportion of sites might come from different subjects, and therefore whether the interactions observed are truly observed in multiple subjects. The nature of the dataset obviously precludes from requiring all subjects to show all effects (given the known limits inherent to intracerebral recording in patients), but it should be proven that the effects were reproducibly seen across multiple subjects.

      We thank the reviewer for this remark, which was also raised by the first reviewer. We added a supplementary figure describing the number of unique subjects per brain region and per pair of brain regions (Fig. S1A), as well as the number of bipolar derivations per region and per subject (Fig. S1B).

      Author response image 8.

      Single-subject anatomical distribution. (A) Number of unique subjects per brain region and per pair of brain regions. (B) Number of bipolar derivations per subject and per brain region.

      Regarding the reproducibility of the results across subjects for the local analysis (Fig. 2), we also added the instantaneous proportion of subjects having at least one bipolar derivation with significant encoding of the RPE and PPE (Fig. S4). At minimum, approximately 30% of unique subjects showed the effect in the lOFC (for the RPE) and in the vmPFC (for the PPE). In the aINS and dlPFC, between 50% and 100% of subjects showed the effect. Therefore, local encoding of RPE and PPE was never driven by a single subject.

      Author response image 9.
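      The per-subject proportion reported in Fig. S4 amounts to collapsing, at each time point, all bipolar derivations of a subject with a logical OR before averaging across subjects. A minimal sketch, assuming a hypothetical boolean matrix of significant time points (derivations × time) for a single region and one subject label per derivation; the function name is illustrative only:

```python
import numpy as np

def subject_proportion(sig, subject_ids):
    """Fraction of unique subjects with >= 1 significant derivation per time point.

    sig: boolean array (n_derivations, n_times), True where a derivation
         belongs to a significant cluster (hypothetical input format).
    subject_ids: subject label of each derivation (length n_derivations).
    """
    sig = np.asarray(sig, dtype=bool)
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    # OR across each subject's derivations -> one boolean row per subject
    per_subject = np.array([sig[subject_ids == s].any(axis=0) for s in subjects])
    # mean of a boolean matrix over subjects = proportion of subjects
    return per_subject.mean(axis=0)

# Two subjects: s1 contributes two derivations, s2 contributes one
sig = [[True, False], [False, False], [True, True]]
prop = subject_proportion(sig, ["s1", "s1", "s2"])  # values: 1.0 at t0, 0.5 at t1
```

      Collapsing within subject first means that a patient with many contacts in a region counts once, which is what makes the measure robust to unequal electrode coverage.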

      Similarly, we performed statistical analyses of interaction information at the single-subject level and counted the proportion of unique subjects having at least one pair of recordings with significant redundant or synergistic interactions about the RPE and PPE (Fig. S5). As in Fig. 3, proportions of subjects with redundant and synergistic interactions are displayed as negative and positive values, respectively. For within-region interactions, approximately 60% of subjects showed redundant interactions about the R/PPE in the aINS and about the PPE in the dlPFC, and 40% about the RPE in the vmPFC. For across-region interactions, 60% of subjects showed redundant interactions about the PPE between the aINS and dlPFC and between the dlPFC and lOFC, and 30% showed redundant interactions about the RPE between the lOFC and vmPFC. Overall, we reproduced the main results shown in Fig. 3.

      Author response image 10.

      Inter-subject reproducibility of redundant interactions about PE signals. Time courses of the proportion of subjects having at least one pair of bipolar derivations with significant interaction information (p<0.05, cluster-based correction, non-parametric randomization across epochs) about the RPE (blue) or PPE (red). Data are aligned to outcome presentation (vertical line at 0 seconds). Proportions of subjects with redundant (solid) and synergistic (dashed) interactions are plotted downward and upward, respectively.
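      For readers less familiar with interaction information, the sign convention used above (negative for redundancy, positive for synergy) follows from II = I(X1, X2; S) - I(X1; S) - I(X2; S). The sketch below computes it with a Gaussian-copula MI estimator on simulated data. It is an illustration under assumptions: the estimator here has no bias correction, the simulated "regions" are toy signals, and the function names are hypothetical rather than the manuscript's implementation.

```python
import numpy as np
from scipy.special import ndtri
from scipy.stats import rankdata

def copnorm(x):
    """Gaussian-copula normalisation: ranks of each column mapped to N(0, 1)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    ranks = np.apply_along_axis(rankdata, 0, x)
    return ndtri(ranks / (x.shape[0] + 1))

def gauss_mi(x, y):
    """MI in bits between Gaussian variables x (n, dx) and y (n, dy)."""
    dx = x.shape[1]
    c = np.cov(np.hstack([x, y]), rowvar=False)
    _, logdet_x = np.linalg.slogdet(c[:dx, :dx])
    _, logdet_y = np.linalg.slogdet(c[dx:, dx:])
    _, logdet_xy = np.linalg.slogdet(c)
    return 0.5 * (logdet_x + logdet_y - logdet_xy) / np.log(2)

def interaction_information(x1, x2, s):
    """II = I(x1, x2; s) - I(x1; s) - I(x2; s): < 0 redundancy, > 0 synergy."""
    x1, x2, s = copnorm(x1), copnorm(x2), copnorm(s)
    joint = gauss_mi(np.hstack([x1, x2]), s)
    return joint - gauss_mi(x1, s) - gauss_mi(x2, s)

rng = np.random.default_rng(0)
n = 5000
pe = rng.standard_normal(n)  # simulated trial-wise prediction error
# Redundancy: both signals carry a noisy copy of the same PE
ii_red = interaction_information(pe + rng.standard_normal(n),
                                 pe + rng.standard_normal(n), pe)
# Synergy: the PE is only recoverable by combining the two signals
shared = rng.standard_normal(n)
ii_syn = interaction_information(shared,
                                 shared + pe + 0.1 * rng.standard_normal(n), pe)
```

      In the first case each signal alone already carries most of what the pair carries, so the joint MI falls short of the sum; in the second, neither the first signal alone nor (largely) the second alone reveals the PE, but their difference does.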

      Finally, the timings of the observed interactions between areas preclude one of the authors' main conclusions. Specifically, the authors repeatedly concluded that the encoding of RPE/PPE signals are "emerging" from redundancy-dominated prefrontal-insular interactions. However, the between-region information and transfer entropy between vmPFC and lOFC for example is observed almost 500ms after the encoding of RPE/PPE in these regions, questioning how it could possibly lead to the encoding of RPE/PPE. It is also noteworthy that the two information measures, interaction information and transfer entropy, between these areas happened at non overlapping time windows, questioning the underlying mechanism of the communication at play (see Figures 3/4). As an aside, when assessing the direction of information flow, the authors also found delays between pairs of signals peaking at 176ms, far beyond what would be expected for direct communication between nodes. Discussing this aspect might also be of importance as it raises the possibility of third-party involvement.

      The local encoding of RPE in the vmPFC and lOFC is observed in a time interval ranging from approximately 0.2-0.4 s to 1.2-1.4 s after outcome presentation (blue bars in Fig. 2A). The encoding of RPE by interaction information covers a time interval from approximately 1.1 s to 1.5 s (blue bars in Fig. 3B, bottom right panel). Similarly, significant RPE-specific TE modulations between the vmPFC and lOFC occur mainly in the 0.7-1.1 s range. Thus, the local encoding of RPE appears to precede the effects observed at the level of neural interactions (II and TE). In contrast, the modulations in MI, II and TE related to PPE co-occur in a time window from 0.2 s to 0.7 s after outcome presentation. We therefore agree with the reviewer that no generic conclusion can be drawn about the potential mechanisms relating the three levels of analysis. We thus replaced the term “emerge from”, which may be misinterpreted as hinting at a potential mechanism, with “occur with” throughout the manuscript. We nevertheless conclude that the three levels of analysis (and phenomena) co-occur in time, hinting at a potential cross-scale interaction that needs further study. Further work, beyond the scope of the current study, is required to better understand the interaction between scales.

      Regarding the delay used for conditioning the transfer entropy, the value of 176 ms reflects the delay at which we observed the maximum transfer entropy. However, we did not use a single delay for conditioning; instead, we used every possible delay between 116 and 236 ms, as explained in the Methods section. We would like to stress that transfer entropy is a directed measure of functional connectivity, and it can only be interpreted as quantifying statistical causality, defined in terms of predictability according to the Wiener-Granger principle, as detailed in the Methods. Thus, it can be interpreted neither in Pearl’s causal terms nor as indexing any type of direct communication between nodes. This is a known limitation of the method, which has been stressed in past literature and which we believe does not need to be addressed here.

      To account for this, we revised the Discussion to address this issue in the following paragraph:

      “Here, we quantified directional relationships between regions using the transfer entropy (Schreiber, 2000), which is a functional connectivity measure based on the Granger-Wiener causality principle. Tract-tracing studies in the macaque have revealed strong interconnections between the lOFC and vmPFC (Carmichael and Price, 1996; Öngür and Price, 2000). In humans, cortico-cortical anatomical connections have mainly been investigated using diffusion magnetic resonance imaging (dMRI). Several studies found high probabilities of structural connectivity between the anterior insula and both the orbitofrontal cortex and the dorsolateral prefrontal cortex (Cloutman et al., 2012; Ghaziri et al., 2017), and between the lOFC and vmPFC (Heather Hsu et al., 2020). In addition, statistical dependencies (e.g., coherence) between the LFPs of distant areas could potentially be explained by direct anatomical connections (Schneider et al., 2021; Vinck et al., 2023). Taken together, information transfer might rely on direct or indirect structural connectivity. However, here we also reported differences in TE between rewarding and punishing trials given the same backbone anatomical connectivity (Fig. 4). [...]”
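      The delay-scanning logic described above (a range of candidate delays, with TE peaking at one of them) can be illustrated with a Gaussian TE estimator, which under Gaussian assumptions equals the conditional MI I(y_t; x_{t-d} | y_{t-1}) and can be obtained from linear-regression residual variances. This is a simplified sketch, not the authors' pipeline: it conditions on a single past sample of the target, works on one continuous time series rather than across trials, and expresses delays in samples (converting the 116-236 ms window into samples depends on the sampling rate of the gamma-power signal). Function names are hypothetical.

```python
import numpy as np

def gaussian_te(x, y, delay):
    """TE (bits) from x to y at one delay (in samples), Gaussian assumption:
    TE = I(y_t ; x_{t-delay} | y_{t-1})."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.arange(max(delay, 1), len(y))
    target = y[t]
    ones = np.ones_like(target)
    restricted = np.column_stack([ones, y[t - 1]])           # target's own past
    full = np.column_stack([ones, y[t - 1], x[t - delay]])   # + source's past

    def resid_var(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        return np.var(target - design @ beta)

    # log-ratio of residual variances = Gaussian conditional MI
    return 0.5 * np.log2(resid_var(restricted) / resid_var(full))

def te_over_delays(x, y, delays):
    """TE at each candidate delay, plus the delay where it peaks."""
    te = np.array([gaussian_te(x, y, d) for d in delays])
    return te, delays[int(np.argmax(te))]

# Toy data: x drives y with a lag of 5 samples
rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
y = np.zeros(n)
for t in range(5, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 5] + 0.5 * rng.standard_normal()

te, best_delay = te_over_delays(x, y, np.arange(1, 11))
```

      As in the response above, the peak delay is only a summary: TE is computed at every candidate delay, and a peak says nothing about whether the dependency is direct or mediated by a third region.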

      Reviewer #3:

      Summary:

      The authors investigated whether learning that relies on distinct reward or punishment outcomes in probabilistic instrumental learning tasks involves functional interactions between two different cortico-cortical gamma-band modulations. They suggest that learning signals such as reward and punishment prediction errors are processed by two dominant interactions, lOFC-vmPFC and aINS-dlPFC, which are later integrated to support switching between reward and punishment learning. Using the well-known measures of mutual information, interaction information and transfer entropy, the authors identified directional task-information flow between redundancy-dominated and synergy-dominated interactions. This integrative concept also provides a unifying view of how functionally distributed reward and/or punishment information is segregated and integrated across cortical areas.

      Strengths:

      The dataset used in this manuscript may come from previously published works (Gueguen et al., 2021) or from the same grant project due to the methods. Previous works have shown strong evidence about why gamma-band activities and those 4 areas are important. For further analyses, the current manuscript moved the ideas forward to examine how reward/punishment information transfers between recorded areas according to the task conditions. Standard measures such as mutual information, interaction information, and transfer entropy provided time series at the millisecond level and revealed the directional information flow during specific windows. In addition, the diagram in Figure 6 summarized the results and proposed an integrative concept with functional heterogeneities in cortical areas. These findings will support ideas from human fMRI studies and add new insight to electrophysiological studies in non-human primates.

      We thank the reviewer for the summary as well as for highlighting the strengths. Please find below our answers regarding the weaknesses of the manuscript.

      Weaknesses:

      After reading through the manuscript, the term "non-selective" in the abstract confused me and I did not actually know what it meant and how it fits the conclusion. If I understood the methods correctly, the 4 areas were studied in this manuscript because of their selective responses to the RPE and PPE signals (Figure 2). The redundancy- and synergy-dominated subsystems indicated that two areas shared similar and complementary information, respectively, due to the negative and positive value of interaction information (Page 6). For me, it doesn't mean they are "non-selective", especially in the redundancy-dominated subsystem. I may have missed something about how you calculate the mutual information or interaction information. Could you elaborate on this and explain what "non-selective" means?

      In the study by Gueguen et al. (2021), the authors used a general linear model (GLM) to link gamma activity to both reward and punishment prediction errors and looked for differences between the two conditions. Here, we reproduced this analysis, except that we used an information-theoretic measure (mutual information) able to capture both linear and non-linear (although monotonic) relationships between gamma activity and prediction errors. The clusters we reported reflect significant encoding of the RPE, the PPE, or both. Fig. 2 shows that all four regions have gamma activity modulated by both reward and punishment PEs. We used the term “non-selective” because none of the regions encoded exclusively one or the other; instead, each contained various proportions of bipolar derivations encoding either one or both of them.
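      The point that MI captures monotonic but non-linear relationships can be made concrete with a Gaussian-copula MI estimator, which rank-transforms each variable before computing a Gaussian MI and is therefore unchanged by any strictly increasing transformation of either variable. This is a sketch that assumes an estimator of this family; the exact estimator and bias correction used in the manuscript are described in its Methods, and the names below are hypothetical.

```python
import numpy as np
from scipy.special import ndtri
from scipy.stats import rankdata

def copnorm_1d(x):
    """Map a 1-D variable to standard-normal scores via its ranks."""
    x = np.asarray(x, dtype=float)
    return ndtri(rankdata(x) / (len(x) + 1))

def gcmi_1d(x, y):
    """Gaussian-copula MI (bits) between two 1-D variables, no bias correction.
    For a bivariate Gaussian, MI = -0.5 * log2(1 - rho**2)."""
    r = np.corrcoef(copnorm_1d(x), copnorm_1d(y))[0, 1]
    return -0.5 * np.log2(1 - r ** 2)

rng = np.random.default_rng(1)
n = 2000
pe = rng.standard_normal(n)                          # simulated trial-wise PE
gamma = np.exp(pe) + 0.1 * rng.standard_normal(n)    # non-linear, monotonic link
mi_raw = gcmi_1d(pe, gamma)
mi_cubed = gcmi_1d(pe, gamma ** 3)   # strictly increasing transform: same ranks
mi_indep = gcmi_1d(pe, rng.standard_normal(n))       # unrelated signal
```

      Because only ranks enter the estimate, `mi_raw` and `mi_cubed` are identical, whereas a linear GLM fit would change with such a transform. Non-monotonic relationships (for example a U-shape), however, would be largely missed, which is why the text qualifies the sensitivity as monotonic.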

      The directional information flows identified in this manuscript were evidenced by the recording contacts of iEEG with levels of concurrent neural activities to the task conditions. However, are the conclusions well supported by the anatomical connections? Is it possible that the information was transferred to the target via another area? These questions may remain to be elucidated by using other approaches or animal models. It would be great to point this out here for further investigation.

      We thank the reviewer for this interesting question. We added the following paragraph to the Discussion to clarify the current limitations of the transfer entropy and its link with anatomical connections:

      “Here, we quantified directional relationships between regions using the transfer entropy (Schreiber, 2000), which is a functional connectivity measure based on the Granger-Wiener causality principle. Tract-tracing studies in the macaque have revealed strong interconnections between the lOFC and vmPFC (Carmichael and Price, 1996; Öngür and Price, 2000). In humans, cortico-cortical anatomical connections have mainly been investigated using diffusion magnetic resonance imaging (dMRI). Several studies found high probabilities of structural connectivity between the anterior insula and both the orbitofrontal cortex and the dorsolateral prefrontal cortex (Cloutman et al., 2012; Ghaziri et al., 2017), and between the lOFC and vmPFC (Heather Hsu et al., 2020). In addition, statistical dependencies (e.g., coherence) between the LFPs of distant areas could potentially be explained by direct anatomical connections (Schneider et al., 2021). Taken together, information transfer might rely on direct or indirect structural connectivity. However, here we also reported differences in TE between rewarding and punishing trials given the same backbone anatomical connectivity (Fig. 4). Our results are further supported by a recent study showing that drug-resistant epileptic patients with a resected insula performed worse than healthy controls for risky losses compared to risky gains (Von Siebenthal et al., 2017).”

      References

      Carmichael ST, Price J. 1996. Connectional networks within the orbital and medial prefrontal cortex of macaque monkeys. J Comp Neurol 371:179–207.

      Cloutman LL, Binney RJ, Drakesmith M, Parker GJM, Lambon Ralph MA. 2012. The variation of function across the human insula mirrors its patterns of structural connectivity: Evidence from in vivo probabilistic tractography. NeuroImage 59:3514–3521. doi:10.1016/j.neuroimage.2011.11.016

      Combrisson E, Allegra M, Basanisi R, Ince RAA, Giordano BL, Bastin J, Brovelli A. 2022. Group-level inference of information-based measures for the analyses of cognitive brain networks from neurophysiological data. NeuroImage 258:119347. doi:10.1016/j.neuroimage.2022.119347

      Ghaziri J, Tucholka A, Girard G, Houde J-C, Boucher O, Gilbert G, Descoteaux M, Lippé S, Rainville P, Nguyen DK. 2017. The Corticocortical Structural Connectivity of the Human Insula. Cereb Cortex 27:1216–1228. doi:10.1093/cercor/bhv308

      Gueguen MCM, Lopez-Persem A, Billeke P, Lachaux J-P, Rheims S, Kahane P, Minotti L, David O, Pessiglione M, Bastin J. 2021. Anatomical dissociation of intracerebral signals for reward and punishment prediction errors in humans. Nat Commun 12:3344. doi:10.1038/s41467-021-23704-w

      Heather Hsu C-C, Rolls ET, Huang C-C, Chong ST, Zac Lo C-Y, Feng J, Lin C-P. 2020. Connections of the Human Orbitofrontal Cortex and Inferior Frontal Gyrus. Cereb Cortex 30:5830–5843. doi:10.1093/cercor/bhaa160

      Lachaux J-P, Fonlupt P, Kahane P, Minotti L, Hoffmann D, Bertrand O, Baciu M. 2007. Relationship between task-related gamma oscillations and BOLD signal: new insights from combined fMRI and intracranial EEG. Hum Brain Mapp 28:1368–1375. doi:10.1002/hbm.20352

      Mukamel R, Gelbard H, Arieli A, Hasson U, Fried I, Malach R. 2004. Coupling Between Neuronal Firing, Field Potentials, and fMRI in Human Auditory Cortex. Cereb Cortex 14:881.

      Niessing J, Ebisch B, Schmidt KE, Niessing M, Singer W, Galuske RA. 2005. Hemodynamic signals correlate tightly with synchronized gamma oscillations. Science 309:948–951.

      Nir Y, Fisch L, Mukamel R, Gelbard-Sagiv H, Arieli A, Fried I, Malach R. 2007. Coupling between neuronal firing rate, gamma LFP, and BOLD fMRI is related to interneuronal correlations. Curr Biol 17:1275–1285.

      Öngür D, Price JL. 2000. The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cereb Cortex 10:206–219.

      Schneider M, Broggini AC, Dann B, Tzanou A, Uran C, Sheshadri S, Scherberger H, Vinck M. 2021. A mechanism for inter-areal coherence through communication based on connectivity and oscillatory power. Neuron 109:4050-4067.e12. doi:10.1016/j.neuron.2021.09.037

      Schreiber T. 2000. Measuring information transfer. Phys Rev Lett 85:461.

      Von Siebenthal Z, Boucher O, Rouleau I, Lassonde M, Lepore F, Nguyen DK. 2017. Decision-making impairments following insular and medial temporal lobe resection for drug-resistant epilepsy. Soc Cogn Affect Neurosci 12:128–137. doi:10.1093/scan/nsw152

      Recommendations for the authors

      Reviewer #1

      (1) Overall, the writing of the manuscript is dense and makes it hard to follow the scientific logic and appreciate the key findings of the manuscript. I believe the manuscript would be accessible to a broader audience if the authors improved the writing and provided greater detail for their scientific questions, choice of analysis, and an explanation of their results in simpler terms.

      We extensively modified the introduction to better describe the rationale and research question.

      (2) In the introduction the authors state "we hypothesized that reward and punishment learning arise from complementary neural interactions between frontal cortex regions". This stated hypothesis arrives rather abruptly after a summary of the literature given that the literature summary does not directly inform their stated hypothesis. Put differently, the authors should explicitly state what the contradictions and/or gaps in the literature are, and what specific combinations of findings guide them to their hypothesis. When the authors state their hypothesis the reader is still left asking: why are the authors focusing on the frontal regions? What do the authors mean by complementary interactions? What specific evidence or contradiction in the literature led them to hypothesize that complementary interactions between frontal regions underlie reward and punishment learning?

      We extensively modified the introduction and provided a clearer description of the brain circuits involved and of the rationale for searching for redundant and synergistic interactions between areas.

      (3) Related to the above point: when the authors subsequently state "we tested whether redundancy- or synergy dominated interactions allow the emergence of collective brain networks differentially supporting reward and punishment learning", the Introduction (up to the point of this sentence) has not been written to explain the synergy vs. redundancy framework in the literature and how this framework comes into play to inform the authors' hypothesis on reward and punishment learning.

      We extensively modified the introduction and provided a clearer description of redundant and synergistic interactions between areas.

      (4) The explanation of redundancy vs synergy dominated brain networks itself is written densely and hard to follow. Furthermore, how this framework informs the question on the neural substrates of reward versus punishment learning is unclear. The authors should provide more precise statements on how and why redundancy vs. synergy comes into play in reward and punishment learning. Put differently, this redundancy vs. synergy framework is key for understanding the manuscript and the introduction is not written clearly enough to explain the framework and how it informs the authors' hypothesis and research questions on the neural substrates of reward vs. punishment learning.

      Same as above

      (5) While the choice of these four brain regions in the context of reward and punishment learning does make sense, the authors do not outline a clear scientific justification as to why these regions were selected in relation to their question.

      Same as above

      (6) Could the authors explain why they used gamma band power (as opposed to or in addition to the lower frequency bands) to investigate MI. Relatedly, when the authors introduce MI analysis, it would be helpful to briefly explain what this analysis measures and why it is relevant to address the question they are asking.

      Please see our answer to the first public comment. We added a paragraph to the Discussion to justify our choice of focusing on the gamma band only. We added the following sentence to the Results section to justify our choice of mutual information:

      “The MI allowed us to detect both linear and non-linear relationships between the gamma activity and the PE.”

      An extended explanation justifying our choice of MI was already present in the Methods section.

      (7) The authors state that "all regions displayed a local "probabilistic" encoding of prediction errors with temporal dynamics peaking around 500 ms after outcome presentation". It would be helpful for the reader if the authors spelled out what they mean by probabilistic in this context as the term can be interpreted in many different ways.

      We agree with the reviewer that the term “probabilistic” can be interpreted in different ways. In the revised manuscript, we replaced “probabilistic” with “mixed”.

      (8) The authors should include a brief description of how they compute RPE and PPE in the beginning of the relevant results section.

      The explanation of how we estimated the PE is already present in the Results section: “We estimated trial-wise prediction errors by fitting a Q-learning model to behavioral data. Fitting the model consisted in adjusting the constant parameters to maximize the likelihood of observed choices etc.”
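      As an illustration of how trial-wise PEs follow from such a model, the sketch below implements a basic Q-learning update with a softmax choice rule. It is an assumption-laden simplification (a single learning rate `alpha`, a single inverse temperature `beta`, values initialised at zero, no context-specific terms), not the fitted model of the manuscript, and the function names are hypothetical.

```python
import numpy as np

def prediction_errors(choices, outcomes, alpha, n_options=2):
    """Trial-wise PEs for one stimulus pair under a delta-rule Q-learning model."""
    q = np.zeros(n_options)
    pes = np.empty(len(choices))
    for t, (c, r) in enumerate(zip(choices, outcomes)):
        pes[t] = r - q[c]        # PE = obtained outcome - expected value of chosen option
        q[c] += alpha * pes[t]   # update only the chosen option
    return pes

def neg_log_likelihood(params, choices, outcomes, n_options=2):
    """Negative log-likelihood of the observed choices under a softmax policy."""
    alpha, beta = params
    q = np.zeros(n_options)
    nll = 0.0
    for c, r in zip(choices, outcomes):
        logits = beta * q
        nll -= logits[c] - np.logaddexp.reduce(logits)   # -log softmax(c)
        q[c] += alpha * (r - q[c])
    return nll

# Reward pair (+1 / 0) vs punishment pair (0 / -1), same option chosen each time
pe_reward = prediction_errors([0, 0, 0], [1, 1, 0], alpha=0.3)      # 1.0, 0.7, -0.51
pe_punish = prediction_errors([0, 0, 0], [-1, -1, 0], alpha=0.3)    # -1.0, -0.7, 0.51
```

      In practice `alpha` and `beta` would be fitted per subject by minimising `neg_log_likelihood` (for instance with `scipy.optimize.minimize`), and the PEs then recomputed with the fitted `alpha`. The toy example also shows why the neutral outcome (0) yields a negative PE in a reward context but a positive PE in a punishment context.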

      (9) It is unclear from the Methods whether the authors have taken any measures to address the likely difference in the number of electrodes across subjects. For example, it is likely that some subjects have 10 electrodes in vmPFC while others may have 20. In group analyses, if the data is simply averaged across all electrodes then each subject contributes a different number of data points to the analysis. Hence, a subject with more electrodes can bias the group average. A starting point would be to state the variation in number of electrodes across subjects per brain region. If this variation is rather small, then simple averaging across electrodes might be justified. If the variation is large then one idea would be to average data across electrodes within subjects prior to taking the group average or use a resampling approach where the minimum number of electrodes per brain area is subsampled.

      We addressed this point in our public answers. As a reminder, the new version of the manuscript contains figures showing the number of unique patients per region, the PE at the single-participant level, and local encoding at the single-participant level.

      (10) One thing to consider is whether the reward and punishment in the task is symmetrical in valence. While 1$ increase and 1$ decrease is equivalent in magnitude, the psychological effect of the positive (vs. the negative) outcome may still be asymmetrical and the direction and magnitude of this asymmetry can vary across individuals. For instance, some subjects may be more sensitive to the reward (over punishment) while others are more sensitive to the punishment (over reward). In this scenario, it is possible that the differentiation observed in PPE versus RPE signals may arise from such psychological asymmetry rather than the intrinsic differences in how certain brain regions (and their interactions) may encode for reward vs punishment. Perhaps the authors can comment on this possibility, and/or conduct more in depth behavioral analysis to determine if certain subjects adjust their choice behavior faster in response to reward vs. punishment contexts.

      While individuals may display different sensitivities to positive and negative prediction errors (and, indeed, a vast body of human reinforcement learning literature seems to point in this direction; Palminteri & Lebreton, 2022), it is unclear to us how such differences would explain the recruitment of anatomically distinct areas for reward and punishment prediction errors. It is important to note that our design partially orthogonalized the sign of the PE (positive vs. negative) and the learning context (reward vs. punishment), because the neutral outcome can generate both positive and negative prediction errors depending on the context (reward seeking or punishment avoidance). Returning to the main question, Lefebvre et al. (2017), for instance, investigated the neural correlates of reward prediction errors with fMRI and found that inter-individual differences in learning rates for positive and negative prediction errors correlated with differences in the degree of striatal activation, not with the recruitment of different areas. To sum up, while we acknowledge that individuals may display different sensitivities to prediction errors (and reward magnitudes), we believe that such differences should translate into differences in the degree of activation of a given system (the reward system vs. the punishment system) rather than into differences in which neural systems are recruited.

      (11) As summarized in Fig 6, the authors show that information transfer between aINS to dlPFC was PPE specific whereas the information transfer between vmPFC to lOFC was RPE specific. What is unclear is if these findings arise as an inevitable statistical byproduct of the fact that aINS has high PPE-specificity and that vmPFC has high RPE-specificity. In other words, it is possible that the analysis in Fig 3,4 are sensitive to fact that there is a larger proportion of electrodes with either PPE or RPE sensitivity in aINS and vmPFC respectively - and as such, the II analysis might reflect the dominant local encoding properties above and beyond reflecting the interactions between regions per se. Simply put, could the analysis in Fig 3B turn out in any other way given that there are more PPE specific electrodes in aINS and more RPE specific electrodes in vmPFC? Some options to address this question would be to limit the electrodes included in the analyses (in Fig 3B for example) so that each region has the same number of PPE and RPE specific electrodes included.

      Please see the simulation we added to the revised manuscript (Fig. S10) demonstrating that synergistic interactions can emerge between regions with the same specificity.

      Regarding the possibility that Fig. 3 and 4 are sensitive to the number of bipolar derivations being R/PPE specific, a counter-example is the vmPFC. The vmPFC has a few recordings specific to punishment (Fig. 2) in almost 30% of the subjects (Fig. S4). However, there is no II about the PPE between recordings of the vmPFC (Fig. 3). The same reasoning also holds for the lOFC. Therefore, the proportion of recordings being RPE or PPE-specific is not sufficient to determine the type of interactions.

      (12)  Related to the point above, what would the results presented in Fig 3A (and 3B) look like if the authors ran the analyses on RPE specific and PPE specific electrodes only. Is the vmPFC-vmPFC RPE effect in Fig 3A arising simply due to the high prevalence of RPE specific electrodes in vmPFC (as shown in Fig. 2)?

      Please see our answer above.

      Reviewer #2:

      Regarding Figure 2A, the authors argued that their findings "globally reproduced their previously published findings" (from Gueguen et al, 2021). It is worth noting though that in their original analysis, both aINS and lOFC show differential effects (aINS showing greater punishment compared to reward, and the opposite for lOFC) compared to the current analysis. Although I would be inclined to believe that the nonlinear approach used here might explain part of the differences (as the authors discussed), I am very wary of the other argument advanced: "the removal of iEEG sites contaminated with pathological activity". This raised some red flags. Does that mean some of the conclusions observed in Gueguen et al (2021) are only the result of noise contamination, and therefore should be disregarded? The authors might want to add a short supplementary figure using the same approach as in Gueguen (2021) but using the subset of contacts used here to reassure potential readers about the validity of their previous manuscript.

      We appreciate the reviewer's concerns and understand the request for additional information. However, we would like to point out that the figure suggested by the reviewer is already present in the supplementary files of Gueguen et al. 2021 (see Fig. S2). The results of this study should not be disregarded, as the supplementary figure reproduces the results of the main text after excluding sites with pathological activity. Including or excluding sites contaminated with epileptic activity does not have a significant impact on the results, as analyses are performed at each time-stamp and across trials, and epileptic spikes are never aligned in time across trials.

      That being said, there are some methodological differences between the two studies. To extract gamma power, Gueguen et al. filtered and averaged 10 Hz sub-bands, while we used a multitaper approach. Additionally, they applied a temporal smoothing of 250 ms, whereas we used less smoothing. Finally, as explained in the main text, we used information-theoretic approaches to capture the statistical dependencies between gamma power and PE. Despite these methodological differences, we obtained almost identical results.

      The data and code supporting this manuscript should be made available. If raw data cannot be shared for ethical reasons, single-trial gamma activities should at least be provided. Regarding the code used to process the data, sharing it could increase the appeal (and use) of the methods applied.

      We thank the reviewer for this suggestion. We added a section entitled “Code and data availability” and gave links to the scripts, notebooks and preprocessed data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate the recommendations of the reviewers and have performed further analyses with existing data where requested. 

      Below are our responses to each of the individual points. 

      Reviewer #1 (Recommendations For The Authors):

      (1) P11 mouse retina is still quite young, would MG isolated from adult retina be more interesting and relevant to disease-oriented cell replacement therapy? How efficiently would the sci-Plex system work for in vitro screen of mature murine MG?

      Thank you for bringing this up. While a protocol for the conversion of MG to neurons with adult mice in vivo exists, it has proven to be more difficult to maintain adult MG in dissociated cell cultures, due to their more limited proliferation in vitro. This makes it difficult to use the sci-Plex assay, since cell number is limiting for treatment conditions. Therefore, we have chosen the strategy of screening on P11, where MG undergo proliferative cell divisions in dissociated cultures, allowing us to grow the millions of cells needed for this assay, and then to test the efficacy of the compounds we find from the screen with an adult in vivo assay.

      (2) The study identified and tested the compounds individually, how would a combination of the compounds work in vivo? It would be interesting to examine how different combinations may affect the reprogramming efficiency and neuronal compositions.

      We agree that this would be very interesting to investigate.  However, the number of treatment conditions then expands beyond the scale of the current sci-Plex technology with the number of MG that we are able to collect.  We instead adopted the strategy of casting a very wide net to identify additional molecular pathways that might be important in the reprogramming process.

      (3) In-depth mechanistic and/or functional studies of the reprogrammed MG are highly desirable to improve the quality and significance of the study and to better understand how the compounds may influence the signaling and the reprogramming process.

      While we agree that this would strengthen the study, this would increase the scope of the required revisions considerably. We are very interested in following up on some of the hits and look forward to providing additional details of mechanisms in future publications.  However, we feel that reporting this method and the results will stimulate those interested in reprogramming glia in other areas of the nervous system to test the compounds we identified in this assay.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors employed two protocols to initiate direct reprogramming of MG into retinal neurons in vitro. These protocols, referred to as "Timecourse" and "Pulse," involved short-term treatments lasting no more than 5 days. However, the findings indicate that these brief treatments were insufficient to achieve a stable conversion. This conclusion is supported by the comparison between the "4 days (Timecourse)" and "4 days (Pulse)" conditions, as depicted in Figure 1 (D and E). In this set of experiments, labeling cells that express specific neuronal markers as neurons raises concerns, as these cells may have multiple fates: they may die, revert, arrest at an intermediate stage, or convert to functional neurons. It is thus critical to determine whether the conversion to functional neurons is enhanced.

      We thank you for raising this concern. We aimed to be very careful in our naming. In our naming scheme for this figure, we classify as neurons only the small number of cells that express specific bipolar markers (Trpm1, Grm6, Cabp5, Otx2), based on previous publications (Jorstad et al. 2017; Todd et al. 2021; Todd et al. 2022; Todd et al. 2020). The other cells that express some neuronal markers are identified as neuronal precursors (NeuPre) and are, as you mentioned, not necessarily mature or functional. While some of these NeuPre cells may eventually adopt other fates, die, or revert to a more ProL-like state at some rate, we believe it is fair to define them as neuronal precursors based on the genes they express (Dcx, Snap25, Elavl3, Gap43) at the time of collection.

      Furthermore, your statement that “the findings obtained indicate that these brief treatments were insufficient to achieve a stable conversion” does not reflect what we intended to demonstrate, and the text will be reworked to convey our point more clearly. We acknowledge that 1) the majority of cells are not stably converted, and 2) NeuPre cells are less abundant overall in the Pulse experiment, but this is true even at Day 5, when the conditions should be the same across experiments. The Pulse and Timecourse experiments were performed on different days, and because we have previously observed that the MG-to-BP conversion rate varies from experiment to experiment, these results were not unexpected. More notable to us was that, while ProL cells, Transition cells, and MG show very different patterns of abundance over time when comparing the experiments, the NeuPre cells accumulate with similar timing and pattern in both experiments. This indicated to us that they uniquely have some degree of Ascl1-independent stability in their cell fate, even when exposed to Ascl1 for as little as 3 days. See Author response image 1 below. This plot will be added to Fig. S1.

      Author response image 1.

      (2) The authors made a claim that a pseudo time value of 15 represents a crucial timepoint where the transition in cell fate becomes stable and ceases to rely on ectopic Ascl1 expression. However, it is essential to provide concrete evidence to substantiate this assertion. It is prudent to perform quantitative analyses rather than relying solely on the deduced trajectory to make this claim.

      This is a fair point; the value of 15 was estimated by eye. We have returned to the data and estimated a density function for the pseudotime scores of the cells from the 1, 2, 3, and 4 day conditions in both the Pulse and Timecourse experiments (Author response image 2A-B below). We then identified 16 as the local minimum between pseudotime values of 10 and 20 for the Pulse experiment (blue line). When comparing the two experiments, it is apparent that there is a massive accumulation of cells with pseudotime values just below 16 (values 10-15) in the Timecourse experiment, and very few cells across the same region in the Pulse experiment, indicating that the cell fate occupying pseudotime 10-16 (mostly ProL cells) depends to some extent on continued Ascl1 expression. In contrast, cells with greater pseudotime values exist at similar levels in both experiments.
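      To illustrate the density-based estimate described above, a minimal sketch follows; it is not the authors' pipeline. It assumes a Gaussian kernel density estimate with a hand-picked bandwidth and a simple grid search for an interior local minimum within the 10-20 window. The function names (`gaussian_kde`, `interior_minimum`), the bandwidth, and the toy pseudotime values are hypothetical.

```python
import math

# Hypothetical helpers illustrating a density-based cut-point search;
# the bandwidth and search window are assumptions, not published settings.


def gaussian_kde(samples, bandwidth):
    """Return a function evaluating a Gaussian kernel density estimate of `samples`."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def interior_minimum(density, lo, hi, step=0.1):
    """Return (x, density(x)) for the lowest interior local minimum on a grid over
    [lo, hi], or None if the density has no interior local minimum there."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n_steps + 1)]
    values = [density(x) for x in grid]
    candidates = [
        i for i in range(1, len(grid) - 1)
        if values[i] <= values[i - 1] and values[i] <= values[i + 1]
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: values[i])
    return grid[best], values[best]


# Toy, bimodal pseudotime scores (illustrative only, not real data).
pseudotime = [6, 7, 8, 9, 10, 22, 23, 24, 25, 26]
kde = gaussian_kde(pseudotime, bandwidth=2.0)
cut, cut_density = interior_minimum(kde, 10, 20)
print(f"local minimum near pseudotime {cut:.1f}")
```

      Restricting the search to interior local minima, rather than the lowest value in the window, avoids reporting a window edge when the density is simply monotonic there.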

      We have also examined the expression of Ascl1 along the pseudotime trajectory in the Timecourse experiment. Interestingly, and consistent with previous studies both in vitro and in vivo (Todd et al. 2021; Todd et al. 2022; Todd et al. 2020), we see a decrease in Ascl1 expression as the cells move towards the end of the pseudotime trajectory (Author response image 2C below). It is intriguing to us that this downregulation also occurs just after a pseudotime value of 16. The coincidence between the loss of Ascl1 expression in the Timecourse experiment and the persistence of cells with pseudotime values > 16 in the Pulse experiment provides strong evidence that we have identified the point at which cells stop expressing Ascl1 while maintaining more mature cell fates. The plots below will be added to the manuscript.

      Author response image 2.

      (3) It is intriguing to observe that the expression of Ascl1 was down-regulated in both neuronal precursors and bipolar cells in the mouse retina following tamoxifen and NMDA treatment (refer to Fig. 3C). However, ectopic Ascl1 expression should have been constitutively activated by tamoxifen. Therefore, if the GFP+ bipolar cells and neuronal precursors were indeed converted from Müller cells, we would expect to detect a high level of Ascl1 expression. How can this discrepancy be explained? How is the expression of exogenous Ascl1 from a constitutive promoter attenuated?

      As discussed above, this has been observed previously. Ascl1 driven from the TTA transgenic mouse line is high in the MG, but declines as these cells are reprogrammed into neurons in vivo or in vitro. One possibility is that the TTA is not as active in neurons as in MG, but in other lines of transgenic mice, e.g., TRE-Atoh1 mice, the transgene continues to be expressed at a high level even in the differentiating neurons, so this downregulation appears to be unique to Ascl1. We do not understand why Ascl1 levels decline in the differentiating neurons, but this has been a consistent finding across several studies of in vivo and in vitro reprogramming.

      (4) Exogenous Ascl1 was shut down after other neuronal specific genes were induced during MG reprogramming in vitro. Is this also the case during Ascl1-mediated reprogramming in vivo? If so, do converting cells show a distinct gene expression program if exogenous Ascl1 is constitutively overexpressed?

      Yes, as can be seen in Fig. 3C, Ascl1 expression is high in the MG and Transition cell populations but decreases in the NeuPre and Bipolar cells. As stated above, continued high Ascl1 expression keeps cells in a more progenitor-like state. This is true both in vivo and in vitro. We have addressed this point more clearly in the revision.

      (5) As previously documented in their Science Advances publication, the authors have established the requirement of NMDA injury for facilitating the successful induction of neuronal conversion through Ascl1 over-expression. Why is injury required for MG conversion in vivo, but not in vitro? This is related to question #1 above that certain signals may be required for the full conversion process, not just the initial induction of a few neuronal specific genes.

      While the in vitro and in vivo systems share similarities, there are key differences that affect what must be done to the cells in order to produce converted neurons. In our initial publication demonstrating that Ascl1 can reprogram mouse MG to a neurogenic state, we carried out our experiments in dissociated cell cultures (Pollak et al. 2013) like those described in this report. At that time, we did not need to add either NMDA or TSA to the cultures to induce neurogenesis from Ascl1. However, when we attempted the reprogramming in vivo, we found that after postnatal day 8, injury and TSA were required (Ueki et al.; Jorstad et al.). We surmise that the massive neuronal loss that occurs in establishing dissociated MG cultures replaces the NMDA injury we carry out in vivo.

      Regarding your second point, that the full conversion process requires more than “just the initial induction of a few neuronal specific genes”: we agree. When we carry out reprogramming in vivo with Ascl1 or other transcription factors, the MG-derived neurons acquire neuronal morphology, develop neuron-like electrophysiological properties, integrate into the retinal circuit, and respond to light stimuli; however, they are still not identical to normal retinal neurons in gene expression or morphology. This is why we continue to look for compounds or conditions that can improve the process.

      (6) The discovery that Metformin acts as a stimulator for MG-to-neuron conversion is interesting.

      However, before drawing definitive conclusions, several questions need to be addressed:

      (a) As specific small molecules have been identified that change cell fates, the question is whether Metformin and the other effective compounds can act alone or must act in conjunction with Ascl1. This can and should be tested in vitro simply by treating MG with Metformin without doxycycline.

      To our knowledge there are no convincing in vivo trials in which neurons have been generated from MG using only combinations of small molecules. Because Metformin was identified in vitro due to the increase in recovered cells and not an increase in % neurons, we especially doubt it would have the desired increase in neurons without expression of a transcription factor.  

      (b) Metformin is known to target AMPK, but this is unlikely the only target of the drug. Does AMPK knockdown have the same enhancement effect?

      In the drug screen, we also tested the AMPK inhibitor Dorsomorphin dihydrochloride, but it didn’t have any effect. However, Metformin is an activator, so it would be interesting to see in future studies if Dorsomorphin dihydrochloride could inhibit the effect of Metformin or if the enhancement is acting independently.  

      (c) Is the effect of Metformin specific for Ascl1 or any TF(s) that stimulates MG-to-neuron conversion?

      We would like to follow up with this in future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      Using concurrent in vivo whole-cell patch clamp and dendritic calcium imaging, the authors characterized how functional synaptic inputs across dendritic arborizations of mouse primary visual cortex layer 2/3 neurons emerge during the second postnatal week. They were able to identify spatially and functionally separated domains of clustered synapses in these neurons even before eye-opening and characterize how the clustering changes from P8 to P13. 

      Strengths: 

      The work is technically challenging and the findings are novel. The results support previous EM and immunostaining studies but provide in vivo evidence on the time course and the trajectory of how functional synaptic input develops. 

      Weaknesses: 

      There are some missing details about how the experiments were performed, and I also have some questions about the analyses. 

      We have now added a more detailed description of the methods and added new supplemental figures and descriptions to clarify our analyses. Please find our responses to the specific points of this reviewer in the section “Recommendations for the authors” below.

      Reviewer #2 (Public Review):

      In this study, Leighton et al. performed remarkable experiments by combining in vivo patch-clamp recording with two-photon dendritic Ca2+ imaging. The voltage-clamp mode is a major improvement over the pioneering versions of this combinatorial experiment, which led to major breakthroughs in the neuroscience field for visualizing and understanding synaptic input activity in single cells in vivo (sharp electrodes: Svoboda et al., Nature 1997; Helmchen et al., Nature Neurosci 1999; whole-cell current-clamp: Jia et al., Nature 2010; Chen et al., Nature 2011; I suggest citing these papers). This is because voltage-clamp mode, although full control of membrane voltage in vivo is not realistic, is nevertheless most effective in preventing back-propagating action potentials, which would severely confound the measurement of individual synaptically induced Ca2+ influx events. Furthermore, clamping the cell body at a strongly depolarized potential (here the authors used -30 mV) also facilitates the detection of synaptically induced Ca2+ influx. As a result, the authors successfully recorded high-quality Ca2+ imaging data that can be used for precise analysis. To date, even in view of the rapid progress of voltage-sensitive indicators and related imaging technologies in recent years, this very old 'art' of combining single-cell electrophysiology with two-photon imaging (ordinary, raster-scanned, video-rate imaging) of Ca2+ signals still enables measurements of the highest precision.

      We thank the reviewer for reminding us of these important previous studies that we cite now in the revised manuscript. 

      On the other hand, the interpretation of data in this study is a bit narrow-minded and lacks a comprehensive picture. Some suggestions to improve the manuscript are as follows: 

      (1) The authors segregated 'spine synapses' and 'shaft synapses' based solely on the two-photon images in vivo. However, caution should be taken here, because the optical resolution under in vivo imaging conditions like these cannot reliably determine whether a bright spot within or partially overlapping a segment of the dendrite is a spine on top of (or below) it. Therefore, what the authors consider a 'shaft synapse' (by detecting Ca2+ hotspots) has an unknown probability of being a spine above or below the dendrite. If other imaging data with higher axial resolution are available for validation or calibration, the authors should take them into account or perform further analysis to check the consistency of their data, as the authors do need such a segregation between spine and shaft synapses to show how they evolve over the brain development stages.

      We agree with the reviewer that the differentiation between spine and shaft synapses can be difficult for those spines that are located above or below the dendritic shaft in the z-dimension because of the lower resolution of 2-photon microscopy in the z-dimension compared to the image plane. We have now added a new paragraph to the Methods section to describe in more detail how we identify spine and shaft synapses and provide more examples in a new supplementary figure (Fig S5). We believe that we can identify spine and shaft synapses reliably in most cases, but added a cautionary note to make the reader aware of potential misidentifications.

      (2) The use of the term 'bursts of spontaneous inputs' to describe voltage-clamp data seems improper. Conventionally, 'burst' refers to suprathreshold spike firing events, but here the authors use 'burst' to refer to inward synaptic currents collected at the cell body. Not every activation of an excitatory synaptic input (or ensemble of inputs) leads to spike firing under naturalistic conditions; therefore, these two concepts are not equivalent. It is recommended to use 'barrage of inputs' instead of 'burst of inputs'. Considering the entire dendritic tree, the fact that the authors could always capture spontaneous Ca2+ events here and there within a few pieces of dendrite in an arbitrary field of view suggests that the whole dendritic tree must have many more such events occurring as a barrage, while the patch electrode picks up the summed current flow from the whole dendritic tree.

      We agree with the reviewer that “barrage” is a clearer term for multiple synaptic inputs occurring simultaneously and therefore we changed the terminology throughout the manuscript.

      (3) Following the above issue, an analysis of the temporal correlation between synaptic (not segregating 'spine' or 'shaft') Ca2+ events and EPSCs is absent. Again, the authors drew arbitrary time windows to clump the events for statistical analysis. However, the demonstrated example data already shows that the onset times of individual synaptic Ca2+ events do not necessarily align with the beginning of a 'barrage' inward current event. 

      The reviewer writes that “an analysis of the temporal correlation between synaptic calcium events and EPSCs is absent”. We would like to point out that we did determine the percentage of calcium transients that occurred during barrages of synaptic inputs (~60%, page 7). This is important, since the barrages in our patch-clamp recordings most likely reflect spontaneous network events as described in the developing cortex previously by us and many other labs. The time window we chose was not “arbitrary” as the reviewer suggests, but based on the duration of the barrages of synaptic inputs as defined in the Methods section.

      The reason why we did not perform a more in-depth analysis of the temporal relationship between synaptic calcium transients and synaptic input currents is that it is essentially impossible to relate calcium transients at individual synapses to specific synaptic input events. First, during barrages of synaptic inputs many synapses are active simultaneously, both in the mapped dendrites and in the unobserved parts of the dendritic arborization, as the reviewer notes above. Thus, barrages cannot be broken down into individual synaptic transmission events. Second, since our acquisition frequency is ~10 Hz, we can identify the onset of individual synaptic calcium transients only with 100-200 ms precision (1 or 2 frames). However, within any 100-200 ms period of recording, several synapses are active across the entire dendritic arborization, such that we cannot assign a given calcium transient to a specific EPSC within a 100-200 ms epoch. Third, due to the limited clamping capacity of in vivo patch recordings, we cannot be certain that individual transmission events in distal dendrites can be resolved in the patch recording.
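      The ~60% figure mentioned in this response is, in essence, an overlap count between transient onsets and barrage intervals. A minimal sketch of such a count follows; it is not the authors' analysis code. It assumes that onsets and barrage boundaries are both expressed in seconds and that barrages are closed intervals; the optional `tolerance_s` widening (for example, one frame, ~0.1 s at ~10 Hz) is a hypothetical way to account for frame-limited onset precision. The function name and the toy numbers are illustrative only.

```python
def fraction_during_barrages(onsets_s, barrages_s, tolerance_s=0.0):
    """Fraction of transient onsets (s) that fall inside any barrage interval.

    `barrages_s` is a list of (start_s, end_s) tuples. Each interval is widened
    by `tolerance_s` on both sides to allow for the limited temporal precision
    of frame-based onset detection (an assumption, not a published setting).
    """
    if not onsets_s:
        return 0.0
    inside = sum(
        1
        for t in onsets_s
        if any(start - tolerance_s <= t <= end + tolerance_s for start, end in barrages_s)
    )
    return inside / len(onsets_s)


# Toy example (illustrative values, not recorded data).
onsets = [0.5, 1.2, 3.0, 3.95, 7.9]
barrages = [(0.0, 1.5), (4.0, 5.0)]
print(fraction_during_barrages(onsets, barrages))       # 2 of 5 onsets inside
print(fraction_during_barrages(onsets, barrages, 0.1))  # 3 of 5 with a one-frame margin
```

      The second call shows why frame-limited precision matters: an onset detected one frame before a barrage begins is counted only when the interval is widened.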

      (4) The authors claim that "these observations indicate that the activity patterns investigated here are not or only slightly affected by low-level anesthesia". It would be nice to show some of the recordings in this work without any anesthesia to support this claim. 

      Indeed, the conclusion that the patterns of activity are only slightly affected by low levels of anesthesia is based on our previous recordings at the network level. Unfortunately, we are still not able to perform calcium imaging with single-synapse resolution in unanesthetized developing mice (and, as far as we know, no one else is either), because the skull of these young animals is not yet firm. As a consequence, movements cannot be reduced sufficiently for patching and imaging with single-synapse resolution. Our previously published (Siegel et al., 2012) and unpublished work at the cellular level suggests that activity patterns during light anesthesia are very similar to those during sleep in mouse pups at this age.

      Reviewer #3 (Public Review):

      Summary: 

      There is a growing body of literature on the clustering of co-active synapses in adult mice, which has important implications for understanding dendritic integration and sensory processing more broadly. However, it has been unclear when this spatial organization of co-active synapses arises during development. In this manuscript, Leighton et al. investigate the emergence of spatially organized, coactive synapses on pyramidal dendrites in the mouse visual cortex before eye-opening. They find that some dendrite segments contain highly active synapses that are co-active with their neighbors as early as postnatal day (P) 8-10, and that these domains of co-active synapses increase their coverage of the dendritic arbor by P12-13. Interestingly, Leighton et al. demonstrate that synapses co-active with their neighbors are more likely to increase their activity across a single recording session, compared to synapses that are not co-active with their neighbors, suggesting local plasticity driven by coincident activity before eye-opening.

      The current manuscript includes some replication of earlier results from the same research group (Winnubst et al., 2015), including the presence of clustered, co-active synapses in the visual cortex of mouse pups, and the finding that synapses co-active with their neighbors show an increase in transmission frequency during a recording session. The main novelty in the current study compared to Winnubst et al. (2015) is the inclusion of younger animals (P8-13 in the current study compared to P10-15 in Winnubst et al., 2015). The current manuscript is the first demonstration that active synapses are clustered on specific dendrite segments as early as P8-10 in the mouse visual cortex, and the first to show the progression in active synapse distribution along the dendrite during the 2nd postnatal week. These results from the visual cortex may help inform our understanding of sensory development more broadly. 

      Strengths: 

      The authors ask a novel question about the emergence of synaptic spatial organization, and they use well-chosen techniques that directly address their questions despite the challenging nature of these techniques. To capture both structural and functional information from dendrites simultaneously, the authors performed a whole-cell voltage clamp to record synaptic currents arriving at the soma while imaging calcium influx at individual synaptic sites on dendrites. The simultaneous voltage clamp and calcium imaging allowed the authors to isolate individual synaptic inputs without their occlusion by widespread calcium influx from back-propagating action potentials. Achieving in vivo dendrite imaging in live mice that are as young as P8 is challenging, and the resulting data provides a unique view of synaptic activity along individual dendrites in the visual cortex at an early stage in development that is otherwise difficult to assess. 

      The authors provide convincing evidence that synapses are more likely to be co-active with their neighbors compared to synapses located farther away (Fig. 6F-H), and that synapses co-active with their neighbors increase their transmission frequency during a recording session (Figure 7C). These findings are particularly interesting given that the recordings occur before eye-opening, suggesting a relationship between co-activity and local synaptic plasticity even before the onset of detailed visual input. These results replicate previously published findings from P10-15 pups (Winnubst et al., 2015), increasing confidence in the reproducibility of the data. 

      The authors also provide novel data documenting for the first time spatially organized, co-active synapses in pups as young as P8. Comparing the younger (P8-10) and older (P12-13) pups provides insight into how clusters of co-active synapses might emerge during development.

      Weaknesses: 

      This manuscript provides insufficient detail for assessing the rigor and reproducibility of the methods, particularly for age comparisons. The P8-10 vs P12-13 age comparisons are the primary novel finding in this manuscript, and it is, therefore, critical to avoid systematic age differences in the methods and analysis whenever possible. Specific concerns related to the age comparisons are listed below: 

      (1) Given that the same research group previously published P12-13 data (Winnubst et al., 2015), it is unclear whether both age groups in the current study were imaged/analyzed in parallel by the same researcher(s), or whether previous data was used for the P12-13 group. 

      While the approach in the present study is indeed similar to that of our previous study (Winnubst et al. 2015), the data set presented here is entirely new. The current study was made possible by a new microscope that allows resonant scanning to be combined with piezo-focusing to image large fractions of the dendritic arborization. In fact, we could now image dendritic segments, including branch points, almost 10 times larger than in our previous study. One author contributed to the experiments in both studies. Image analysis of all experiments was performed by the first author of the present study, who was not involved in the Winnubst et al. work.

      (2) The authors mention that they used 2 different microscopes, and used a fairly wide range of imaging frame rates (5-15 Hz). It is unclear from the current manuscript whether the same imaging parameters were used across the two age groups. If data for the two experimental groups was collected separately, perhaps at different times, by a different person, or on a different microscope, there is a concern that some differences between the groups may not necessarily be due to age. 

      The reviewer mentions that the experimental settings are not identical across the experiments of this study. In the original manuscript we erroneously reported in the Methods section that 2 different setups were used for this study; however, all experiments were performed on the same microscope. We have corrected this in the new manuscript. We took time-lapse recordings of small stacks of varying depth to cover as many dendrites as possible in each recording; therefore, as the reviewer points out, we needed to adjust the rate of acquired stacks within a certain range. The data were acquired by two scientists during an overlapping period. While the different ages were not recorded in a strictly randomized fashion, they were not acquired in sequence according to age; rather, the data came from many attempts on animals of different ages from many different litters. For each litter, a small percentage of animals yielded successful recordings, and the ages of these successes were random. Therefore, we believe that neither the data collection nor the analysis (see point above) affected the differences we describe here for the two age groups.

      (3) It is unclear whether the image analysis was performed blind to age. Blinding to age during analysis is particularly important for this study, in which it was not possible to blind to age during imaging due to visible differences in size and developmental stage between younger and older pups. 

      The analysis was not set up to be performed blind to age. Not only is the age of the animal apparent during imaging (as the reviewer points out), but the number of spines and the activity levels also clearly differ between neurons only a few days apart. However, all age-related findings reported in this study, except the increase in synapse density and activity, became apparent to us only after the full set of synaptic transmission events was determined and the analysis was performed on the entire data set, making it very unlikely that event detection was biased.

      (4) The relatively low N (where N is the number of dendrites or the number of mice) in this study is acceptable due to the challenging nature of the techniques used, but unintentional sampling bias is a concern. For example, if higher-order dendrites from the apical tuft were imaged at P12-13, while more segments of the apical trunk were imaged at P8-10, this could inadvertently create apparent age differences that were in fact due to dendrite location on the arbor or dendrite depth. 

      The reviewer points out that sampling bias with respect to synapse location along dendrites in the dataset could lead to falsely apparent age differences. In all experiments we imaged dendrites of layer 2/3 neurons that were relatively close to the cortical surface to optimize image quality. In addition, we confirmed that the mean distance of the imaged dendritic stretches from the cell body was similar between the dendrites of each age group (Young: 392 +/- 104 µm, Old: 323 +/- 118 µm; mean +/- STD). Therefore, we do not think that sampling bias affected these results.

      Additional general methodological concerns, which are not specifically related to the age comparisons, are listed below: 

      (5) The authors assert that clustered, co-active synapses emerge in the visual cortex before eye-opening, which is an important finding in that it suggests this phenomenon is driven by spontaneous activity rather than visual input. However, this finding hinges on the imaged cells being reliably located in the visual cortex, which is difficult to identify with certainty in animals that have not yet opened their eyes and therefore cannot undergo intrinsic signal imaging to demarcate the boundaries of the visual cortex. If the imaged cells were in, for example, nearby somatosensory cortex, then the observed spatial organization could be due to sensory input rather than spontaneous activity.

      The reviewer argues that if the neurons included in our analysis were located in non-visual sensory cortex, e.g. the somatosensory cortex, sensory experience might have shaped clustered inputs instead of spontaneous activity. We are, however, certain that the neurons were located inside the primary visual cortex. In previous experiments where we performed the same craniotomies, we mapped spontaneous activity across the sensory areas in the occipital neocortex and we know the exact location of V1 which is already very consistent during the second postnatal week. (See for example Supplemental Figure 4 in Leighton et al., 2021).  

      (6) It is unclear how the authors defined a synaptic transmission event in the GCaMP signal (e.g. whether there was a quantitative deltaF/F threshold). 

      In the revised manuscript, we now describe the procedure for identifying synaptic calcium transients in more detail and have added a new supplemental figure to clarify this aspect of the analysis. In short, we use automated detection with a 2x standard deviation threshold, followed by a manual control and selection step. Please find all details in the Methods section and Figure S4 of the revised manuscript.
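      The automated part of this detection rule (a 2x standard deviation threshold, before manual review) can be sketched as follows. This is a minimal illustration only, assuming the threshold is applied to a ΔF/F trace as mean + 2 SD of a quiescent baseline segment; the baseline definition, the function name `detect_transients`, and the example values are hypothetical, and the subsequent manual control and selection step is not modelled.

```python
import numpy as np


def detect_transients(dff, k=2.0, baseline=None):
    """Flag candidate calcium transients in a single-ROI trace (hypothetical helper).

    A frame counts as supra-threshold when its value exceeds
    mean + k * SD of the baseline segment (the whole trace if no baseline
    is given). Consecutive supra-threshold frames are merged into one
    candidate event, returned as (first_frame, last_frame).
    """
    dff = np.asarray(dff, dtype=float)
    ref = dff if baseline is None else np.asarray(baseline, dtype=float)
    threshold = ref.mean() + k * ref.std()

    events, start = [], None
    for i, above in enumerate(dff > threshold):
        if above and start is None:
            start = i                      # event begins
        elif not above and start is not None:
            events.append((start, i - 1))  # event ended on the previous frame
            start = None
    if start is not None:                  # event still running at trace end
        events.append((start, len(dff) - 1))
    return threshold, events


# Hypothetical example: a quiet baseline and a trace with two transients.
baseline = [0.0, 0.1, -0.1, 0.0, 0.1, -0.1]
trace = [0.0, 0.05, 0.5, 0.6, 0.02, 0.0, 0.4]
threshold, events = detect_transients(trace, baseline=baseline)
print(round(threshold, 3), events)  # 0.163 [(2, 3), (6, 6)]
```

      The sketch uses the population SD (NumPy's default `ddof=0`); whether a sample SD or a more robust noise estimate is preferable is a design choice that the text does not specify.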

      (7) The authors' division of synapses into spine vs shaft is unconvincing due to the difficulty of identifying Z-projecting spines in images from 2-photon microscopy, where the Z resolution is insufficient to definitively identify Z-projecting spines, and the fact that spines in young animals may be thin and dim. The authors' examples of spine synapses (e.g. in Fig. 2A) are convincing, but some of the putative shaft synapses may in fact be on spines. 

      We agree with the reviewer that differentiating between spine and shaft synapses can be difficult for spines located above or below the dendritic shaft in the z-dimension, because the resolution of 2-photon microscopy is lower in the z-dimension than in the image plane (see also our response to Reviewer 2, point 1). We have now added a new paragraph to the Methods section that describes in more detail how we identify spine and shaft synapses, and we provide more examples in a new supplementary figure (Fig S5). We believe that we can identify spine and shaft synapses reliably in most cases, but we have added a cautionary note to make the reader aware of potential misidentifications.

      Reviewer #1 (Recommendations For The Authors):

      I think the experiments performed were very technically challenging (probably one of the few labs that can do this in the field), and the findings provide in vivo evidence on how structured synaptic inputs are assembled during development that has never been reported. 

      I suggest improving the writing and presentation and really explaining how they conducted the experiments and how they defined shaft synapses. 

      Line 96: 12 dendritic areas from 11 mice at ages between postnatal day 8 to 13. 

      - Do the authors know how many neurons were imaged? It is unclear if the authors patch on all the imaged neurons and only imaged (or analyzed) the dendrites of those patched neurons. If yes, how sparse are the neurons labelled from IUE? From 1B, it looks like there are two cells adjacent to each other. Can the authors really distinguish whether the imaged dendrites are from the patched neuron? 

      The reviewer wonders whether we can distinguish the dendrites of patched cells from those of neighboring neurons that were not patched. This is actually very straightforward: the experiment included a depolarization step (see Methods section), which leads to an immediate but temporary increase in fluorescence in all of the patched neuron's dendrites but in none of the neighboring dendrites. We have added this information to the Methods section of the revised manuscript and now provide an example (Fig S3). Furthermore, as these cells normally fire frequently, it would immediately become clear that an unpatched cell was being imaged if backpropagating action potentials, rather than synaptic signals, were predominantly observed. The visualization of these synaptic signals is only possible because Na+ channels are blocked with QX314 in the intracellular solution (see Methods).

      - In the methods section, it says 'dendrites were imaged in single plane or small stacks with plane...'. How do the authors do calcium imaging with small stacks of plane using Nikon MP scope? 

      Small stacks were acquired by using the piezo focusing device of our Nikon A1 microscope. Since we combined this fast focusing approach with resonant scanning, we were able to acquire z-stacks of 3-5 frames at a rate of up to 15 Hz (per stack).

      - I also assume this is not chronic imaging, and there are different mice for each postnatal day. If it's true, this is somewhat important for all the correlation analysis as there are only 2 mice for each postnatal day (other than day 12) and day 13 only has 1 animal. 

      Yes, indeed these are not chronic experiments and dendrites imaged on different days are from different neurons and different mice. We agree with the reviewer that if it had been possible to image the same neurons across these developmental stages, we would have detected even clearer correlations. Therefore, we see our results as conservative estimates of the developmental trajectory of the analyzed parameters.

      Line 104 - 109: I don't understand why the authors need to hold at -30mV to facilitate calcium influx through NMDA receptors? I assume this helps them to visualize as many synapses as possible? but wouldn't that also make the 'event frequency' not reflect the true value? 

      Indeed, depolarizing the imaged neurons to -30 mV was necessary to obtain sufficient calcium influx to map synaptic inputs. We do not think that this affects the frequency of inputs, because the frequency of synaptic inputs is determined by the presynaptic firing rate and the release probability of the presynaptic terminal, neither of which is affected by depolarization of the postsynaptic dendrite.

      Figure 2A - It says in the method section that ROIs are manually selected. However, it's not explained what the criteria are. For spine synapses, it's easy to define but for shaft synapses like in Fig 2B, why are there 2 synapses on the shaft? And in Fig 4a, 5a, Fig S1 P13, some of the dendrites are packed with ROIs. What's the distance between those shaft synapses? Can the imaging resolution really separate them? 

      The reviewer asks for a better description of how we identified individual ROIs, and thus synapse locations, and whether this is actually feasible. We have now added a more detailed description of how we select synaptic sites based on the occurrence of synaptic calcium transients. In addition, we have added a new supplemental figure (S4) to give the reader an impression of the image quality and of our ability to locate individual synapses reliably. We find that separating shaft synapses was possible for inter-synapse distances of ~4 µm or more. The mean shaft synapse distance in our data set is 21 µm.
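      The spacing figures above (a separation limit of ~4 µm and a mean shaft synapse distance of 21 µm) can be checked with a short nearest-neighbour computation along each dendrite. This is a minimal sketch, assuming that synapse positions are expressed as 1-D path distances along the dendrite and that "mean shaft synapse distance" refers to the mean nearest-neighbour distance (the text does not define it); the function names and example positions are hypothetical.

```python
import numpy as np


def nearest_neighbor_distances(positions_um):
    """Distance (µm) from each synapse to its nearest neighbour along the
    dendrite, given 1-D positions along the dendritic path (output is in
    sorted-position order)."""
    pos = np.sort(np.asarray(positions_um, dtype=float))
    if pos.size < 2:
        raise ValueError("need at least two synapses")
    gaps = np.diff(pos)
    # Each synapse's nearest neighbour lies across either its left or its right gap.
    left = np.concatenate(([np.inf], gaps))
    right = np.concatenate((gaps, [np.inf]))
    return np.minimum(left, right)


def closely_spaced_pairs(positions_um, min_sep_um=4.0):
    """Adjacent synapse pairs closer than min_sep_um, i.e. pairs that would
    fall below the assumed optical separation limit."""
    pos = np.sort(np.asarray(positions_um, dtype=float))
    gaps = np.diff(pos)
    return [(float(pos[i]), float(pos[i + 1])) for i in np.flatnonzero(gaps < min_sep_um)]


# Hypothetical shaft-synapse positions (µm from the start of the imaged stretch).
positions = [0.0, 10.0, 12.0, 40.0]
print(nearest_neighbor_distances(positions).mean())
print(closely_spaced_pairs(positions))  # [(10.0, 12.0)]
```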

      - Similar issue applies to Figure 4A that I'm not sure what's the resolution of each 'hot spot'. They all seem very close together. Maybe additional raw dendrite images with fluorescence changes like 1C or 2A could be helpful (or movies in the supplementary?) 

      As the reviewer suggests, we have now added supplemental figures to better illustrate how we identify synaptic transmission events as well as spine and shaft synapses.

      - Also for line 164, it says that 76% of high-activity synapses were located on spines. This could also maybe support that only the spine synapses are real synapses and many shaft synapses are actually not synapses and they were just categorized as shaft synapses from manual ROI? 

      Based on our analysis, we are quite sure that shaft synapses are real synapses, since they show repeated synaptic calcium transients that co-occur with barrages of synaptic inputs measured by patch-clamp recordings. Indeed, based on previous EM studies (Miller and Peters, 1981; Wildenberg et al., 2023), one would expect a number of excitatory synapses on the dendritic shafts of pyramidal neurons at these ages.

      - While this might not impact the overall novelty of the paper, I would be curious to know if the authors can still observe the same findings if they only analyze spine synapses. 

      We repeated several analyses with a data set that contained only spine synapses. For most analyses we observed the expected result: effect sizes were similar to those of the entire data set, but statistical power was reduced. For example, the effects of distance to the closest high-activity neighbor and of own activity (Fig 5E, F) were similar, but p-values were around 0.1 (with similar results for Figure 7B). In contrast, co-activity with synapses within a domain remained significantly higher than co-activity with synapses in other domains, also in the spine-synapse-only data set.

      Fig 6 - Does the domain co-activity also contribute to the synaptic current recorded (related to Fig 4)? 

      Yes, the synaptic activity measured by calcium imaging contributes to the recorded EPSCs. However, the exact relationship between synaptic inputs measured by calcium imaging and those measured by patch-clamping is complicated by three factors. First, during barrages of synaptic inputs many synapses are active simultaneously, both in the mapped dendrites and in the unobserved parts of the dendritic arborization. Thus, barrages cannot be broken down into individual events. Second, since our acquisition frequency is ~10 Hz, we can identify the onset of individual synaptic calcium transients only with 100-200 ms precision (1 or 2 frames). However, during any 100-200 ms period of recording, several synapses are active across the entire dendritic arborization, so we cannot assign a given calcium transient to a specific EPSC within a 100-200 ms epoch. Third, due to the limited clamping capacity of in vivo patch recordings, we cannot be certain that individual transmission events in distal dendrites can be resolved as EPSCs in the patch recording.

      Reviewer #2 (Recommendations For The Authors):

      (1) I suggest the authors should provide the number of cells and mice recorded in the figure legends. 

      The number of dendrites and mice is the same across all analyses: 12 dendrites from 11 mice for all experiments, with 6 dendrites from 6 mice for P8-10 and 6 dendrites from 5 mice for P12-13. All dendrites and synapses (and their ages) are shown in the supplemental figures S1 and S2. We now mention the number of imaged dendrites at the beginning of the Results section and where we first split the data by age.

      (2) Instead of showing only cartoon illustrations of dendrites in Figures 3-6, I suggest showing the two photon images as well together with the cartoon. 

      The 2-photon images of all dendrites of the dataset are available in Figure S1. To allow the reader to compare the cartoon representations in the main figures and the 2-photon images of each neuron, we have now labeled each dendrite in the dataset (D1-D12, see figures S1 and S2). For every figure, where we show example neurons (cartoons or zoom ins) we now provide this identifier.

      Reviewer #3 (Recommendations For The Authors):

      To address the weaknesses outlined above, we recommend that the authors do the following: 

      • To address concerns about the rigor and reproducibility of the methods specifically related to age comparisons, please confirm the following: 

      - Both age groups were run in parallel by the same researcher(s). 

      Experiments were run in partly overlapping periods, and experiments from different age groups were performed in parallel by both researchers.

      - Both age groups were imaged on the same microscope, or animals from each age group were imaged on both microscopes. If it was necessary to use different microscopes for the different age groups for biological or practical reasons, please explain. 

      All experiments were run on the same microscope, a Nikon A1 2-photon microscope. In the original methods description we erroneously mentioned two microscopes (copy and paste error from a previous publication). We corrected that in the revised manuscript.

      - There was no difference in imaging frame rates or other imaging parameters between age groups. If it was necessary to use different parameters for different age groups for biological reasons, please explain. 

      We varied the frame rates somewhat to allow larger z-stacks for some experiments in which dendrites traversed different depths; however, the mean frame rates were similar between experiments in P8-10 and P12-13 dendrites (8.5 vs 10 Hz, respectively).

      - Images were analyzed blind to age. 

      The analysis was not set up to be performed blind to age. The number of spines and the activity levels show obvious differences between neurons only a few days apart. However, all findings reported in this study related to age, except the increase in synapse density and activity, became apparent to us only after the full set of synaptic transmission events had been determined and the analysis had been performed on the entire data set, making it unlikely that event detection was biased.

      - There was no difference in the location of analyzed dendrites (e.g. depth from the pia, branch order) between age groups. 

      In all experiments we imaged dendrites of layer 2/3 neurons that were relatively close to the cortical surface to optimize image quality. In addition, we determined the mean distance of the imaged dendritic stretches from the cell body and found that this distance was similar between the dendrites of each age group (Young: 392 +/- 104 µm, Old: 323 +/- 118 µm; mean +/- STD). Therefore, we do not think that sampling bias affected these results.

      • To address general methodological concerns, please provide additional description of the following points: 

      - Please clarify how the visual cortex was identified in P8-13 pups. If there was ambiguity about identifying the visual cortex in these pups, please discuss the implications of this ambiguity. 

      The reviewer asks how we identified V1 in these experiments. We are indeed certain that the neurons were located inside the primary visual cortex. We have ample experience with mapping V1 in these animals based on patterns of spontaneous activity as well as post-hoc staining. V1 is already quite large at these ages (> 2 mm long and > 1 mm wide), and its extent is very consistent across animals. Thus, we would argue that it is actually hard to miss.

      - Please clarify how synaptic transmission events were identified in the GCaMP signal. 

      We have now added a more detailed description of how we identify synaptic calcium transients. In addition, we have added a new supplemental Figure (S3) to give the reader an impression of the image quality and the ability to locate individual synapses reliably. 

      - It is acceptable to use the spine vs shaft analysis despite the inevitable difficulty resolving Z-projecting spines, but this caveat should be mentioned in the discussion of the spine vs shaft results. 

      We added a more detailed description of spine and shaft synapse identification and a new supplemental figure (S5), and we now mention the caveat related to the limited z-resolution of 2-photon microscopy in the revised manuscript.

      • Two additional minor details should be clarified in the text of the manuscript: 

      - Please specify the volume of DNA solution injected into each embryo. 

      The injected volume was 1 µl. We added this information in the Methods section of the revised manuscript.

      - In Fig S1, please specify whether the scale bar applies to all images. 

      The scale bar applies to all images. This information was added to the figure legend.

      References

      Leighton AH, Cheyne JE, Houwen GJ, Maldonado PP, De Winter F, Levelt CN, Lohmann C. 2021. Somatostatin interneurons restrict cell recruitment to retinally driven spontaneous activity in the developing cortex. Cell Rep 36:109316. doi:10.1016/j.celrep.2021.109316

      Miller M, Peters A. 1981. Maturation of rat visual cortex. II. A combined Golgi-electron microscope study of pyramidal neurons. J Comp Neurol 203:555–573.

      Siegel F, Heimel JA, Peters J, Lohmann C. 2012. Peripheral and central inputs shape network dynamics in the developing visual cortex in vivo. Current Biology 22:253–258.

      Wildenberg G, Li H, Sampathkumar V, Sorokina A, Kasthuri N. 2023. Isochronic development of cortical synapses in primates and mice. Nat Commun 14:8018. doi:10.1038/s41467-023-43088-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting and well-written paper reporting on a novel approach to studying cerebellar function based on the idea of selective recruitment using fMRI. The study is well-designed and executed. Analyses are sound and results are properly discussed. The paper makes a significant contribution to broadening our understanding of the role of the cerebellum in human behavior.

      We thank the reviewer for the positive assessment of our paper.

      (1) While the authors provide a compelling case for the link between BOLD and the cerebellar cortical input layer, there remains considerable unexplained variance. Perhaps the authors could elaborate a bit more on the assumption that BOLD signals mainly reflect the input side of the cerebellum (see for example King et al., elife. 2023 Apr 21;12:e81511).

      Our paper is based on the assumption that the cerebellar BOLD signal reflects solely the input to the cerebellum and does not reflect changes in the firing rates of Purkinje cells. This assumption relies on two lines of argument: studies that have directly examined the mechanism of vasodilation in the cerebellum, and studies that try to infer the contributions of different neurophysiological mechanisms to overall cerebellar metabolism (Attwell and Iadecola, 2002).

      Vasodilatory considerations: The mechanisms that cause vasodilation in the cerebellum, and hence BOLD signal increases, have been extensively studied. Electrical stimulation of mossy fibers (Gagliano et al., 2022; Mapelli et al., 2017), as well as of parallel fibers (Akgören et al., 1994; Iadecola et al., 1996; Mathiesen et al., 1998; Yang and Iadecola, 1997), leads to robust increases in cerebellar blood flow. In contrast to the neocortex, the regulation of blood flow in the cerebellum depends almost exclusively on the vasodilator nitric oxide (NO) (Akgören et al., 1994; Yang and Iadecola, 1997), with stellate cells playing a key role in the signaling cascade (Yang et al., 2000).

      Electrical (Mathiesen et al., 2000) and pharmacological (Yang and Iadecola, 1998) stimulation of climbing fibers also leads to robust increases in blood flow. Simultaneous parallel and climbing fiber stimulation seems to combine sub-additively to determine the blood flow changes (K. Caesar et al., 2003).

      Importantly, even dramatic changes in the spiking rate of Purkinje cells do not lead to changes in vasodilation. First, parallel fiber stimulation leads to blood flow increases, even though its net effect on Purkinje cell firing is inhibitory (Mathiesen et al., 1998). More importantly, complete inhibition of Purkinje cells using a GABA agonist does not change baseline cerebellar blood flow (Kirsten Caesar et al., 2003). Conversely, even a 200-300% increase in simple (and complex) spike firing rate through application of a GABA antagonist has no measurable consequences for blood flow, even though it clearly increases the metabolic rate of oxygen consumption in the tissue (Thomsen et al., 2009, 2004).

      In sum, this extensive set of studies clearly argues that the cerebellar blood flow response is mostly dictated by synaptic input, and that the firing rate of Purkinje cells does not influence vasodilation. Because the BOLD signal is caused by a supply of oxygen over and above the level of oxygen consumption, increases in Purkinje cell firing would not be expected to lead to BOLD increases. What is less clear is the degree to which changes in BOLD signal during normal activity are determined by changes in mossy fiber or climbing fiber input. Disruption of either pathway leads to 60-70% reductions in the evoked blood flow response during whisker stimulation (Yang et al., 2000; Zhang et al., 2003), but it remains unclear to what degree this reflects the distribution of contributions in the healthy animal, as these powerful disruptions may have a number of side effects.

      Metabolic considerations: To estimate the relative contributions of climbing fiber and mossy fiber input to variations in the BOLD signal under natural conditions, it is useful to consider how different cerebellar processes contribute to the overall metabolism of the cerebellum. Assuming average firing rates of 40 Hz for mossy fibers, ~3 Hz for granule cells, and 1 Hz for climbing fibers, Howarth et al. (2010, 2012) estimated that transmission from mossy fibers to granule cells dominates the energy budget, accounting for 53%. The subsequent stage, the transfer of information from granule cells to Purkinje cells, accounts for 32% of energy expenditure. In contrast, integration within Purkinje cells and their (simple and complex) spiking represent only 15% of the total energy consumption.

      More important for the BOLD signal, however, are activity-induced variations in metabolic consumption. Purkinje cells fire relatively constantly at a very high frequency (~50 Hz), both during awake periods and during sleep (Shin et al., 2007). When they provide a signal to the neocortex, their firing rate decreases, which actually lowers the metabolic demand. Climbing fibers normally fire at ~0.5 Hz and, even during activity, rarely fire much above 2 Hz (Streng et al., 2017). In contrast, granule cells show low firing rates during rest (typically <1 Hz) and can spike at well above 100 Hz during activity. Combined with the sheer number of granule cells, these considerations suggest that the vast majority of the variation in metabolic demand is due to mossy fiber input and granule cell activity.

      Overall, we therefore think it is likely that the main determinant of the cerebellar cortical BOLD signal is mossy fiber input and the transmission of information from mossy fibers to granule cells to Purkinje cells. We admit that the degree to which climbing fiber input contributes to BOLD signal changes is much less clear. We can be quite certain, however, that the firing rate of Purkinje cells does not contribute to the cerebellar BOLD signal, as even dramatic changes in firing rate do not cause any changes in vasodilation. We have clarified our line of reasoning in the paper, and hope this more extensive response will give the reader a better overview of the pertinent literature.

      (2) The current approach does not appear to take the non-linear relationships between BOLD and neural activity into account.

      Thank you for raising this concern. We did not stress this point in the paper, but one big advantage of our selective recruitment approach is that it is, to some degree, robust against non-linearities in the relationship between neural activity and BOLD signal. This holds as long as the shape of the non-linearity is similar in the cerebellum and the neocortex. The results of our motor task (Figure 3) provide a clear example: the BOLD signal in both the neocortex and the cerebellum increases non-linearly as a function of force, with the increase from 2.5 N to 6 N (a 3.5 N increase) being larger than the increase from 6 N to 10 N (a 4 N increase). A similar non-linearity can be observed for tapping speed (6, 10, and 18 taps/s). However, within each condition, the relationship between cortical and cerebellar activity is nearly perfectly linear, reflecting the fact that the shape of the non-linearity is very similar for the cerebellum and the cortex.

      Most importantly, even if the non-linearity differs between the two structures, any non-linear relationship between neural activity and BOLD signal (of vasodilatory nature) should apply similarly to different conditions (here, increases in force and speed). Therefore, if two conditions show overlapping activity levels (as observed for force and speed at the medium and high levels, Figure 3), an offset between conditions cannot be caused by a non-linearity in the relationship between cortical and cerebellar activity. Because all conditions are subject to the same non-linearity, all points should lie on a single (likely monotonically increasing) non-linear function. For both the motor and the working memory task, the pattern of results clearly violates this prediction.
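      The logic of this argument can be illustrated with a toy simulation. The sketch below assumes, hypothetically, that each condition is characterised by a single neural drive that reaches cortex and cerebellum through fixed, monotonically increasing (here saturating) neurovascular non-linearities. The function `lies_on_one_monotone_curve`, the drive values, and the offset size are illustrative assumptions, not values from the study. With a shared drive, all conditions fall on one monotone cortex-to-cerebellum curve; extra cerebellar input to one condition produces an offset that breaks monotonicity wherever the cortical ranges of the two conditions overlap.

```python
import numpy as np


def lies_on_one_monotone_curve(cortex, cerebellum):
    """Return True if cerebellar activity is non-decreasing when conditions are
    ordered by cortical activity, i.e. all points are consistent with a single
    monotonically increasing mapping cerebellum = h(cortex)."""
    cortex = np.asarray(cortex, dtype=float)
    cerebellum = np.asarray(cerebellum, dtype=float)
    order = np.argsort(cortex, kind="stable")
    return bool(np.all(np.diff(cerebellum[order]) >= 0))


# Hypothetical neural drive for two conditions with overlapping ranges.
force_drive = np.array([1.0, 2.0, 3.0])
speed_drive = np.array([1.5, 2.5, 3.5])
drive = np.concatenate([force_drive, speed_drive])

# Assumed saturating neurovascular non-linearities (illustrative only).
cortex = np.sqrt(drive)
cerebellum_shared = 0.5 * np.sqrt(drive)   # cerebellum driven only via the shared drive

# Add extra cerebellar input to the speed condition only.
extra = np.array([0.0, 0.0, 0.0, 0.3, 0.3, 0.3])
cerebellum_offset = cerebellum_shared + extra

print(lies_on_one_monotone_curve(cortex, cerebellum_shared))  # True
print(lies_on_one_monotone_curve(cortex, cerebellum_offset))  # False
```

      Because any two increasing functions of the same drive are related by an increasing function, the shared-drive case stays monotone even if the cerebellar non-linearity has a different shape from the cortical one (e.g. `np.log1p(drive)` instead of `0.5 * np.sqrt(drive)`). An offset is only detectable in this way where the cortical activity ranges of the conditions overlap.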

      (3) The authors may want to address a bit more the issue of closed loops as well as the underlying neuroanatomy including the deep cerebellar nuclei and pontine nuclei in the context of their current cerebello-cortical correlational approach. But also the contribution of other brain areas such as the basal ganglia and hippocampus. 

      Cortical-cerebellar communication is, of course, bidirectional. As discussed in King et al. (2023), however, we restrict our model to the connections from the neocortex to the cerebellum for the following reasons. First, cerebellar BOLD activity likely reflects mostly neocortical input (see our answer to point 1), whereas neocortical activity is determined by a much wider array of projections, including striato-thalamo-cortical and cortico-cortical connections. Second, the output of the cerebellum cannot be predicted from the BOLD signal of the cerebellar cortex, as it is unlikely that the firing rate of Purkinje cells contributes to the cerebellar BOLD signal (see point 1). For these reasons, we believe that the relationship between neocortical and cerebellar activity patterns is mostly dictated by the connectivity from cortex to cerebellum and is therefore best modelled in this direction. This is now discussed more clearly in a new paragraph (lines 318-323) of the revised manuscript.

      We are also ignoring other inputs to the cerebellum, including those from the spinal cord, the basal ganglia (Bhuvanasundaram et al., 2022; Bostan and Strick, 2018), hippocampus (Froula et al., 2023; Watson et al., 2019), and amygdala (Farley et al., 2016; Jung et al., 2022; Terburg et al., 2024). In humans, however, the neocortex remains the primary source of input to the pontine nuclei. Consequently, it is the main structure shaping activity within the cerebellar cortex. While it is an interesting question to what degree considering subcortical structures can improve the prediction of cerebellar activity patterns, we believe that considering the neocortex provides a good first approximation.

      Reviewer #1 (Recommendations):

      (4)  A few sentences to clarify the used models as was done in the King et al. (2024) paper may improve readability.

      We have now added the following sentences to the introduction (line 25ff):

      To approach this problem, we have recently developed and tested a range of cortical-cerebellar connectivity models (King et al., 2023), designed to capture fixed, or task-invariant, transmission between neocortex and cerebellum. For each cerebellar voxel, we estimated a regularized multiple regression model to predict its activity level across a range of task conditions (King et al., 2019) from the activity pattern observed in the neocortex for the same conditions. The models were then evaluated in their ability to predict cerebellar activity in novel tasks, again based only on the corresponding neocortical activity pattern. Two key results emerged from this work. First, while rs-FC studies (Buckner et al., 2011; Ji et al., 2019; Marek et al., 2018) have assumed a 1:1 mapping between neocortical and cerebellar networks, models which allowed for convergent input from multiple neocortical regions to a single cerebellar region performed better in predicting cerebellar activity patterns for novel tasks. Second, when given a cortical activation pattern, the best performing model could predict about 50% of the reliable variance in the cerebellar cortex across tasks (King et al., 2023).
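      The task-invariant connectivity model described in this passage can be sketched as a regularized multiple regression from cortical to cerebellar activity patterns, fitted on one set of task conditions and evaluated on novel ones. The sketch is illustrative only. It assumes ridge (L2) regularization as one possible regularizer and uses synthetic data in place of real parcel- and voxel-wise activity estimates. It also reports plain held-out R² rather than the noise-ceiling-normalized "reliable variance" quoted above. The function name `fit_ridge` and all sizes and parameters are hypothetical. Under the selective-recruitment logic, conditions with large positive residuals (cerebellar activity above the cortex-based prediction) would be candidates for task-specific cerebellar involvement.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression W = (X'X + alpha*I)^-1 X'Y.

    X: conditions x cortical parcels, Y: conditions x cerebellar voxels.
    One regularized multiple regression per cerebellar voxel (one column of W).
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


rng = np.random.default_rng(0)
n_train, n_test, n_cortex, n_cereb = 40, 10, 20, 5

# Hypothetical ground-truth connectivity: each cerebellar voxel receives
# convergent input from many cortical parcels.
W_true = rng.normal(size=(n_cortex, n_cereb))
X_train = rng.normal(size=(n_train, n_cortex))   # training task conditions
X_test = rng.normal(size=(n_test, n_cortex))     # novel task conditions
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_cereb))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(n_test, n_cereb))

W = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = X_test @ W   # prediction from cortical activity alone

# Proportion of held-out cerebellar variance predicted from cortex.
r2 = 1 - np.sum((Y_test - Y_pred) ** 2) / np.sum((Y_test - Y_test.mean(axis=0)) ** 2)
residual = Y_test - Y_pred   # large positive values: candidate selective recruitment
print(round(float(r2), 3))
```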

      (5) To what extent does this paper demonstrate the limitations of BOLD in neuroscientific research? 

      The primary objective of this study was to shed light on the problems of interpreting BOLD activation within the cerebellum. The problem that the BOLD signal mostly reflects input to a region is not unique to the cerebellum but also applies (albeit likely to a lesser degree) to other brain structures. However, the solution we propose here critically hinges on three features of the cerebellar circuitry: (a) the mossy fiber input to the cerebellar hemispheres mostly arises from the neocortex, (b) the BOLD signal is likely dominated by this mossy fiber input (see point 1), and (c) there is very little excitatory recurrent activity in the cerebellum, so output activity in the cerebellum does not directly drive activity in other parts of the cerebellum.

      These features motivate us to use a directed cortex->cerebellum connectivity model, which does not allow for any direct connectivity within the cerebellum. While the same approach can also be applied to other brain structures, it is less clear that it would yield valid results there. For example, due to the local excitatory recurrent connectivity within neocortical columns, neocortical activity will also reflect local processing.

      (6) What if the authors reversed their line of reasoning as in that cerebellum activity is matched to map changes in cerebral cortical activity? Perhaps this could provide further evidence for the assumed directional specificity of the task-dependent gating of neocortical inputs. 

      Given (a) that the cerebellar BOLD signal tells us very little about cerebellar output signals, (b) that there are many other input signals to the neocortex that are more powerful than cerebellar inputs, and (c) that there are strong cortico-cortical connections, we believe that this model would be hard to interpret (see also our answer to point 3).

      Therefore, while the inversion of the linear task-invariant mapping between cortical and cerebellar activity is a potentially interesting exercise, it is unclear to us at this point what strong predictions we would be able to test with this approach.

      (7) The statement that cerebellar fMRI activity may simply reflect the transmission of neocortical activity through fixed connections can be better explained. Also in the context of using the epiphenomenon (on page 11) in the paper. To what extent is the issue of epiphenomenon not a general problem of fMRI research?

      We have rephrased the introduction of this idea (line 17):

      This means that increases in the cerebellar BOLD signal could simply reflect the automatic transmission of neocortical activity through fixed anatomical connections. As such, whenever a task activates a neocortical region, the corresponding cerebellar region would also be activated, regardless of whether the cerebellum is directly involved in the task or not.

      Epiphenomenal activity: This is indeed a general problem in fMRI research (and also in research that uses neurophysiological recordings rather than manipulations of activity). We have discussed similar issues in the context of motor activity in ipsilateral motor cortex (Diedrichsen et al., 2009). However, given that we only offer a possible approach to address this issue for the cerebellum (see point 5), we thought it best to keep the discussion focused on this structure.

      Reviewer #2 (Public Review):

      Summary:

      Shahshahani and colleagues used a combination of statistical modelling and whole-brain fMRI data in an attempt to separate the contributions of cortical and cerebellar regions in different cognitive contexts.

      Strengths:

      The manuscript uses a sophisticated integration of statistical methods, cognitive neuroscience, and systems neurobiology.

      The authors use multiple statistical approaches to ensure robustness in their conclusions.

      The consideration of the cerebellum as not a purely 'motor' structure is excellent and important.

      We thank the reviewer for their positive evaluation.

      Weaknesses:

      (1) Two of the foundation assumptions of the model - that cerebellar BOLD signals reflect granule cells > purkinje neurons and that corticocerebellar connections are relatively invariant - are still open topics of investigation. It might be helpful for the reader if these ideas could be presented in a more nuanced light.

      Please see our response to comment 1 of Reviewer 1 for a more extensive and detailed justification of this assumption. We have also clarified our rationale for this assumption in the paper (lines 10-14). Finally, we now explicitly raise the possibility that some of the violations of the task-invariant model could be caused by a selective increase in climbing fiber activity in some tasks (line 340).

      (2) The assumption that cortical BOLD responses in cognitive tasks should be matched irrespective of cerebellar involvement does not cohere with the idea of 'forcing functions' introduced by Houk and Wise. 

      We assume that you are referring to the idea that cerebellar output is an important determinant of the dynamics (and likely also of the magnitude) of neocortical activity. We certainly agree. However, we also believe that, in the context of our paper, it is justified to restrict the model to the connectivity between the neocortex and the cerebellum only (see Reviewer 1, comment 3).

      Furthermore, if cerebellar output does increase during the conditions for which we identified unusually high cerebellar activity, it should increase neocortical activity and bring the relationship between cerebellar and cortical activity closer to the predictions of the linear model. Therefore, our identification of functions for which cerebellar regions show selective recruitment is, if anything, conservative.

      Reviewer #2 (Recommendations):

      (3) One of the assumptions stated in the abstract -- that the inputs to the cerebellum may simply be a somewhat passive relay of the outputs of the cerebral cortex -- has been challenged recently by work from Litwin-Kumar (Muscinelli et al., 2023 Nature Neuroscience), which argues for complex computational relationships between cortical pyramidal neurons, pontine nuclei and granule cells, which in turn would have a non-linear impact on the relationship between cortical and cerebellar BOLD. The modelling is based on empirical recordings from Wagner (2019, Cell) which show that the synaptic connections between the cortex and granule cells change as a function of learning, further raising concerns about the assumption that the signals inherent within these two systems should be identical. Whether these micro-scale features are indicative of the macroscopic patterns observed in BOLD is an interesting question for future research, but I worry that the assumption of direct similarity is perhaps not reflective of the current literature. The authors do speak to these cells in their discussion, but I believe that they could also help to refine the authors' hypotheses in the manuscript writ large.

      We absolutely agree with your point. However, we want to make clear that our hypothesis (that the inputs to the cerebellum are a linear, task-invariant function of the outputs of the cerebral cortex) is the null hypothesis that we test in our paper. In fact, our results provide the first empirical evidence that task-dependent gating may indeed occur. In this sense, our paper is consistent with the theoretical suggestion of Muscinelli et al. (2023).

      You may ask whether a linear, task-invariant model of cortical-cerebellar connectivity is a strawman, given that it is most likely incorrect. However, as we stress in the Discussion (lines 298ff), a good null model is a useful model, even if it is (as all models are) ultimately incorrect. Without it, we would not be able to determine which cerebellar activity exceeds the linear prediction. The fact that this null model can predict nearly 50% of the variance in cerebellar activity patterns across tasks at the group level shows that it is a powerful model, and hence provides a much more stringent criterion for functional involvement than the mere presence of activity.
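
      To make the logic of testing against a linear null model concrete, the following is a toy sketch, not the authors' connectivity model. It assumes a single cortical predictor and made-up activity values; the condition names, numbers, and residual threshold are all hypothetical. The null model is fit on one set of conditions, its variance explained (R^2) is computed, and new conditions whose cerebellar activity exceeds the linear prediction by more than the threshold are flagged as candidates for selective recruitment.

```python
from statistics import mean


def fit_linear_null(cortex, cerebellum):
    """Ordinary least-squares fit of cerebellum = a + b * cortex (one predictor)."""
    mx, my = mean(cortex), mean(cerebellum)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cortex, cerebellum))
    sxx = sum((x - mx) ** 2 for x in cortex)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def variance_explained(cortex, cerebellum, a, b):
    """R^2: fraction of variance in cerebellar activity captured by the linear fit."""
    my = mean(cerebellum)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(cortex, cerebellum))
    ss_tot = sum((y - my) ** 2 for y in cerebellum)
    return 1.0 - ss_res / ss_tot


def flag_selective_recruitment(conditions, a, b, threshold):
    """Return the names of conditions whose cerebellar activity exceeds the
    null-model prediction by more than `threshold` (hypothetical criterion)."""
    flagged = []
    for name, (x, y) in conditions.items():
        residual = y - (a + b * x)  # observed minus predicted cerebellar activity
        if residual > threshold:
            flagged.append(name)
    return flagged


# Hypothetical training conditions: cortical and cerebellar activity per condition.
train_cortex = [1.0, 2.0, 3.0, 4.0]
train_cerebellum = [0.6, 0.9, 1.6, 1.9]
a, b = fit_linear_null(train_cortex, train_cerebellum)
r2 = variance_explained(train_cortex, train_cerebellum, a, b)

# Hypothetical new conditions with matched cortical activity: (cortex, cerebellum).
new_conditions = {"force": (2.0, 1.0), "speed": (2.0, 1.8)}
flagged = flag_selective_recruitment(new_conditions, a, b, threshold=0.3)
print(f"a={a:.2f}, b={b:.2f}, R^2={r2:.3f}, flagged={flagged}")
```

      Because the residual is observed minus predicted activity, any cerebellar feedback that raised cortical activity in a flagged condition would also raise the prediction and shrink the residual, which is why a test of this form errs on the conservative side.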

      (4) Further to this point, I didn't follow the authors' logic that the majority of the BOLD response in the cerebellum is reflective of granule cells rather than Purkinje cells. I read through each of the papers that were cited in defense of the comment: "The cerebellar BOLD signal is dominated by mossy fiber input with very little contribution from the output of the cerebellar cortex, the activity of Purkinje cells" and found that none of these studies made this same direct conclusion. As such, I suggest that the authors soften this statement, or provide a different set of references that directly confirm this hypothesis. 

      Please see our response to comment 1 of Reviewer 1. We hope this answer provides a more comprehensive overview of the literature, which DOES show that the spiking of Purkinje cells does not influence vasodilation (as opposed to mossy fiber input). We have now clarified our rationale and the exact cited literature in lines 9-14 of the paper.

      (5) Regarding the statement: "As such, whenever a task activates a neocortical region, we might observe activity in the corresponding cerebellar regions regardless of whether the cerebellum is directly involved in the task or not." -- what if this is a feature, rather than a bug? That is, the organisation of the nervous system has been shaped over phylogeny such that every action, via efference copies of motor outputs, is filtered through the complex architecture of the cerebellum in order to provide a feed-forward signal to the thalamus/cortex (and other connected structures). Houk and Wise made compelling arguments in their 1995 Cerebral Cortex paper arguing that these outputs (among other systems) could act as 'forcing functions' on the kinds of dynamics that arise in the cerebral cortex. I am inclined to agree with their hypothesis, where the implication is that there are no tasks that don't (in some way) depend on cerebellar activity, albeit to a lesser or greater extent, depending on the contexts/requirements of the task. I realise that this is a somewhat philosophical point, but I do think it is important to be clear about the assumptions that form the basis of the reasoning in the paper. 

      This is an interesting point. Our way of thinking about cerebellar function does indeed correspond quite well to the idea of forcing functions: that cerebellar output can “steer” cortical dynamics in a particular way. However, patient and lesion data make clear that some cortical functions rely much more critically on cerebellar input than others. Our hypothesis is that cerebellar activity is higher (relative to neocortical activity) when a function requires cerebellar computation.

      We also agree with the notion that cerebellar contribution is likely not an all-or-none issue, but rather a matter of gradation (line 324ff).

      (6) Regarding the logic of expecting the cortical patterns for speed vs. force to be matched -- surely if the cerebellum was involved more in speed than force production, the feedback from the cerebellum to the cortex (via thalamus) could also contribute to the observed differences? How could the authors control for this possibility? 

      Our model indeed does not currently attempt to quantify the contributions of cerebellar output to cortical activity. However, given that cerebellar output is not visible in the cerebellar BOLD signal (see Reviewer 1, comment 1), we believe that this is a reasonable approach. As argued in our response to your comment 2, increased cerebellar output in the speed condition compared to the force condition should bring the activity relationship closer to the linear model's prediction. The fact that we find increased cerebellar (relative to neocortical) activity in the speed conditions suggests that there is indeed task-dependent gating of cortical projections to the cerebellum.

      Akgören N, Fabricius M, Lauritzen M. 1994. Importance of nitric oxide for local increases of blood flow in rat cerebellar cortex during electrical stimulation. Proc Natl Acad Sci U S A 91:5903–5907.

      Attwell D, Iadecola C. 2002. The neural basis of functional brain imaging signals. Trends Neurosci 25:621–625.

      Bhuvanasundaram R, Krzyspiak J, Khodakhah K. 2022. Subthalamic Nucleus Modulation of the Pontine Nuclei and Its Targeting of the Cerebellar Cortex. J Neurosci 42:5538–5551.

      Bostan AC, Strick PL. 2018. The basal ganglia and the cerebellum: nodes in an integrated network. Nat Rev Neurosci 19:338–350.

      Buckner RL, Krienen FM, Castellanos A, Diaz JC, Yeo BTT. 2011. The organization of the human cerebellum estimated by intrinsic functional connectivity. J Neurophysiol 106:2322–2345.

      Caesar K, Gold L, Lauritzen M. 2003. Context sensitivity of activity-dependent increases in cerebral blood flow. Proc Natl Acad Sci U S A 100:4239–4244.

      Caesar K, Thomsen K, Lauritzen M. 2003. Dissociation of spikes, synaptic activity, and activity-dependent increments in rat cerebellar blood flow by tonic synaptic inhibition. Proc Natl Acad Sci U S A 100:16000–16005.

      Farley SJ, Radley JJ, Freeman JH. 2016. Amygdala Modulation of Cerebellar Learning. J Neurosci 36:2190–2201.

      Froula JM, Hastings SD, Krook-Magnuson E. 2023. The little brain and the seahorse: Cerebellar-hippocampal interactions. Front Syst Neurosci 17:1158492.

      Gagliano G, Monteverdi A, Casali S, Laforenza U, Gandini Wheeler-Kingshott CAM, D’Angelo E, Mapelli L. 2022. Non-linear frequency dependence of neurovascular coupling in the cerebellar cortex implies vasodilation-vasoconstriction competition. Cells 11:1047.

      Howarth C, Gleeson P, Attwell D. 2012. Updated energy budgets for neural computation in the neocortex and cerebellum. J Cereb Blood Flow Metab 32:1222–1232.

      Howarth C, Peppiatt-Wildman CM, Attwell D. 2010. The energy use associated with neural computation in the cerebellum. J Cereb Blood Flow Metab 30:403–414.

      Iadecola C, Li J, Xu S, Yang G. 1996. Neural mechanisms of blood flow regulation during synaptic activity in cerebellar cortex. J Neurophysiol 75:940–950.

      Ji JL, Spronk M, Kulkarni K, Repovš G, Anticevic A, Cole MW. 2019. Mapping the human brain’s cortical-subcortical functional network organization. Neuroimage 185:35–57.

      Jung SJ, Vlasov K, D’Ambra AF, Parigi A, Baya M, Frez EP, Villalobos J, Fernandez-Frentzel M, Anguiano M, Ideguchi Y, Antzoulatos EG, Fioravante D. 2022. Novel Cerebello-Amygdala Connections Provide Missing Link Between Cerebellum and Limbic System. Front Syst Neurosci 16:879634.

      King M, Hernandez-Castillo CR, Poldrack RA, Ivry RB, Diedrichsen J. 2019. Functional boundaries in the human cerebellum revealed by a multi-domain task battery. Nat Neurosci 22:1371–1378.

      King M, Shahshahani L, Ivry RB, Diedrichsen J. 2023. A task-general connectivity model reveals variation in convergence of cortical inputs to functional regions of the cerebellum. Elife 12:e81511.

      Mapelli L, Gagliano G, Soda T, Laforenza U, Moccia F, D’Angelo EU. 2017. Granular layer neurons control cerebellar neurovascular coupling through an NMDA receptor/NO-dependent system. J Neurosci 37:1340–1351.

      Marek S, Siegel JS, Gordon EM, Raut RV, Gratton C, Newbold DJ, Ortega M, Laumann TO, Adeyemo B, Miller DB, Zheng A, Lopez KC, Berg JJ, Coalson RS, Nguyen AL, Dierker D, Van AN, Hoyt CR, McDermott KB, Norris SA, Shimony JS, Snyder AZ, Nelson SM, Barch DM, Schlaggar BL, Raichle ME, Petersen SE, Greene DJ, Dosenbach NUF. 2018. Spatial and Temporal Organization of the Individual Human Cerebellum. Neuron 100:977-993.e7.

      Mathiesen C, Caesar K, Akgören N, Lauritzen M. 1998. Modification of activity-dependent increases of cerebral blood flow by excitatory synaptic activity and spikes in rat cerebellar cortex. J Physiol 512(Pt 2):555–566.

      Mathiesen C, Caesar K, Lauritzen M. 2000. Temporal coupling between neuronal activity and blood flow in rat cerebellar cortex as indicated by field potential analysis. J Physiol 523:235–246.

      Muscinelli SP, Wagner MJ, Litwin-Kumar A. 2023. Optimal routing to cerebellum-like structures. Nat Neurosci 26:1630–1641.

      Shin S-L, Hoebeek FE, Schonewille M, De Zeeuw CI, Aertsen A, De Schutter E. 2007. Regular patterns in cerebellar Purkinje cell simple spike trains. PLoS One 2:e485.

      Streng ML, Popa LS, Ebner TJ. 2017. Climbing Fibers Control Purkinje Cell Representations of Behavior. J Neurosci 37:1997.

      Terburg D, van Honk J, Schutter DJLG. 2024. Doubling down on dual systems: A cerebellum–amygdala route towards action- and outcome-based social and affective behavior. Cortex 173:175–186.

      Thomsen K, Offenhauser N, Lauritzen M. 2004. Principal neuron spiking: neither necessary nor sufficient for cerebral blood flow in rat cerebellum. J Physiol 560:181–189.

      Thomsen K, Piilgaard H, Gjedde A, Bonvento G, Lauritzen M. 2009. Principal cell spiking, postsynaptic excitation, and oxygen consumption in the rat cerebellar cortex. J Neurophysiol 102:1503–1512.

      Watson TC, Obiang P, Torres-Herraez A, Watilliaux A, Coulon P, Rochefort C, Rondi-Reig L. 2019. Anatomical and physiological foundations of cerebello-hippocampal interaction. Elife 8:e41896.

      Yang G, Huard JM, Beitz AJ, Ross ME, Iadecola C. 2000. Stellate neurons mediate functional hyperemia in the cerebellar molecular layer. J Neurosci 20:6968–6973.

      Yang G, Iadecola C. 1998. Activation of cerebellar climbing fibers increases cerebellar blood flow: role of glutamate receptors, nitric oxide, and cGMP. Stroke 29:499–507; discussion 507-8.

      Yang G, Iadecola C. 1997. Obligatory role of NO in glutamate-dependent hyperemia evoked from cerebellar parallel fibers. Am J Physiol 272:R1155-61.

      Zhang Y, Forster C, Milner TA, Iadecola C. 2003. Attenuation of activity-induced increases in cerebellar blood flow by lesion of the inferior olive. Am J Physiol Heart Circ Physiol 285:H1177-82.

    1. Author response:

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for recognizing the sophistication and clinical relevance of our mouse model for acute retinal artery occlusion. We are grateful for your supportive feedback.

      Public Reviews:

      Reviewer #1:

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely occlude the pterygopalatine ophthalmic artery, in addition to carotid artery ligation, to completely block the blood supply to the mouse inner retina, mimicking clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic those in human patients. Whole-retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequence of events, including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation, that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgical procedure with informative illustrations, images, and superb surgical videos.

      Two time points each of ischemia and reperfusion were studied, with convincing histological and in vivo data demonstrating the time course of changes in retinal neuron survival, ERG function, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      It would benefit the manuscript and its readers if the authors improved the English by correcting obvious grammatical errors and eliminating many of the acronyms that are not commonly used in the field. The authors should also explain why this complicated but clever surgical procedure was designed, and provide a summary table of the time course of all the morphological, functional, cellular, and transcriptomic changes associated with this model.

      Thank you for your thorough review of the manuscript. We sincerely apologize for any grammatical errors resulting from our limited English proficiency and have taken the necessary steps to polish the article. Additionally, we have followed your advice and reduced the use of field-specific acronyms to improve the manuscript's readability.

      Regarding the rationale behind the design of the UPOAO model, we have provided a description in the Introduction. Our group focuses on the pathogenesis and clinical treatment of RAO. The absence of an accurate mouse model simulating the retinal ischemic process has hampered progress in developing neuroprotective agents for RAO. To better simulate the retinal ischemic process and the possible ischemia-reperfusion injury following RAO, we developed a novel vascular mouse model, the unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) model. Its development was guided by the middle cerebral artery occlusion (MCAO) model, which is widely used in cerebral ischemic injury research.

      We appreciate your valuable suggestion regarding the inclusion of a summary table outlining the time course of morphological, functional, cellular, and transcriptome changes associated with this model. To address this, we intend to include a supplementary table at the end of the article, which will offer a comprehensive overview of the experimental results, thereby aiding in clarity and interpretation.

      Once again, we thank you for your insightful comments and suggestions, which have greatly contributed to the improvement of our manuscript.

      Reviewer #2:

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes in major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach to studying retinal artery occlusion. The study is very comprehensive.

      We greatly appreciate your positive assessment of our work and are encouraged by your recognition of its significance.

      Weaknesses:

      Some statements are incorrect and confusing. It would be helpful to review and clarify these to ensure accuracy and improve readability.

      We sincerely appreciate your meticulous review of the manuscript. Taking your valuable feedback into account, we will thoroughly address the identified inaccuracies in the revised version and polish the article to improve its readability. We apologize for any confusion caused by these inaccuracies and thank you for bringing them to our attention.

    1. Author response:

      We deeply appreciate the editors’ and reviewers’ invaluable time and effort. We would also like to extend our gratitude to eLife for its unwavering commitment to a transparent review and publication model. Below, we present our point-by-point responses to the comments.  

      Besides the WT allele, which is equivalent to the mouse TMEM173 gene, the human TMEM173 gene has two common alleles, HAQ and AQ, carried by billions of people. The main conclusions and interpretations, summarized in the Title and Abstract, are: (i) unlike the WT TMEM173 allele, the HAQ and AQ alleles are resistant to STING activation-induced cell death; (ii) STING residue 293 is critical for cell death; (iii) the HAQ and AQ alleles are dominant to the SAVI allele; and (iv) one copy of the AQ allele rescues SAVI disease in mice. We propose that STING research and STING-targeting immunotherapy should consider human TMEM173 heterogeneity. These interpretations and conclusions are based on data and logic. We welcome alternative, logical interpretations from our peers and potential collaborations to advance human TMEM173 research.

      Reviewer #1 (Public Review):

      Responses to Comment 1: We greatly appreciate Reviewer 1's insights. We will change “lymphocytes” to “splenocytes” (line 134) as suggested. We respectfully disagree with Reviewer 1’s comments on TBK1 (lines 129-134). First, we used two different TBK1 inhibitors, BX795 and GSK8612. Second, because BX795 also inhibits PDK1, we used the PDK1 inhibitor GSK2334470. Third, both BX795 and GSK8612 completely inhibited diABZI-induced splenocyte cell death (Figure 1B). The logical conclusion is that “TBK1 activation is required for STING-mediated mouse spleen cell death ex vivo” (line 118).

      This manuscript uncovers a significant aspect of the interplay between the common human TMEM173 alleles and the rare SAVI mutation (lines 23-26). Our discovery that the common human TMEM173 alleles are resistant to STING activation-induced cell death is a substantial finding. It further strengthens the argument that the HAQ and AQ alleles are functionally distinct from the WT allele [1-3]. We wish to underscore the crucial message of this study, that 'STING research and STING-targeting immunotherapy should consider TMEM173 heterogeneity in humans' (line 37), which has been largely overlooked in current STING clinical trials [4].

      Regarding STING-mediated cell death, as we stated in the Introduction (lines 62-79): (i) STING-mediated cell death is cell type-dependent [5-7] and type I IFN-independent [5,7,8]; (ii) the in vivo biological significance of STING-mediated cell death is not clear [7,8]; and (iii) the mechanisms of STING-mediated cell death remain controversial, with multiple cell death pathways (apoptosis, necroptosis, pyroptosis, ferroptosis, and PANoptosis) proposed [7,9,10]. SAVI patients (WT/SAVI) and mouse models have CD4 T cellpenia [8,11], and T cells were restored in SAVI/HAQ and SAVI/AQ mice. Thus, the manuscript provides some answers regarding the biological significance of STING-mediated cell death. Furthermore, splenocytes from Q293/Q293 mice are resistant to STING-mediated cell death. The logical conclusion is that amino acid 293 is critical for STING-mediated cell death. How aa293 mediates this function needs future investigation. Similarly, how TBK1 mediates STING-mediated cell death, independently of type I IFN and NFκB induction, needs future investigation.

      Responses to Comment 2: These are all very interesting questions that we will address in future studies. This manuscript, titled “The common TMEM173 HAQ, AQ alleles rescue CD4 T cellpenia, restore T-regs, and prevent SAVI (N153S) inflammatory disease in mice”, does not focus on Q293 mice. We have been researching the common human TMEM173 alleles since their discovery in 2011 [12], through mouse models [1,3], a human clinical trial [2], and human genetics studies [3]. This manuscript is another step towards understanding these common human TMEM173 alleles, with the new discovery that HAQ and AQ are resistant to STING-mediated cell death.

      Responses to Comment 3: We aim to address these worthy questions in future studies. In this manuscript, Figure 6 shows that AQ/SAVI mice had more T-regs than HAQ/SAVI mice (lines 246-256). In our previous publication on HAQ and AQ knock-in mice, we showed that AQ T-regs have more IL-10 and mitochondrial activity than HAQ T-regs [3]. We propose that increased IL-10+ Tregs in AQ mice may contribute to the improved phenotype of AQ/SAVI compared to HAQ/SAVI mice. However, we are not excluding other contributions of the AQ allele (e.g., metabolic differences). We will explore these possibilities in future research.

      Responses to Comment 4: Figure 2 is necessary because it reveals the difference between mouse and human STING-mediated cell death. Figure 2A-2B shows that STING activation killed human CD4 T cells, but not human CD8 T cells or B cells. This differs from Figure 1A, where STING activation killed mouse CD4 and CD8 T cells and CD19 B cells, revealing species-specific STING cell death responses. Regarding human CD8 T cells, as we stated in the Discussion (lines 318-320), human CD8 T cells (PBMC) are not as susceptible as CD4 T cells to STING-induced cell death [8]. We made similar observations in lung lymphocytes (Figure 2A). For Figure 2C, we used 2 WT/HAQ and 3 WT/WT individuals (lines 738-739). We generated HAQ and AQ THP-1 cells from STING-KO THP-1 cells (InvivoGen, cat. no. thpd-kostg) (lines 740-741).

      A recent study found that the STING agonist SHR1032 induces cell death, independently of type I IFNs, in STING-KO THP-1 cells expressing WT(R232) human STING [10] (line 182). SHR1032 suppressed THP1-STING-WT(R232) cell growth with a GI50 of 23 nM, whereas in the parental THP1-STING-HAQ cells the GI50 of SHR1032 was >103 nM [10]. Cytarabine was used as an internal control: SHR1032 killed THP1-STING-WT(R232) cells more robustly than cytarabine did, but killed THP-1-STING-HAQ cells much less efficiently than cytarabine [10].

      This manuscript rigorously uses mouse splenocytes, human lung lymphocytes, THP-1 cells reconstituted with HAQ or AQ, and HAQ/SAVI and AQ/SAVI mice to demonstrate that the common human HAQ and AQ alleles are resistant to STING-mediated cell death in vitro and in vivo.

      We agree with Reviewer 1 that STING-mediated cell death mechanisms in myeloid and lymphoid cells may differ, which likely contributes to the different mechanisms proposed in STING cell death research [7,9,10]. Our study focuses on the in vivo mechanism of T cellpenia.

      Responses to Comment 5: We stated in the Introduction that “AQ responds to CDNs and produces type I IFNs in vivo and in vitro [3,13,14]” (lines 94-95). We reported that AQ knock-in mice responded to STING activation [3]. We previously showed that there was negative natural selection on the AQ allele in individuals outside of Africa [3]: whereas 28% of Africans are WT/AQ, only 0.6% of East Asians are. Future research on the AQ allele will address this interesting question, which may shed new mechanistic light on STING action.

      Responses to Comment 6: This comment is similar to Comment 3. In this manuscript, Figure 6 shows that AQ/SAVI mice had more T-regs than HAQ/SAVI mice (lines 246-256). In our previous publication on HAQ and AQ knock-in mice, we showed that AQ T-regs have more IL-10 and mitochondrial activity than HAQ T-regs [3]. We propose that increased IL-10+ Tregs in AQ mice may contribute to the improved phenotype of AQ/SAVI compared to HAQ/SAVI mice. However, we are not excluding other contributions of the AQ allele (e.g., metabolic differences).

      Responses to Comment 7: Both radioresistant parenchymal and/or stromal cells and hematopoietic cells influence SAVI pathology in mice [15,16]. Nevertheless, the lack of CD4 T cells, including anti-inflammatory T-regs, likely contributes to the inflammation in SAVI mice and patients. We characterized lung function and lung inflammation (Figure 4), as well as lung neutrophil and inflammatory monocyte infiltration (Figure S4).

      Responses to Comment 8: Several publications have linked STING to HIV pathogenesis [17-22] (line 271). The manuscript studies STING activation-induced cell death. It is not a stretch to ask, for example, whether preventing STING-mediated cell death, without affecting type I IFN production, could restore CD4 T cell counts and improve care for AIDS patients.

      Reviewer #2 (Public Review):

      Response to Comment 1: Please see the figure below (Author response image 1) for diABZI- and DMXAA-induced cell death in splenocytes from WT/WT, WT/HAQ, HAQ/SAVI, and AQ/SAVI mice. The HAQ/SAVI and AQ/SAVI splenocytes showed similar partial resistance to STING activation-induced cell death.

      Responses to Comment 2: We examined HAQ and AQ mouse splenocytes, HAQ human lung lymphocytes, THP-1 cells reconstituted with HAQ or AQ, and HAQ/SAVI and AQ/SAVI mice to demonstrate that the common human HAQ and AQ alleles are resistant to STING-mediated cell death in vitro and in vivo. Additional work in human T cell lines would add little.

      Responses to Comment 3: This is possibly a misunderstanding. We used BMDMs to compare STING signaling (TBK1, IRF3, NFκB, and STING activation) in WT/SAVI, HAQ/SAVI, and AQ/SAVI cells. Ideally, we would compare STING signaling in CD4 T cells from WT/SAVI, HAQ/SAVI, and AQ/SAVI mice. However, WT/SAVI mice have no CD4 T cells. Here, we assume that basic STING signaling (TBK1, IRF3, NFκB, and STING activation) is conserved between T cells and macrophages.

      Responses to Comment 4: Reviewer 2 suggests looking for evidence of inflammation and STING activation in the lungs of HAQ/SAVI and AQ/SAVI mice. We would like to elaborate further. First, anti-inflammatory treatments, e.g., steroids, DMARDs, IVIG, etanercept, rituximab, nifedipine, and amlodipine, all failed in SAVI patients [11]. Second, Figure S4 examined lung neutrophil and inflammatory monocyte infiltration. Interestingly, while AQ/SAVI mice had better lung function than HAQ/SAVI mice (Figure 4D, 4E vs 4H, 4I), HAQ/SAVI and AQ/SAVI lungs had comparable neutrophil and inflammatory monocyte infiltration. Last, SAVI is classified as a type I interferonopathy [11], but the lung disease in SAVI is mainly independent of type I IFNs [23-26]. The AQ allele suppresses SAVI in vivo. Understanding the mechanisms by which AQ rescues SAVI could lead to curative treatments for SAVI patients.

      Author response image 1.

      (A-B) Flow cytometry of HAQ/SAVI, AQ/SAVI, WT/WT, or WT/HAQ splenocytes treated with diABZI (100 ng/ml) or DMXAA (20 µg/ml) for 24 h. Cell death was determined by PI staining. Data are representative of three independent experiments. Graphs show the mean, with error bars indicating s.e.m. p values were determined by one-way ANOVA with Tukey’s multiple comparison test. * p<0.05; n.s., not significant.

      References.

      (1)             Patel, S. et al. The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele. J Immunol 198, 776-787 (2017). 

      (2)             Sebastian, M. et al. Obesity and STING1 genotype associate with 23-valent pneumococcal vaccination efficacy. JCI Insight 5 (2020). 

      (3)             Mansouri, S. et al. MPYS Modulates Fatty Acid Metabolism and Immune Tolerance at Homeostasis Independent of Type I IFNs. J Immunol 209, 2114-2132 (2022). 

      (4)             Sivick, K. E. et al. Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4183-4185 (2017). 

      (5)             Gulen, M. F. et al. Signalling strength determines proapoptotic functions of STING. Nat Commun 8, 427 (2017). 

      (6)             Kabelitz, D. et al. Signal strength of STING activation determines cytokine plasticity and cell death in human monocytes. Sci Rep 12, 17827 (2022). 

      (7)             Murthy, A. M. V., Robinson, N. & Kumar, S. Crosstalk between cGAS-STING signaling and cell death. Cell Death Differ 27, 2989-3003 (2020). 

      (8)             Kuhl, N. et al. STING agonism turns human T cells into interferon-producing cells but impedes their functionality. EMBO Rep 24, e55536 (2023). 

      (9)             Li, C., Liu, J., Hou, W., Kang, R. & Tang, D. STING1 Promotes Ferroptosis Through MFN1/2-Dependent Mitochondrial Fusion. Front Cell Dev Biol 9, 698679 (2021). 

      (10)         Song, C. et al. SHR1032, a novel STING agonist, stimulates anti-tumor immunity and directly induces AML apoptosis. Sci Rep 12, 8579 (2022). 

      (11)         Liu, Y. et al. Activated STING in a vascular and pulmonary syndrome. N Engl J Med 371, 507-518 (2014). 

      (12)         Jin, L. et al. Identification and characterization of a loss-of-function human MPYS variant. Genes Immun 12, 263-269 (2011). 

      (13)         Yi, G. et al. Single nucleotide polymorphisms of human STING can affect innate immune response to cyclic dinucleotides. PLoS One 8, e77846 (2013). 

      (14)         Patel, S. et al. Response to Comment on "The Common R71H-G230A-R293Q Human TMEM173 Is a Null Allele". J Immunol 198, 4185-4188 (2017). 

      (15)         Gao, K. M. et al. Endothelial cell expression of a STING gain-of-function mutation initiates pulmonary lymphocytic infiltration. Cell Rep 43, 114114 (2024). 

      (16)         Gao, K. M., Motwani, M., Tedder, T., Marshak-Rothstein, A. & Fitzgerald, K. A. Radioresistant cells initiate lymphocyte-dependent lung inflammation and IFNγ-dependent mortality in STING gain-of-function mice. Proc Natl Acad Sci U S A 119, e2202327119 (2022). 

      (17)         Monroe, K. M. et al. IFI16 DNA sensor is required for death of lymphoid CD4 T cells abortively infected with HIV. Science 343, 428-432 (2014). 

      (18)         Doitsh, G. et al. Cell death by pyroptosis drives CD4 T-cell depletion in HIV-1 infection. Nature 505, 509-514 (2014). 

      (19)         Jakobsen, M. R., Olagnier, D. & Hiscott, J. Innate immune sensing of HIV-1 infection. Curr Opin HIV AIDS 10, 96-102 (2015). 

      (20)         Silvin, A. & Manel, N. Innate immune sensing of HIV infection. Curr Opin Immunol 32, 54-60 (2015). 

      (21)         Altfeld, M. & Gale, M., Jr. Innate immunity against HIV-1 infection. Nat Immunol 16, 554-562 (2015). 

      (22)         Krapp, C., Jonsson, K. & Jakobsen, M. R. STING dependent sensing - Does HIV actually care? Cytokine Growth Factor Rev 40, 68-76 (2018). 

      (23)         Luksch, H. et al. STING-associated lung disease in mice relies on T cells but not type I interferon. J Allergy Clin Immunol 144, 254-266 e258 (2019). 

      (24)         Stinson, W. A. et al. The IFN-gamma receptor promotes immune dysregulation and disease in STING gain-of-function mice. JCI Insight 7 (2022). 

      (25)         Warner, J. D. et al. STING-associated vasculopathy develops independently of IRF3 in mice. J Exp Med 214, 3279-3292 (2017). 

      (26)         Fremond, M. L. et al. Overview of STING-Associated Vasculopathy with Onset in Infancy (SAVI) Among 21 Patients. J Allergy Clin Immunol Pract 9, 803-818 e811 (2021).

    1. Author response:

      eLife assessment

      This valuable study reveals how a rhizobial effector protein cleaves and inhibits a key plant receptor for symbiosis signaling, while the host plant counters by phosphorylating the effector. The molecular evidence for the protein-protein interaction and modification is solid, though biological evidence directly linking effector cleavage to rhizobial infection is incomplete. With additional functional data, this work could have implications for understanding intricate plant-microbe dynamics during mutualistic interactions.

      Thank you for this helpful comment. In the revised manuscript, we will be more cautious about directly linking the cleavage of Nod factor receptors by NopT to rhizobial infection.

      We plan to modify the Title, the One-Sentence Summary, Abstract, and Discussion regarding this point.

      Public Reviews:

      Reviewer #1 (Public Review):

      Bacterial effectors that interfere with the inner molecular workings of eukaryotic host cells are of great biological significance across disciplines. On the one hand they help us to understand the molecular strategies that bacteria use to manipulate host cells. On the other hand they can be used as research tools to reveal molecular details of the intricate workings of the host machinery that is relevant for the interaction/defence/symbiosis with bacteria. The authors investigate the function and biological impact of a rhizobial effector that interacts with and modifies, and curiously is modified by, legume receptors essential for symbiosis. The molecular analysis revealed a bacterial effector that cleaves a plant symbiosis signaling receptor to inhibit signaling and the host counterplay by phosphorylation via a receptor kinase. These findings have potential implications beyond bacterial interactions with plants.

      Thank you for highlighting the broad significance of rhizobial effectors in understanding legume-rhizobium interactions. We fully agree with your assessment and will emphasize these points in the revised Introduction and Discussion sections of our manuscript. Specifically, we will expand our Discussion regarding the potential impact of the NopT interaction with symbiotic receptor kinases on plant immune signaling and regarding the general significance of our work.

      Bao and colleagues investigated how rhizobial effector proteins can regulate the legume root nodule symbiosis. A rhizobial effector is described to directly modify symbiosis-related signaling proteins, altering the outcome of the symbiosis. Overall, the paper presents findings that will have a wide appeal beyond its primary field.

      Out of 15 identified effectors from Sinorhizobium fredii, they focus on the effector NopT, which exhibits proteolytic activity and may therefore cleave specific target proteins of the host plant. They focus on two Nod factor receptors of the legume Lotus japonicus, NFR1 and NFR5, both of which were previously found to be essential for the perception of rhizobial nod factor, and the induction of symbiotic responses such as bacterial infection thread formation in root hairs and root nodule development (Madsen et al., 2003, Nature; Tirichine et al., 2003; Nature). The authors present evidence for an interaction of NopT with NFR1 and NFR5. The paper aims to characterize the biochemical and functional consequences of these interactions and the phenotype that arises when the effector is mutated.

      Thank you for your positive feedback on our manuscript. In the revised Introduction and Discussion sections, we plan to better emphasize the interdisciplinary significance of our work. We will show how the knowledge gained from our study can contribute to a better understanding of microbial interactions with eukaryotic hosts in general, which may have a stimulating effect on future research in various research areas such as pathogenesis and immunity.

      To ensure that readers can easily follow the rationale behind our experiments, we will improve the Results section and explain in more detail how NopT was selected from among the 15 examined effectors. Additionally, we will provide more background information on NopT and on the roles of NFR1 and NFR5 in symbiotic signaling in the Introduction. As suggested, we will cite Madsen et al. (2003) and Tirichine et al. (2003), as well as additional references on rhizobial NopT proteins, in our revised manuscript.

      Evidence is presented that in vitro NopT can cleave NFR5 at its juxtamembrane region. NFR5 also appears to be cleaved in vivo, and NFR1 appears to inhibit the proteolytic activity of NopT by phosphorylating NopT. When NFR5 and NFR1 are ectopically over-expressed in leaves of the non-legume Nicotiana benthamiana, they induce cell death (Madsen et al., 2011, Plant Journal). Bao et al. found that this cell death response is inhibited by the coexpression of nopT. Mutation of nopT alters the outcome of rhizobial infection in L. japonicus. These conclusions are well supported by the data.

      We appreciate that you recognize the value of our data.

      The authors present evidence supporting the interaction of NopT with NFR1 and NFR5. In particular, there is solid support for cleavage of NFR5 by NopT (Figure 3) and for the identification of NopT phosphorylation sites that inhibit its proteolytic activity (Figure 4C). Cleavage of NFR5 upon expression in N. benthamiana (Figure 3A) requires appropriate controls (inactive mutant versions), which have been provided, since Agrobacterium, a close relative of rhizobia, might increase defense-related proteolytic activity in the plant host cells.

      Thank you for recognizing the use of an inactive NopT variant in Figure 3A. In fact, increased activity of plant proteases induced by Agrobacterium is an important point that should not be neglected. We plan to mention this aspect in our revised Discussion.

      In the context of your comments, we are planning to make the following improvements to the manuscript:

      (1) We will add a more detailed description of the experimental conditions under which the cleavage of NFR5 by NopT was observed in vitro and in vivo.

      (2) We plan to provide more comprehensive data on the phosphorylation of NopT by NFR1, including phosphorylation assays and mass spectrometry results. These additional data will further support the proposed mechanism by which NFR1 inhibits the proteolytic activity of NopT.

      (3) We will expand the Discussion on the cell death response induced by ectopic expression of NFR1 and NFR5 in Nicotiana benthamiana. We will include more details from Madsen et al. (2011) to contextualize our findings with published literature.

      We believe these additions and clarifications will enhance the clarity and impact of our findings.

      Key results from N. benthamiana appear consistent with data from recombinant protein expression in bacteria. For the analysis in the host legume L. japonicus, transgenic hairy roots were included. To demonstrate that the cleavage of NFR5 occurs during the interaction in plant cells, the authors rely largely on Western blots. Regardless of whether Nicotiana leaf cells or Lotus root cells are used as the test platform, the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with nopT, and most of the NFR5 persists in its full-length form (Figures 3A-D). It is not quite clear how the authors explain the loss of NFR5 function (loss of cell death, impact on symbiosis), as a vast excess of the tested target remains intact. It is also not clear why a large proportion of NFR5 is unaffected by the proteolytic activity of NopT. This is particularly interesting in Nicotiana, in the absence of Nod factor that could trigger NFR1 kinase activity.

      Thank you for your comments regarding the cleavage of NFR5 and its functional implications. In the revised version, we will change our manuscript taking into account the following considerations:

      (1) We acknowledge that the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with NopT. It is worth noting in this context that the proteins were expressed at high levels, which likely do not reflect the natural situation in L. japonicus. The low proportion of cleaved NFR5 in our Western blots of transformed N. benthamiana or L. japonicus cells may thus simply reflect an experimental effect of high NFR5 protein synthesis. We suggest that the presence of large amounts of intact NFR5 does not have a significant functional impact on plant responses (cell death in N. benthamiana, rhizobial infection of L. japonicus), whereas NFR5 cleavage (or the formation of NFR5 cleavage products) may be responsible for the observed phenotypic changes. The fraction of cleaved NFR5, although small, may be sufficient to disrupt crucial signaling pathways. We will address possible differences between experimental and natural protein levels in our revised Discussion.

      (2) In our work, we studied three biochemical aspects of NopT: (i) physical binding of NopT to NFR1 and NFR5, (ii) proteolytic cleavage of NFR5 by NopT, and (iii) phosphorylation of NopT by NFR1. These three biochemical properties appear to influence each other. Phosphorylation of NopT by NFR1 appears to reduce its proteolytic activity, thereby counteracting NFR5 degradation by NopT (NFR5 homeostasis). Moreover, as NopT is a phosphorylation substrate of NFR1, NopT probably interferes with kinase-mediated downstream responses of NFR1. Thus, NFR5 cleavage appears to be only one feature of NopT. We plan to mention these considerations in our revised Discussion.

      It is also difficult to evaluate how the ratios of cleaved and full-length protein change when different versions of NopT are present without a quantification of band strengths normalized to loading controls (Figure 3C, 3D, 3F). The same is true for the blots supporting NFR1 phosphorylation of NopT (Figure 4A).

      Thank you for pointing out this aspect. Following your recommendation, we will quantify the band intensities for cleaved and full-length NFR5 in the experiments with different versions of NopT. These values will be normalized to loading controls. Similarly, the Western blots supporting NFR1 phosphorylation of NopT will be quantified. The data for normalized band intensities will be included into the revised figures. The quantifications will provide a clearer understanding of how the ratios of cleaved to full-length proteins change with different NopT variants and also will provide information to which extent NopT is phosphorylated by NFR1.
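      As a rough illustration of the normalization described above, the sketch below divides each band by the loading control of the same lane and reports the cleaved fraction and a fold change relative to a reference lane. It is a minimal sketch under stated assumptions, not the authors' analysis pipeline: it assumes band intensities have already been measured by densitometry, and that the full-length and cleaved NFR5 bands are detected with comparable efficiency (e.g. via the same tag or antibody). The class and function names are hypothetical, and the numbers in the usage block are placeholders, not data from the study.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    """Densitometry readings (arbitrary units) for one Western blot lane."""
    name: str
    full_length: float  # intact NFR5 band
    cleaved: float      # NFR5 cleavage-product band
    loading: float      # loading-control band in the same lane


def normalize(lane: Lane) -> dict:
    """Divide each band by the lane's loading control and report the cleaved fraction."""
    if lane.loading <= 0:
        raise ValueError(f"{lane.name}: loading-control intensity must be positive")
    full_norm = lane.full_length / lane.loading
    cleaved_norm = lane.cleaved / lane.loading
    total = full_norm + cleaved_norm
    return {
        "lane": lane.name,
        "full_length_norm": full_norm,
        "cleaved_norm": cleaved_norm,
        # Within one lane the loading control cancels out of this ratio.
        "cleaved_fraction": cleaved_norm / total if total > 0 else 0.0,
    }


def fold_change(lane: Lane, reference: Lane, key: str = "cleaved_norm") -> float:
    """Express a normalized band relative to a reference lane (e.g. an inactive-NopT control)."""
    ref_value = normalize(reference)[key]
    if ref_value == 0:
        raise ZeroDivisionError(f"reference lane {reference.name} has zero {key}")
    return normalize(lane)[key] / ref_value


if __name__ == "__main__":
    # Placeholder readings for illustration only; not data from the study.
    control = Lane("inactive NopT", full_length=90.0, cleaved=5.0, loading=1.0)
    active = Lane("active NopT", full_length=80.0, cleaved=20.0, loading=2.0)
    print(normalize(active))
    print(fold_change(active, control))
```

      Note the design point this makes explicit: the cleaved-to-total ratio within a single lane does not depend on the loading control, whereas comparisons of absolute band amounts across lanes or NopT variants do.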

      It is clear that mutation of nopT results in a quantitative infection phenotype. Nodule primordia and infection threads are still formed when L. japonicus plants are inoculated with ∆nopT mutant bacteria, but it is not clear if these primordia are infected or develop into fully functional nodules (Figure 5). A quantification of the ratio of infected and non-infected nodules and primordia would reveal whether NopT is only active at the transition from infection focus to thread or perhaps also later in the bacterial infection process of the developing root nodule.

      Thank you for pointing this out. In the revised version of our manuscript, we will provide data showing that there are no obvious differences in nodule formation in plants inoculated with ∆nopT and wild-type NGR234, respectively. However, quantification of infection threads containing our GFP-labeled rhizobia in primordia and nodules would be difficult to perform due to strong autofluorescence signals in these tissues. The main goal of our study was to identify and characterize the interaction between NopT and Nod factor receptors. We therefore believe that an in-depth analysis of the bacterial infection process at later symbiotic stages is out of the scope of the present work.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript presents data demonstrating NopT's interaction with Nod Factor Receptors NFR1 and NFR5 and its impact on cell death inhibition and rhizobial infection. The identification of a truncated NopT variant in certain Sinorhizobium species adds an interesting dimension to the study. These data try to bridge the gaps between classical Nod-factor-dependent nodulation and T3SS NopT effector-dependent nodulation in legume-rhizobium symbiosis. Overall, the research provides interesting insights into the molecular mechanisms underlying symbiotic interactions between rhizobia and legumes.

      Strengths:

      The manuscript nicely demonstrates NopT's proteolytic cleavage of NFR5, regulated by NFR1 phosphorylation, promoting rhizobial infection in L. japonicus. Intriguingly, authors also identify a truncated NopT variant in certain Sinorhizobium species, maintaining NFR5 cleavage but lacking NFR1 interaction. These findings bridge the T3SS effector with the classical Nod-factor-dependent nodulation pathway, offering novel insights into symbiotic interactions.

      We appreciate that you recognize the value of our manuscript.

      Weaknesses:

      (1) In a previous study, when NopT alone was transiently expressed in Nicotiana tabacum (tobacco) plants, proteolytically active NopT elicited a rapid hypersensitive reaction. However, this phenotype was not observed when the same NopT was expressed in Nicotiana benthamiana (Figure 1A). Conversely, cell death and a hypersensitive reaction were observed in Figure S8. This raises questions about the suitability of the exogenous expression system for studying NopT proteolysis specificity.

      We appreciate your attention to these plant-specific differences. In view of your comments, we plan to revise the Discussion and explain the different expression systems used for studying NopT effects in planta. Previous studies showed that NopT expressed in tobacco (N. tabacum) or in specific Arabidopsis thaliana ecotypes (with PBS1/RPS5 genes) causes rapid cell death (Dai et al. 2008; Khan et al. 2022). Our data shown in Fig. S8 confirm these findings. As cell death (effector-triggered immunity) is usually associated with the induction of protease activities, we considered N. tabacum and A. thaliana plants unsuitable for testing NFR5 cleavage by NopT. In fact, no NopT/NFR5 experiments were performed with these plants in our study. In contrast, the expression of NopT in Nicotiana benthamiana did not lead to cell death in our experiments. Khan et al. 2022 also reported that cell death does not occur in N. benthamiana unless the cells are transformed with PBS1/RPS5 constructs. Thus, N. benthamiana is a suitable expression system for analyzing NopT protease activity on co-expressed substrates. In the revision, we will explain more clearly the advantages of the N. benthamiana expression system for studying NopT-mediated proteolysis of NFR5.

      (2) NFR5 Loss-of-function mutants do not produce nodules in the presence of rhizobia in lotus roots, and overexpression of NFR1 and NFR5 produces spontaneous nodules. In this regard, if the direct proteolysis target of NopT is NFR5, one could expect the NGR234's infection will not be very successful because of the Native NopT's specific proteolysis function of NFR5 and NFR1. Conversely, in Figure 5, authors observed the different results.

      Our inoculation experiments clearly show that NopT of NGR234 has a negative effect on the formation of infection foci (Fig. 5A) and nodule primordia (Fig. 5E). Our biochemical analysis indicates that NopT targets the NFR1/NFR5 complex, which most likely impairs activation of downstream responses such as NIN gene expression. Accordingly, NIN promoter activity was found to be higher in roots inoculated with the ΔnopT mutant than in roots inoculated with the NGR234 wild type (Fig. 5B and 5D). It is therefore plausible that NopT impairs rhizobial infection of L. japonicus by inhibiting NFR1/NFR5 functions. We agree with this Reviewer that it can be expected that "NGR234's infection will not be very successful". Fig. 5 confirms that the ΔnopT mutant is indeed a better symbiont, and we do not think that we obtained "unexpectedly different results". In the revised version, we will formulate our Discussion more carefully in order to avoid any misunderstandings. Furthermore, we will use the figure title "NopT dampens rhizobial infection…" instead of "NopT regulates rhizobial infection…". We are also considering changing the title of our manuscript.

      (3) In Figure 6E, the model illustrates how NopT digests NFR5 to regulate rhizobia infection. However, it raises the question of whether it is reasonable for NGR234 to produce an effector that restricts its own colonization in host plants.

      We acknowledge the potential paradox of NGR234 producing an effector that appears to restrict its own colonization in host plants. In fact, depending on the host plant, most rhizobial effectors are "double-edged swords" that play either a positive or a negative role in the symbiosis. In response to your comment, we will discuss the possibility that NopT may confer selective advantages in interactions between NGR234 and host plants in which NopT plays a positive symbiotic role (Dai et al. 2008; Kambara et al. 2009). Inhibition of NFR1/NFR5 functions by NopT in these host plants could be a feedback response in cells in which symbiotic signaling has already started. It is tempting to speculate that the interaction between NopT and Nod factor receptors reduces Nod factor perception and downstream signaling to avoid a possible overreaction of symbiotic signaling, which might otherwise result in hypernodulation or the formation of empty nodules without bacteria. Furthermore, NopT might target not only Nod factor receptors but also other host proteins to promote symbiosis, e.g. by suppressing excessive immune responses triggered by hyperinfection of rhizobia. In our revised manuscript, we will highlight the need for further investigations to elucidate the precise mechanisms underlying the observed infection phenotype and the role of NopT in modulating symbiotic signaling pathways.

      (4) The failure to generate stable transgenic plants expressing NopT in Lotus japonicus is surprising, considering the manuscript's claim that NopT specifically proteolyzes NFR5, a major player in the response to nodule symbiosis, without being essential for plant development.

      Thank you for your comments. The failure to obtain L. japonicus plants constitutively expressing NopT was indeed surprising and suggests that NopT targets not only NFR5 but also other proteins in L. japonicus. The number of NopT substrates in plants could be greater than assumed. For example, we show in our work that NopT can cleave AtLYK5 and LjLYS11. In our manuscript, we do not provide protocols or data on our efforts to construct L. japonicus plants stably expressing NopT. Indeed, it cannot be completely ruled out that the observed failure was due not to NopT expression but to other factors that influence the transformation and regeneration of explants into whole plants. Our results should therefore not be over-interpreted. We consider a discussion of our failed transformation experiments to be preliminary and not central to this manuscript. Therefore, we plan to modify our Discussion and delete the sentence reporting that stable transgenic plants expressing NopT could not be generated.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers for their overall careful evaluation of our work, the constructive criticism, and their many helpful suggestions. We feel that our revision built on the strengths identified by the reviewers, and addressed all the concerns they have raised. Both reviewers recognize that our revisions have improved the paper.  Since the first submission we have:

      • Rewritten large parts of the paper to improve clarity and make it more concise where possible

      • Simulated an alternative working memory model, as recommended by Reviewer 1

      • Included 4 new/revised supplementary figures, following the reviewers' suggestions for additional analysis.

      Below we provide a brief response to the Reviewers’ comments on our manuscript revision.

      Reviewer #1: Public Review:

      Strengths:

      Overall, the work offers a very interesting approach to a topic which is hard to address experimentally; therefore the computational take is entirely justified and extremely useful. The authors carefully designed the computational experiments to shed light on the demyelination effects on working memory from multiple levels of description, increasing the reliability of their conclusions. I think this work now provides convincing evidence and has the potential to be influential in future studies of myelin alterations (and related disorders such as multiple sclerosis).

      Weaknesses:

      In its current form, the authors have improved the clarity of the results and the model details, and have provided a new set of simulations to complement and reinforce the original ones (including the development of a new spatial working memory model based on silent working memory principles). I do not appreciate any significant weaknesses at this point.

      We thank the reviewer for these positive comments on our revision and for the suggestion of adding the silent memory model, as we feel this has strengthened our findings.

      Reviewer #2: Public Review:

      This paper analyzes the effect of axon de-myelination and re-myelination on action potential speed, and propagation failure. Next, the findings are then incorporated in a standard spiking ring attractor model of working memory.

      I think the results are not very surprising or solid and there are issues with method and presentation.

      The authors did many simulations with random parameters, then averaged the result, and found for instance that the Conduction Velocity drops in demyelination. It gives the reader little insight into what is really going on. My personal preference is for a well understood simple model rather than a poorly understood complex model. The link between the model outcome of WM and data remains qualitative and is further weakened by the existence of known other age-related effects in PFC circuits.

      Comments on revised version:

      The paper has improved in the revision, although I still think a reduced model would have been nice.

      As noted above, in addition to our spiking bump attractor model, our revision includes a second network-level model:  an activity-silent working memory model for continuous features.  We found qualitatively similar effects as in our bump attractor network model, showing that our main conclusions do not critically depend on the exact working memory mechanism (active vs. activity-silent).  This new model was described in two new supplementary figures and a new paragraph in the Results section.

      We did not add a reduced model in our revision to this paper, since neither reviewer explicitly recommended that we add one. As we noted in our private response to reviewers that accompanied our revision, we share the view that understanding simple models can provide critical insights into brain function (and we believe that many of our papers related to attractor dynamics in working memory and decision-making fall into this category, e.g. Wimmer et al. 2014, Esnaola-Acebes et al. 2022, Ibañez et al. 2020). We disagree with the reviewer on an important point: we feel that the model complexity that we have chosen is appropriate and necessary to study the phenomenon at hand. Our modeling efforts are principled, with complexity added as necessary. We started with a biophysical single neuron model with firing dynamics fit to empirical data from pyramidal neurons of rhesus monkey dlPFC (Rumbell et al. 2016), the same type of neurons and cortical region analyzed in the Peters et al. work on structural changes to myelin seen during aging (e.g., Figure 1). Because simple models do not accurately capture the CV along thin axons like those in the PFC, we attached a multicompartment axon with detailed myelinated segments, and constructed a cohort of feasible models. We then used this cohort to obtain quantitative estimates of the effects of variable degrees of demyelination and remyelination. This would not be possible with a simpler model. We then studied the consequences of de- and re-myelination in a spiking neural network model. Again, we could not use a simpler model (e.g. a firing rate attractor model) without making gross assumptions about how demyelination affects circuit function. In sum, we believe that our models are relatively simple but comprehensive given the phenomenon that we are studying.

      The reviewer is correct in that there exist “known other age-related effects in PFC circuits”. These are reviewed in the introduction and we discuss future extensions of our model that would incorporate those effects as well. It is important to note that this is the first comprehensive study of demyelination effects in aging PFC, demonstrating that myelin changes alone predict working memory changes associated with aging.

      While we agree that averaging results across different parameter sets provides a limited understanding of the system, we maintain that such analyses provide an important baseline. We acknowledge that results vary across our model cohort; this is why we included the heatmaps of our single cell model perturbation results (Figure 3 and Supplementary Figure 3), and simulated network models representing a heterogeneous population of neuronal axons with healthy and altered myelin sheaths to different degrees, as likely occurs in the aging brain (Figures 7 and 8). The model framework we present here is well-suited for more targeted analyses and deeper insights, including those which we are currently pursuing.


      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful evaluation of our work, the constructive criticism, and their many helpful suggestions. We feel that our revision builds on the strengths identified by the reviewers, and addresses all the concerns they have raised. We have:

      • Rewritten large parts of the paper to improve clarity and make it more concise where possible

      • Simulated an alternative working memory model

      • Included 4 new/revised supplementary figures, following the reviewers' suggestions for additional analysis

      Reviewer #1 (Public Review):

      Summary:

      The authors study the effects of myelin alterations on working memory via the complementary use of two computational approaches: one based on de- and re-myelination in multicompartmental models of pyramidal neurons, and one based on synaptic changes in a spiking bump attractor model for spatial working memory. The first model provides the most precise angle (biophysically speaking) on the different effects (loss of myelin lamellae or segments, remyelination with thinner and shorter segments, etc.), while the second model allows one to infer the consequences of myelin alterations for working memory performance, including memory stability, duration, and bump diffusion. The results indicate (i) a slowing down and failure of spike propagation with demyelination and partial recovery with remyelination, with detailed predictions on the role of nodes and myelin lamellae, and (ii) a decrease in memory duration and an increase in memory drift as a function of demyelination, in agreement with multiple experimental studies.

      Strengths:

      Overall, the work offers a very interesting approach to a topic which is hard to address experimentally; therefore the computational take is entirely justified and extremely useful. The authors carefully designed the computational experiments to shed light on the demyelination effects on working memory from multiple levels of description, increasing the reliability of their conclusions. I think this work is solid and has the potential to be influential in future studies of myelin alterations (and related disorders such as multiple sclerosis).

      We thank the reviewer for these positive comments on our manuscript.

      Weaknesses:

      In its current form, the study still presents several issues that prevent it from achieving a higher potential impact. These can be summarized in two main items. First, the manuscript is missing some important details about how demyelination and remyelination are incorporated in both models (and what the connection is between both implementations). For example, it is unclear whether an unperturbed axon and a fully remyelinated axon would be mathematically equivalent in the multicompartment model, or how the changes in the number of nodes, myelin lamellae, etc., are implemented in the spiking neural network model.

      We thank the reviewer for these suggestions to improve the clarity of our manuscript. A ‘fully remyelinated’ axon is not mathematically equivalent to the unperturbed axon: it has shorter and thinner myelinated segments, and additional nodes in between. This is consistent with empirical observations in rhesus monkey dlPFC, as reviewed in Peters et al. (2009): a 90% increase in paranode profiles, and myelin sheaths that were thinner than expected for the size of the enclosed axon. With no empirical observations of fewer numbers of nodes (but rather, the opposite) or bare sections of axon, we assumed that the remyelination process also creates new nodes (which are identical to existing nodes), as also modeled in Scurfield & Latimer (2018). We have added two new sentences to the results to clarify this fact, before presenting the first set of results for the single cell model: (starting at line 137):

      “To simulate demyelination, we removed lamellae from selected myelinated segments; for remyelination we replaced a fraction of myelinated segments by two shorter and thinner segments with a node in between. As such, a ‘fully remyelinated axon’ had all the demyelinated segments subsequently remyelinated, but with fewer lamellae and additional nodes compared to the unperturbed control case, consistent with empirical observations (Peters, 2009).”
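      The segment-replacement procedure described above can be sketched as follows. This is a minimal Python illustration under simplifying assumptions, not the multicompartment implementation used in the study: the Segment class, its fields, and the functions demyelinate and remyelinate are hypothetical; each myelinated segment is treated as a single unit (paranodes, juxtaparanodes and internode are not resolved); and the two replacement segments are assumed to split the original length equally after making room for the new node.

```python
from dataclasses import dataclass, replace
import random


@dataclass(frozen=True)
class Segment:
    kind: str          # "node" or "myelin" (hypothetical two-type representation)
    length_um: float   # segment length in micrometers
    lamellae: int      # number of myelin wraps (0 for nodes)


def demyelinate(axon, frac_segments, frac_lamellae, rng):
    """Strip a fraction of lamellae from a random subset of myelinated segments.

    Returns the perturbed axon and the indices of the affected segments."""
    myelin_idx = [i for i, s in enumerate(axon) if s.kind == "myelin"]
    hit = sorted(rng.sample(myelin_idx, round(frac_segments * len(myelin_idx))))
    new = list(axon)
    for i in hit:
        kept = round(new[i].lamellae * (1.0 - frac_lamellae))
        new[i] = replace(new[i], lamellae=kept)
    return new, hit


def remyelinate(axon, affected, original_lamellae, frac_lamellae, node):
    """Replace each affected segment by two shorter, thinner segments with a new
    node (a copy of an existing node) in between; total axon length is preserved."""
    affected = set(affected)
    new = []
    for i, s in enumerate(axon):
        if i not in affected:
            new.append(s)
            continue
        half = replace(
            s,
            length_um=(s.length_um - node.length_um) / 2.0,
            lamellae=round(original_lamellae[i] * frac_lamellae),
        )
        new.extend([half, node, half])
    return new


# Usage: 10 node/myelin pairs; remove all lamellae from half of the myelinated
# segments, then remyelinate all of them with 75% of the original lamellae.
rng = random.Random(0)
node = Segment("node", 1.0, 0)
control = [seg for _ in range(10) for seg in (node, Segment("myelin", 100.0, 40))]
demy, hit = demyelinate(control, frac_segments=0.5, frac_lamellae=1.0, rng=rng)
remy = remyelinate(demy, hit, {i: control[i].lamellae for i in hit}, 0.75, node)
print(sum(s.kind == "node" for s in control), sum(s.kind == "node" for s in remy))  # prints: 10 15
```

      In this toy version the fully remyelinated axon keeps its total length but has more nodes and thinner sheaths than the control, which is the sense in which it is not equivalent to the unperturbed axon.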

      We also state the maximal amount of remyelination more explicitly in the Results, starting on lines 164-165: "We next examined the extent to which remyelination with shorter and thinner segments, occurring after demyelination, restored axonal AP propagation (Figure 4).”

      Also on line 192-193: “Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%).”

      Finally, in Methods we also clarified the structure of the added node (starting at line 634): “Remyelination was performed by replacing an affected (previously demyelinated) segment with two shorter segments, each including paranodes, juxtaparanodes, and an internode, and a new node between them that was identical to existing nodes.”

      We have also provided further details describing how myelin dystrophy was simulated in the network model in Results (lines 243-249) and in Methods (lines 722-747). How myelin alterations were implemented in the network model is one of the reviewer’s questions (Question 5 in Reviewer #1: Recommendations for the Authors). We have addressed this question by describing in detail how we adjusted CV and AP failure rate to the values produced by the multicompartment neuron model. Please see our answer to Question 5 for the details.

      Second, it is unclear whether some of the conclusions are strong computational predictions or just a consequence of the model chosen. For example, the lack of effect of decreasing the conduction velocity on working memory performance could be due to the choice of considering a certain type of working memory model (continuous attractor), and therefore be absent under other valid assumptions (i.e. a silent working memory model, which has a higher dependence on temporal synaptic dynamics).

      Whether some conclusions are strong predictions or just a consequence of the model chosen is an important concern and indeed a general problem of computational modeling of working memory. For example, Stein et al. (Towards biologically constrained attractor models of schizophrenia, Curr. Opin. Neurobiol., 2021) showed that opposed manipulations of E/I ratio can produce the same behavioral pattern in different alternative, plausible biological network models. As long as we do not fully understand the neural mechanisms underlying working memory, modeling studies of how alterations (e.g. in E/I ratio or in the reliability and timing of axonal transmission, as we did here) affect circuit function need to be interpreted critically and tested against new experimental data.

      One way to strengthen model predictions is by showing that different computational models make similar predictions. To do this, we implemented an activity-silent working memory model for continuous features, as suggested by the reviewer, and we found qualitatively similar effects as in our bump attractor network model. Thus, our main conclusions do not critically depend on the exact working memory mechanism (active vs. activity-silent).

      In the revised manuscript, we have added two new supplementary figures (Supplementary Figures 8 and 9, reproduced below as Author response images 1 and 2) and a new paragraph in the Results section about activity-silent working memory (starting at line 319):

      “Alternative working memory mechanisms. Working memory in our neural network is maintained in an attractor state with persistent neural activity (Compte et al., 2000; Hansel and Mato, 2013). Other mechanisms have been proposed, including that working memory maintenance may rely on activity-silent memory traces (Mongillo et al., 2008; Stokes, 2015; Barbosa et al., 2020). In activity-silent models, a slowly decaying transient of synaptic efficacy preserves information without the need for persistent ongoing activity. We implemented an activity-silent model, to our knowledge the first one for continuous spatial locations, and tested how working memory performance is affected by AP failures and propagation delays. We found that AP failures corresponding to demyelination caused working memory errors qualitatively similar to those in the delay-active network (Supplementary Figure 8). On the other hand, increasing propagation delays did not lead to additional working memory errors unless we included unrealistically high values (uniform distribution in the range of 0 to 100 ms; Supplementary Figure 9). These results are qualitatively similar to those of the delay-active network model. Thus, our main findings do not critically depend on the exact working memory mechanism (active vs. activity-silent).”
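      For readers less familiar with activity-silent models, the facilitation (u) and depression (x) variables shown in Author response image 1 follow short-term plasticity dynamics of the kind used by Mongillo et al. (2008). The sketch below is a hypothetical, event-driven single-synapse illustration; the parameter values (U = 0.2, tau_f = 1.5 s, tau_d = 0.2 s) and spike timings are assumed for illustration and are not taken from our network. It shows why a brief cue burst leaves u elevated long after x has recovered, and why fewer presynaptic spikes (here, dropping every other spike as a crude stand-in for AP failures) leave a weaker trace.

```python
import math


def relax(u, x, dt, U, tau_f, tau_d):
    """Exact exponential relaxation of u toward U and x toward 1 over dt seconds."""
    return U + (u - U) * math.exp(-dt / tau_f), 1.0 + (x - 1.0) * math.exp(-dt / tau_d)


def stp_trace(spike_times, t_end, U=0.2, tau_f=1.5, tau_d=0.2):
    """Event-driven facilitation/depression for one synapse (Mongillo et al., 2008 form).

    At each spike u jumps by U*(1-u); the released fraction u*x is then removed
    from x. Returns u and x at t_end and the efficacy u*x of each spike."""
    u, x, t_last = U, 1.0, 0.0
    efficacies = []
    for t in spike_times:
        u, x = relax(u, x, t - t_last, U, tau_f, tau_d)
        u += U * (1.0 - u)          # facilitation jump
        efficacies.append(u * x)    # synaptic efficacy of this spike
        x -= u * x                  # depletion of synaptic resources
        t_last = t
    u, x = relax(u, x, t_end - t_last, U, tau_f, tau_d)
    return u, x, efficacies


# Cue: 10 spikes at 20 Hz, then 1 s of silence (hypothetical timing, in seconds).
full = [0.05 * k for k in range(1, 11)]
u_full, x_full, _ = stp_trace(full, t_end=1.5)
# Crude stand-in for 50% AP failures: keep every other spike.
u_fail, x_fail, _ = stp_trace(full[1::2], t_end=1.5)
print(f"u after delay: {u_full:.3f} (no failures) vs {u_fail:.3f} (failures); x = {x_full:.3f}")
```

      Because tau_f is much longer than tau_d, the cue is carried by u during the silent delay; fewer transmitted spikes during the cue leave u lower at the end of the delay.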

      Author response image 1.

      Action potential failures impair working memory performance in a network model with activity-silent memory traces. (A) Spiking and synaptic activity in an unperturbed, activity-silent working memory model. Top: Raster plot showing the activity for each excitatory neuron (labeled by its preferred direction) in a single trial with a cue stimulus presented at 180°. We modified our spiking neural network model such that it does not show elevated persistent firing throughout the delay period (see Figure 5B for comparison). In particular, we reduced the external background input to excitatory neurons by 3.61% and increased the cue stimulus amplitude by 12.5%. Even though spiking activity decays to baseline (close to 0 Hz), a memory trace is imprinted in enhanced synaptic strength due to short-term synaptic facilitation (Mongillo et al., 2008). Selective spiking activity is recovered by a non-selective constant input applied for 300 ms to all excitatory neurons during the two reactivation periods (marked by yellow and green rectangles in the raster plot). The amplitude of the input was 11 mV during the first and 13 mV during the second reactivation period. Reactivation periods are marked in light gray shading in the remaining panels below and the cue period is indicated by dark gray shading. Firing rates (second row), synaptic facilitation variable u (third row), and synaptic depression variable x (bottom row) for the same trial, averaged over 500 neurons around the neuron with 180° as preferred direction (solid lines) and around the neuron with 0° as preferred direction (dashed lines). Note that reactivation recovers the activity bump (C) but also causes elevated firing and subsequent enhancement of synapses at all positions in the network. (B) Activity in a network with demyelination of 50% of the myelinated segments by removing 60% of the myelin lamellae. AP failures lead to reduced firing rates in the cue and early delay periods and consequently to weaker synaptic enhancement. (C) Average spike counts of the excitatory neurons during the cue period (black lines), and the two reactivation periods indicated in the raster plots in A and B (yellow and green lines). Solid lines correspond to the control network and dashed lines to the perturbed network. (D) Memory strength as a function of time for the control and perturbed networks. (E-F) Trajectories of the bump center (i.e., remembered cue location) read out from the neural activity across the cue and delay periods using a population vector (see Methods). Cue position was 180° in all trials. The perturbed network (F) shows larger working memory errors towards the end of the delay period compared to the control network (E).
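      The population-vector readout mentioned for panels E-F can be illustrated with a short sketch. We assume the common form in which the decoded location is the angle of the rate-weighted sum of unit vectors pointing at each neuron's preferred direction; population_vector, circular_error and the von Mises-shaped bump are hypothetical helpers for this example, and the Methods of the manuscript remain the authoritative description.

```python
import math


def population_vector(rates, preferred_deg):
    """Decode a remembered angle (deg) as the direction of the rate-weighted sum
    of unit vectors at each neuron's preferred angle."""
    sx = sum(r * math.cos(math.radians(p)) for r, p in zip(rates, preferred_deg))
    sy = sum(r * math.sin(math.radians(p)) for r, p in zip(rates, preferred_deg))
    return math.degrees(math.atan2(sy, sx)) % 360.0


def circular_error(decoded_deg, cue_deg):
    """Signed working memory error wrapped to [-180, 180)."""
    return (decoded_deg - cue_deg + 180.0) % 360.0 - 180.0


preferred = list(range(360))  # one hypothetical neuron per degree


def bump(center_deg, kappa=5.0, peak_hz=40.0):
    """Hypothetical von Mises-shaped activity bump centered at center_deg."""
    return [peak_hz * math.exp(kappa * (math.cos(math.radians(p - center_deg)) - 1.0))
            for p in preferred]


decoded = population_vector(bump(180.0), preferred)   # bump still at the cue
drifted = population_vector(bump(195.0), preferred)   # bump drifted by 15 deg
print(round(circular_error(decoded, 180.0), 6), round(circular_error(drifted, 180.0), 6))
```

      Wrapping the error matters for a ring model: a bump that drifts from 350° to 10° has moved 20°, not 340°.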

      Author response image 2.

      Effect of propagation delays on control and perturbed activity-silent network models. (A) Memory strength during the whole simulation time for the young, control networks relying on activity-silent working memory (Supplementary Figure 8) with zero propagation delays (blue line), and with propagation delays from a uniform distribution with a range between 0 and 40 ms (yellow line) and between 0 and 100 ms (orange line). (B) Memory strength for perturbed networks when demyelinating 25% of the myelinated segments by removing 50% of the myelin lamellae, without delays (red line), and with uniformly distributed delays between 0 and 40 ms (light gray line) and between 0 and 100 ms (black line). The cue period is indicated by dark gray shading and reactivation periods are marked in light gray. Memory strength was calculated by averaging across 280 trials for one network. Shaded areas indicate SEM for each case. For the young, control networks (A), working memory was not affected by including delays of up to 40 ms. Unrealistically long delays ranging up to 100 ms did cause an impairment (the longest delays found for the most extreme perturbation condition, demyelination of 75% of the segments by removing 100% of the myelin lamellae, were 49.9 ms on average). When AP failures were also incorporated into the networks (B), we observed a similar trend. For this perturbation condition, delays of up to 40 ms were already much larger than the delays quantified in the single neuron model (for the case of 25% of the segments demyelinated by removing 50% of the myelin lamellae, the average delay in the cohort was 3.75 ms).
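      As an illustration of how AP failures and propagation delays can enter a spiking network, the sketch below drops each presynaptic spike independently with probability p_fail and delivers the surviving spikes after a connection-specific delay drawn uniformly between 0 and a maximum value (for example 40 ms, as in panel A). The per-connection (rather than per-spike) delay, the helper names draw_delays and transmit, and the spike counts are assumptions made for this example only, not a description of our network code.

```python
import random


def draw_delays(n_connections, d_max_ms, rng):
    """One fixed propagation delay per connection, uniform in [0, d_max_ms]."""
    return [rng.uniform(0.0, d_max_ms) for _ in range(n_connections)]


def transmit(spike_times_ms, delay_ms, p_fail, rng):
    """Arrival times at the synapse: each spike fails independently with
    probability p_fail; surviving spikes arrive delay_ms later."""
    return [t + delay_ms for t in spike_times_ms if rng.random() >= p_fail]


rng = random.Random(42)
delays = draw_delays(1000, d_max_ms=40.0, rng=rng)
spikes = [5.0 * k for k in range(10000)]            # hypothetical presynaptic train (ms)
arrivals = transmit(spikes, delays[0], p_fail=0.35, rng=rng)
print(f"delivered fraction: {len(arrivals) / len(spikes):.3f}")
```

      With a failure probability of 0.35, roughly 65% of spikes are delivered; the delay shifts arrival times but does not change how many spikes reach the postsynaptic neuron.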

      With additional simulations to address these issues, I consider that the present study would become a convincing milestone in the computational modeling of myelin-related models, and an important study in the field of working memory.

      Again, we would like to thank the reviewer for the positive comments. We have addressed all the main issues raised (see below our response to the “recommendations for the authors”).

      Reviewer #2 (Public Review):

      This paper analyzes the effect of axon de-myelination and re-myelination on action potential speed, and propagation failure. Next, the findings are then incorporated in a standard spiking ring attractor model of working memory.

      I think the results are not very surprising or solid and there are issues with method and presentation.

      The authors did many simulations with random parameters, then averaged the result, and found for instance that the Conduction Velocity drops in demyelination. It gives the reader little insight into what is really going on. My personal preference is for a well understood simple model rather than a poorly understood complex model. The link between the model outcome of WM and data remains qualitative, and is further weakened by the existence of known other age-related effects in PFC circuits.

      We thank the reviewer for the critical assessment of our work. We share the view that understanding simple models can provide critical insights into brain function (and we believe that many of our papers related to attractor dynamics in working memory and decision making fall into this category, e.g., Wimmer et al., 2014; Esnaola-Acebes et al., 2022; Ibañez et al., 2020). However, we respectfully disagree with the reviewer on an important point: the model complexity that we have chosen is appropriate and necessary to study the phenomenon at hand. Our modeling efforts are principled, with complexity added as necessary. We started with a biophysical single neuron model with firing dynamics fit to empirical data in pyramidal neurons of rhesus monkey dlPFC (Rumbell et al., 2016), the same type of neurons and cortical region analyzed in the Peters et al. work on structural changes to myelin seen during aging (e.g., Figure 1). Because simple models do not accurately capture the CV along thin axons like those in the PFC, we attached a multicompartment axon with detailed myelinated segments, and constructed a cohort of feasible models. We then used this cohort to get quantitative estimates of the effects of variable degrees of demyelination and remyelination. This would not be possible with a simpler model. We then studied the consequences of de- and remyelination in a spiking neural network model. Again, we could not use a simpler model (e.g., a firing rate attractor model) without making gross assumptions about how demyelination affects circuit function. In sum, we believe that our models are relatively simple but comprehensive given the phenomenon that we are studying.

      The reviewer is correct in that there exist “known other age-related effects in PFC circuits”. These are reviewed in the introduction and we discuss future extensions of our model that would incorporate those effects as well. It is important to note that this is the first comprehensive study of demyelination effects in aging PFC, demonstrating that myelin changes alone predict working memory changes associated with aging.

      The specific issues about modeling choices and interpretation of the results are discussed below.

      For both de- and remyelination, the spatial patterns are fully random. Why is this justified?

      We agree that myelin dystrophy during aging could be non-random, that is, localized to certain regions of an axon. Our collaborators (Drs Jennifer Luebke, Maya Medalla, and Patrick Hof) are currently addressing this question using 3D electron microscopy and immunohistochemistry on axons of individual neurons and their associated myelin, but results are not available yet. Early on in this study we examined how the location of myelin alterations affected AP propagation. Focusing demyelination along a section of axon led to more AP slowing and failure than when spatially randomized. Likewise, remyelination of such spatially localized dystrophy led to greater recovery, as there were fewer transitions between long and short internodes (Supplementary Figure 4). Since otherwise the effects in the localized cases were largely similar to those in the spatially random case (see Author response image 3 below), for brevity in this paper we assumed myelin alterations were randomly distributed. Our next paper, which extends this study to collateralized axons and was presented as a poster at the 2023 Society for Neuroscience meeting, will include an examination of localized myelin dystrophy.

      Author response image 3.

      Effect of localized myelin alterations on CV change. Myelin alterations were either focused on the third of myelinated segments closest to the initial segment (‘proximally clustered’), the third of myelinated segments furthest from the initial segment (‘distally clustered’), or distributed according to a uniform distribution as in the current study. For demyelination, all lamellae were removed from 25% of myelinated segments (showing mean +/- SEM of all 50 cohort models, 30 randomized trials each). For remyelination, affected segments were replaced by two shorter segments with 75% of the original lamellae thickness and a node in between.

      We have added two sentences in Methods to justify this assumption more clearly (line 510): “Evidence suggests that aging affects oligodendrocytes in several ways, including the ability for oligodendrocyte precursor cells to mature (Dimovasili et al., 2022). Knowing that individual oligodendrocytes myelinate axons of many different neurons, but without data quantifying how oligodendrocyte dystrophy affects myelination in individual axons, we assumed that myelin alterations were randomly distributed.”
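      The three spatial patterns compared in Author response image 3 can be expressed as a choice of which myelinated segments to perturb. A hypothetical sketch, assuming segments are indexed from 0 (closest to the initial segment) and that, in the clustered cases, segments are drawn at random within the proximal or distal third; the function name choose_segments and the segment count are illustrative only.

```python
import random


def choose_segments(n_segments, frac, mode, rng):
    """Indices of myelinated segments to perturb.

    Index 0 is the segment closest to the initial segment. 'uniform' draws from
    all segments; 'proximal'/'distal' draw only from the nearest/farthest third."""
    third = n_segments // 3
    pools = {
        "uniform": range(n_segments),
        "proximal": range(third),
        "distal": range(n_segments - third, n_segments),
    }
    pool = list(pools[mode])
    n_hit = round(frac * n_segments)
    if n_hit > len(pool):
        raise ValueError("not enough segments in the chosen region")
    return sorted(rng.sample(pool, n_hit))


rng = random.Random(1)
# 25% of 120 hypothetical myelinated segments, as in the demyelination example above.
picks = {mode: choose_segments(120, 0.25, mode, rng)
         for mode in ("uniform", "proximal", "distal")}
for mode, picked in picks.items():
    print(mode, len(picked), picked[0], picked[-1])
```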

      We have also added a sentence in the Discussion alluding to our upcoming study (line 434): “Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures.”

      Similarly, the myelin parameters were drawn from uniform distributions (Table 1, I guess). Again, why is this reasonable?

      The reviewer is correct that our initial Latin hypercube sample generated a uniform distribution. However, parameters of the random sample of models selected as biologically feasible were not uniformly distributed. We have added a new figure (Supplementary Figure 1A) to illustrate the parameter distributions, and have added two sentences in Methods (starting on line 596):

      “Of the 1600 simulated models, 138 met these criteria; for the present study, we randomly selected 50 models to comprise the young, control model cohort. Along most dimensions, the chosen cohort was approximately normally distributed (Supplementary Figure 1). The g-ratio (ratio of axon to fiber diameter) among models in the cohort was 0.71 ± 0.02, with total axon lengths of 1.2 ± 0.1 cm.”
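      The cohort construction described above (a Latin hypercube sample, a feasibility filter, then a random subset of 50 models) can be sketched as follows. The parameter names and ranges are illustrative placeholders, total myelin thickness is computed as lamella thickness times number of lamellae (as in Supplementary Figure 1A), and the feasibility test is a toy g-ratio window standing in for the simulation-based criteria of the study; none of these values are those of the manuscript.

```python
import random


def latin_hypercube(n_samples, bounds, rng):
    """Latin hypercube sample: along each dimension, every one of n_samples equal
    strata is used exactly once, with a random point inside each stratum."""
    samples = [{} for _ in range(n_samples)]
    for name, (lo, hi) in bounds.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)
        for sample, k in zip(samples, strata):
            sample[name] = lo + (k + rng.random()) / n_samples * (hi - lo)
    return samples


def g_ratio(params):
    """Axon diameter divided by fiber diameter (axon + 2 x total myelin thickness)."""
    thickness = params["lamella_thickness_um"] * round(params["n_lamellae"])
    return params["axon_diameter_um"] / (params["axon_diameter_um"] + 2.0 * thickness)


# Illustrative placeholder ranges, NOT the study's values.
bounds = {
    "axon_diameter_um": (0.3, 1.0),
    "lamella_thickness_um": (0.005, 0.01),
    "n_lamellae": (5.0, 25.0),
}
rng = random.Random(7)
candidates = latin_hypercube(1600, bounds, rng)
# Toy feasibility filter; the study instead simulated each model and kept those
# matching empirical constraints.
feasible = [c for c in candidates if 0.6 <= g_ratio(c) <= 0.8]
cohort = rng.sample(feasible, min(50, len(feasible)))
print(len(feasible), len(cohort))
```

      The point of the sketch is that a uniform Latin hypercube design does not imply uniformly distributed parameters in the final cohort: any filter on model behavior reshapes the distributions.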

      Author response image 4.

      Distribution of parameters and conduction velocities in the single neuron model cohort. (A) Histograms of axon morphology parameters of models selected for the single neuron cohort. Top, axon diameter; middle, length of unperturbed myelin segments; bottom, total myelin thickness in unperturbed segments, computed as the product of lamella thickness and number of lamellae. (B) Histograms of the CV for the 50 axons of the unperturbed model cohort (top), and representative demyelination and remyelination perturbations: mild demyelination (removing 25% of lamellae from 25% of the myelinated segments, second row); severe demyelination (removing all lamellae from 75% of the myelinated segments, third row); and complete (100%) remyelination (where the demyelinated segments from the third row were remyelinated by two shorter segments with 75% of lamellae). CVs averaged over 30 trials in each case. (C) Changes in CV (measured in %) in response to demyelination and remyelination versus the magnitude of current clamp step (+180, +280, or +380 pA). Shown are mean +/- SEM for demyelinating 50% of myelinated segments (removing all lamellae), and subsequent remyelination of those segments by shorter segments with 75% of lamellae.

      The focus of most analysis is on the conduction velocity but in the end, this has no effect on WM, so the discussion of CV remains sterile.

      CV delays likely do affect brain functions that rely on neuronal oscillations and synchrony, as mentioned in the Discussion. As such, we feel that our single neuron model results on CV delays as well as AP failures are valuable for the scientific community. Yet, given the results of our network models here, the reviewer has a valid point. We have clarified in the introduction that AP failures but not CV delays affected the network output (line 115):

      “Higher degrees of demyelination led to slower propagation and eventual failure of APs along the axons of the multicompartment models. In the network models, an increase in AP failure rate resulted in progressive working memory impairment, whereas slower conduction velocities, in the range observed in the multicompartment models, had a negligible effect.”

      We have also revised the single neuron section of the Results throughout, to better highlight the effects of myelin dystrophy on AP failures. Revisions to address this in the demyelination section start on line 148:

      “AP propagation was progressively impaired as demyelination increased (Figure 3): CV became slower, eventually leading to AP failure. Removing 25% of lamellae had a negligible effect on CV, regardless of how many segments were affected. However, when all lamellae were removed, CV slowed drastically – by 38 ± 10% even when just 25% of the segments were demyelinated in this way, and 35 ± 13% of APs failed. When 75% of segments lost all their lamellae, CV slowed by 72 ± 8% and 45 ± 13% of APs failed.”
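      The CV and AP-failure measures quoted above can be computed from spike times recorded at two sites along the axon. The sketch assumes hypothetical spike-time lists in milliseconds, a known inter-site distance in millimeters (so that mm/ms equals m/s), and a matching window shorter than the inter-spike interval; the function names match_spikes and cv_and_failures are illustrative.

```python
def match_spikes(prox_ms, dist_ms, max_latency_ms):
    """Latency of each proximal spike to the first later distal spike within
    max_latency_ms; unmatched proximal spikes count as propagation failures."""
    latencies, j = [], 0
    for t in prox_ms:
        while j < len(dist_ms) and dist_ms[j] < t:
            j += 1
        if j < len(dist_ms) and dist_ms[j] - t <= max_latency_ms:
            latencies.append(dist_ms[j] - t)
            j += 1
    return latencies


def cv_and_failures(prox_ms, dist_ms, distance_mm, max_latency_ms):
    """Mean conduction velocity (mm/ms == m/s) and AP failure rate (%)."""
    lat = match_spikes(prox_ms, dist_ms, max_latency_ms)
    failure_pct = 100.0 * (1.0 - len(lat) / len(prox_ms))
    cv = distance_mm / (sum(lat) / len(lat)) if lat else float("nan")
    return cv, failure_pct


prox = [10.0, 30.0, 50.0, 70.0]                  # hypothetical spike times (ms)
cv_c, fail_c = cv_and_failures(prox, [15.0, 35.0, 55.0, 75.0], 5.0, 15.0)
cv_p, fail_p = cv_and_failures(prox, [20.0, 60.0], 5.0, 15.0)
slowing_pct = 100.0 * (cv_c - cv_p) / cv_c
print(cv_c, fail_c, cv_p, fail_p, slowing_pct)   # 1.0 0.0 0.5 50.0 50.0
```

      Note that CV is averaged only over spikes that arrive, so a perturbed axon can show both a CV slowdown and a separate failure rate, as in the results above.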

      Similarly, we have added several sentences about AP failures that remain after remyelination of the single neuron model (starting on line 190):

      “Results for the percentage of AP failures (Figure 4C,F) were consistent with those for CV recovery. Remyelinating all previously demyelinated segments, even adding just 10% of lamellae, brought AP failure rates down to 14.6 ± 5.1%. Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%). Incomplete remyelination, where some segments were still demyelinated, still had relatively high AP failure rates. For example, when one eighth of segments were remyelinated with the maximal amount of lamellae and one eighth were left bare, 25.7 ± 11.5% of APs failed across the cohort (Figure 4C, red dashed line and arrow). AP failure rates were slightly lower when starting with partial demyelination: 10.6 ± 7.6% of APs failed in the analogous paradigm (Figure 4F, red dashed line and arrow). In short: combinations of demyelinated and remyelinated segments often led to sizable CV delays and AP failures.”

      The more important effect of de/re myelination is on failure. However, the failure is, AFAIK, just characterized by a constant current injection of 380pA. From Fig 2 it seems however that the first spike is particularly susceptible to failure. In other words, it has not been justified that it is fine to use the failure rates from this artificial protocol in the I&F model. I would expect the temporal current trace to affect whether the propagation fails or not.

      In general, we did not find the first spike to be more susceptible to failure than latter spikes; the trace in Figure 2 is a representative snapshot intended to illustrate CV slowdown, AP failure, and recovery. Regarding the constant current injection: while the reviewer is correct that neurons do not receive such inputs in vivo, the applied current injections were designed to match in vitro current clamp protocols for these rhesus monkey neurons. While our future studies will include responses to more realistic synaptic inputs, we focused on somatic current injections here. We have added a new panel (C) to Supplementary Figure 1 (see previous response above) showing that the current step magnitude had little effect on the CV change after myelin perturbations; there was little effect on AP failure rates too. We now also state this finding more explicitly in Methods (starting on line 561):

      “As done during in vitro electrophysiological experiments (Chang et al., 2005; Ibanez et al., 2020) and past modeling studies (Coskren et al., 2015; Rumbell et al., 2016), we first applied a holding current to stabilize the somatic membrane potential at -70 mV, then injected a current step into the somatic compartment for 2 seconds. …The CV changes in response to myelin alterations were relatively insensitive to variations in the magnitude of suprathreshold somatic current steps (Supplementary Figure 1C), and whether the current was constant or included Gaussian noise. Therefore, here we quantified CV changes and AP failures from responses to constant +380 pA current steps only.”
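      A sketch of the stimulation protocol, assuming a hypothetical holding current (in the study it was chosen to hold the soma at -70 mV) and arbitrary pre- and post-step durations and time step; only the 2 s step and the step amplitudes (+180, +280, or +380 pA) come from the text. The function current_clamp is illustrative, and the optional Gaussian noise is added only during the step.

```python
import random


def current_clamp(i_hold_pA, i_step_pA, t_pre_ms=200.0, t_step_ms=2000.0,
                  t_post_ms=200.0, dt_ms=0.1, noise_sd_pA=0.0, seed=0):
    """Somatic current waveform: holding current throughout, plus a step of
    i_step_pA lasting t_step_ms, optionally with Gaussian noise during the step."""
    rng = random.Random(seed)
    n_pre, n_step, n_post = (round(t / dt_ms) for t in (t_pre_ms, t_step_ms, t_post_ms))
    trace = []
    for k in range(n_pre + n_step + n_post):
        i = i_hold_pA
        if n_pre <= k < n_pre + n_step:
            i += i_step_pA
            if noise_sd_pA > 0.0:
                i += rng.gauss(0.0, noise_sd_pA)
        trace.append(i)
    return trace


# Hypothetical holding current of -20 pA; step amplitudes as in Supplementary Figure 1C.
protocols = {step: current_clamp(-20.0, step) for step in (180.0, 280.0, 380.0)}
noisy = current_clamp(-20.0, 380.0, noise_sd_pA=50.0)
step_mean = sum(noisy[2000:22000]) / 20000
print(len(protocols[380.0]), round(step_mean, 1))
```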

      I don't know if there are many axon collaterals in the WM circuits and/or distance dependence in the connectivity, but if so, then the current implementation of failure would be questionable.

      We agree that axon collaterals may affect our results; our unpublished morphological analyses of individual neuron axons indicate that there is a high degree of local axon collateralization in Layer 3 pyramidal neurons in LPFC. In this first study from our group on myelin perturbations, we chose to focus here on unbranched axons. There was some distance dependence of AP failure along the length of the axon. For example, in our most extreme demyelination case (75% of segments losing all their lamellae), about 14% of the axons showed more AP failure at their distal ends relative to the middle (mean difference 6.33%). We are examining this distance dependence more broadly in our next study, now cited in the Discussion (line 434): “Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures.”
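      The distance-dependence summary above (the fraction of axons with more failures distally, and the mean size of that difference) can be computed from per-axon failure rates at two recording sites. This sketch assumes one plausible reading of the statistic, namely that the mean difference is taken over the axons showing more distal failure; the function name distal_vs_middle is hypothetical and the numbers in the usage example are made up.

```python
def distal_vs_middle(fail_mid_pct, fail_distal_pct):
    """Fraction of axons with more AP failures at the distal end than at the middle,
    and the mean extra failure (percentage points) among those axons."""
    diffs = [d - m for m, d in zip(fail_mid_pct, fail_distal_pct)]
    worse = [x for x in diffs if x > 0.0]
    frac = len(worse) / len(diffs)
    mean_extra = sum(worse) / len(worse) if worse else 0.0
    return frac, mean_extra


# Made-up failure rates (%) for four axons, recorded mid-axon and distally.
frac, mean_extra = distal_vs_middle([40.0, 45.0, 50.0, 30.0], [40.0, 55.0, 50.0, 34.0])
print(frac, mean_extra)  # 0.5 7.0
```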

      I would also advise against thresholding at 75% failure in Fig 3C. Why don't the authors simply plot the failure rate?

      We thank the reviewer for this suggestion, and have made this change. As suggested by the reviewer, we now show the AP failure rate in Figure 3 and Figure 4. The trends shown are nearly identical to those from the high failure trials.

      Regarding the presentation, there are a number of dead-end results that are not used further on. The paper is rather extensive, and it would be clearer if written up in half the space. In addition, much information is really supplementary. The issue of the CV I already mentioned, also the Lasso regression for instance remains unused.

      We understand the reviewer’s perspective, and we do value brevity when possible. During the revision process we examined the paper carefully, and made the text more concise where feasible. As mentioned above, reporting CV results is important, although our revision places greater emphasis on AP failures. We combined the two Supplementary Figures about remyelination in the single neuron model into one (Supplementary Figure 3). We also moved the Lasso figure and associated methods to the Supplementary Material (Supplementary Figure 2), and have separated the Lasso results for demyelination and remyelination into their respective paragraphs (lines 154-160 and 200-204, respectively). While we do not use the Lasso results explicitly later in the Results, we cite them in the Discussion when comparing our findings to previous work (starting on line 417):

      “Since our single neuron cohort sampled a wide range of parameter space, we used Lasso regression to identify which of the complex, interacting parameters contributed most to CV delays (which preceded AP failures). Parameters including axon diameter, node length, length of myelinated segments, and nodal ion channel densities predicted how our models responded to demyelination and remyelination; these findings are consistent with past modeling studies over more limited parameter ranges (e.g., Goldman and Albus, 1968; Moore et al., 1978; Babbs and Shi, 2013; Young et al., 2013; Schmidt and Knösche, 2019).”
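      To make the Lasso step concrete, here is a minimal pure-Python sketch of Lasso regression by cyclic coordinate descent on standardized predictors. It is a textbook variant, not necessarily the solver or regularization settings used in the study, and the synthetic data (two informative predictors and one irrelevant one, with hypothetical names) are invented for illustration. Predictors whose coefficients are shrunk to zero are those the penalty deems uninformative.

```python
import random


def standardize(col):
    """Zero mean, unit (population) variance."""
    n = len(col)
    mu = sum(col) / n
    sd = (sum((v - mu) ** 2 for v in col) / n) ** 0.5
    return [(v - mu) / sd for v in col]


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(columns, y, alpha, n_iter=100):
    """Minimize (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    `columns` must be standardized, so each coordinate update has denominator 1;
    y is centered internally (the intercept is the mean of y)."""
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]        # residual for b = 0
    b = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, xj in enumerate(columns):
            # correlation of x_j with the partial residual that excludes x_j
            rho = sum(xi * ri for xi, ri in zip(xj, resid)) / n + b[j]
            delta = soft_threshold(rho, alpha) - b[j]
            if delta != 0.0:
                resid = [ri - delta * xi for ri, xi in zip(resid, xj)]
                b[j] += delta
    return b


# Synthetic stand-in for a cohort: CV delay depends on two predictors only.
rng = random.Random(3)
n = 200
raw = {name: [rng.gauss(0.0, 1.0) for _ in range(n)]
       for name in ("axon_diameter", "node_length", "irrelevant")}
y = [2.0 * d - 1.0 * nl + rng.gauss(0.0, 0.1)
     for d, nl in zip(raw["axon_diameter"], raw["node_length"])]
names = list(raw)
coef = lasso_cd([standardize(raw[k]) for k in names], y, alpha=0.1)
for name, c in sorted(zip(names, coef), key=lambda t: -abs(t[1])):
    print(f"{name:>14s} {c:+.3f}")
```

      Standardizing the predictors first puts parameters measured in different units (diameters, lengths, channel densities) on a common scale, so the size of each surviving coefficient can be compared across parameters.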

      We hope that our revision has struck an appropriate balance between clear and concise writing, and addressing concerns from both reviewers. We greatly value the time you have given to help us to improve our manuscript.

      Response to Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      As I mentioned above, I consider that this study is well designed and it offers very interesting results. I have detailed below some of the issues that should be addressed to improve its potential impact in the field:

      (1) Across the manuscript, it is not entirely clear how the results of the multicompartmental model compare to existing modeling results on demyelination and CV changes (such as in the papers cited by the authors). Is this section confirming previous results with a new (more accurate) computational model, or are there any new insights previously unreported? A new paragraph in the Discussion putting these results in context would be very useful for the reader.

      We thank the reviewer for this suggestion. We have added two new subheadings to organize the Discussion better, and have expanded the single neuron section to three paragraphs. We feel this now clarifies how our model fits in with previous work while stating its novelty more explicitly. Starting on line 391:

      “Myelin changes affect AP propagation in a cohort of model neurons

      The novelty of our neuron model lies in its systematic exploration of a combination of different myelin perturbation types known to occur in myelin dystrophies, across a wide range of biologically feasible models. Our single neuron model assumed that age-related myelin dystrophies (e.g., Figure 1) alter the insulative properties of lamellae analogously to demyelination, and examined interactions between demyelination and remyelination. Past studies of myelin dystrophy examined how either demyelination or remyelination of all segments affected AP propagation for a few representative axon morphologies. For example, Scurfield and Latimer (2018) explored how remyelination affected CV delays, finding that axons with more transitions between long and short myelinated segments had slower CV (Supplementary Figure 4), and were the first to explore how remyelination interacts with tight junctions. However, their study did not combine demyelination and remyelination or examine AP failures. Other basic findings from our single neuron cohort are consistent with past modeling studies, including that demyelination caused CV slowing and eventual AP failures (Stephanova et al., 2005; Stephanova and Daskalova, 2008; Naud and Longtin, 2019), and, separately, that remyelination with shorter and thinner myelinated segments led to CV slowing (Lasiene et al., 2008; Powers et al., 2012; Scurfield and Latimer, 2018). However, by assuming that some previously demyelinated segments were remyelinated while others were not, we found that models could have much higher AP failure rates than previously reported. Such a scenario, in which individual axons have some segments that are normal, some demyelinated, and some remyelinated, is likely to occur. We also found a few neurons in our cohort showing a CV increase after remyelination, which has not generally been reported before and is likely due to an interplay between ion channels in the new nodes and altered electrotonic lengths in the perturbed myelinated segments (e.g., Waxman, 1978; Naud and Longtin, 2019).

      Since our single neuron cohort sampled a wide range of parameter space, we used Lasso regression to identify which of the complex, interacting parameters contributed most to CV delays (which preceded AP failures). Parameters including axon diameter, node length, length of myelinated segments, and nodal ion channel densities predicted how our models responded to demyelination and remyelination; these findings are consistent with past modeling studies over more limited parameter ranges (e.g., Goldman and Albus, 1968; Moore et al., 1978; Babbs and Shi, 2013; Young et al., 2013; Schmidt and Knösche, 2019). Better empirical measurements of these parameters in monkey dlPFC, for example from 3-dimensional electron microscopy studies or single neuron axon studies combined with markers for myelin, would help predict the extent to which myelin dystrophy and remyelination along individual axons with aging affect AP propagation.

      Another important feature of our multicompartment model is that it was constrained by morphologic and physiological data from rhesus monkey dlPFC, an extremely valuable dataset from an animal model with many similarities to humans (Upright and Baxter, 2021; Tarantal et al., 2022). While beyond the scope of the current study, this computational infrastructure (with a detailed axon, initial segment, soma, and apical and basal dendrites) enables simultaneous investigations of signal propagation through the dendritic arbor and axon. Our model can also be extended to explore interactions between spatially localized myelin perturbations (such as those seen in multiple sclerosis) and axon collateralization (Sengupta et al., 2023), which would affect the distance-dependence of AP failures. Integrating such results from single neuron models into network models of working memory, as we have done here, is a powerful way to connect empirical data across multiple scales.”

      (2) Although the authors provide a well-designed study for the multi-compartmental model, it would be useful to add more details about how an unperturbed model and a completely remyelinated model differ in practice, perhaps right before the first results on the single cell model are presented. Are the new myelin sheaths covering the same % of axon as in the original case? Are there the same number of nodes? It is hard to distinguish which of these results are due to a compensation by the new myelin sheaths and which ones are just the model coming back to its original (and mathematically equivalent) starting point.

      A ‘fully remyelinated’ axon is not mathematically equivalent to the unperturbed axon. Newly remyelinated segments had at most 75% of the original number of myelin wraps, with a new node in between, consistent with empirical observations in rhesus monkey dlPFC. Our manuscript changes in response to this recommendation are described in detail above in our response to the public review of the same reviewer.

      (3) The authors observe a directed component in the bias that is known to be caused by heterogeneities in network connectivity, as stated in the text. It occurs to me that similar effects could also be caused by heterogeneous demyelination in parts of the network. Inducing these biases could be another potential effect of demyelination in practice, and could be easily revealed by the authors' current model (and displayed in a supplementary figure).

      As suggested by the reviewer, we have tested heterogeneous demyelination in parts of the network and the results confirm the reviewer’s intuition. We have included these new results as new Supplementary Figure 7 (see below) and we have added the following sentences in the Legend of Figure 5, line 1265: “When demyelination is restricted to a part of the network, diffusion only increases in the perturbed zone (Supplementary Figure 7).” and in the Discussion (line 457): “In addition to age-related changes in memory duration and precision, our network model predicts an age-related increase in systematic errors (bias) due to an increased drift of the activity bump (Supplementary Figure 11). Moreover, if demyelination is spatially localized in a part of the network, the model predicts a repulsive bias away from the memories encoded in the affected zone (Supplementary Figure 7).”

      Author response image 5.

      Effect of spatially heterogeneous demyelination of the model neurons according to their preferred angle. We also tested working memory performance in the network when demyelination affects only parts of the network. The figure shows the decoded bump center position during the cue and delay period for the eight possible cue directions when a fraction of neurons was perturbed and the rest of the neurons in the circuit were unaltered (Figure 5B). We perturbed 10% of the neurons around the neuron with preferred direction 90° (left panel), 25% of the neurons around -90° (middle panel), and 50% of the neurons around 180° (right panel). Bump traces for cues that lie inside the perturbed portion of the circuit are shown in blue. In all three cases, the network perturbation consisted of demyelinating 25% of the segments along the axons of model neurons by removing 70% of the myelin lamellae. In each case, 280 trials were simulated for one network. These simulations show increased drift and diffusion inside the perturbed zone, consistent with the increased drift and diffusion observed when perturbing the entire network (Figure 6B and Supplementary Figure 11). In particular, spatially heterogeneous demyelination in our network leads to a bias away from the affected zone and to increased trial-to-trial variability. Note that this is a model prediction; we are not aware of empirical data showing heterogeneous demyelination with aging. Further, note that while our network model has a topological ring structure, neurons in PFC are not anatomically arranged according to their preferred features. Thus, spatially heterogeneous demyelination would likely affect neurons with different feature preferences (i.e., neurons throughout our ring model).
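      The selection of a contiguous block of perturbed neurons on the ring can be sketched as follows. This is a hypothetical helper, not the authors' code: it assumes N neurons whose preferred directions are evenly spaced over 360° starting at -180°, and it picks the round(fraction × N) neurons whose preferred direction is circularly closest to the chosen center, so a block centered at 180° wraps around to -180°.

```python
def preferred_angles(n):
    """Evenly spaced preferred directions (degrees), starting at -180."""
    return [-180.0 + 360.0 * i / n for i in range(n)]


def circular_distance(a, b):
    """Smallest angular separation between two directions, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def perturbed_indices(n, center_deg, fraction):
    """Indices of the round(fraction * n) neurons closest to center_deg on the ring."""
    angles = preferred_angles(n)
    k = round(fraction * n)
    by_distance = sorted(range(n), key=lambda i: circular_distance(angles[i], center_deg))
    return sorted(by_distance[:k])


# Hypothetical 100-neuron ring mirroring the three panels described above.
panels = {90.0: 0.10, -90.0: 0.25, 180.0: 0.50}
selected = {center: perturbed_indices(100, center, f) for center, f in panels.items()}
```

      The selected indices would then receive the AP failure probabilities of the demyelinated condition, while all other neurons keep control values.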

      (4) The bump attractor model of WM relies on a continuous attractor dynamics to encode the information stored in memory --a fixed point dynamics that can only vary via the slow noise-driven drift. This means, as the authors mention, that changes in CV won't affect the performance of WM in their model. This seems to be a limitation of the model, or at least an effect which is highly dependent on the modeler's choice, rather than an accurate prediction. While testing the effects of oscillations (as the authors argue in the Discussion) might be out of the scope of this work, there are other WM models which are more sensitive to temporal differences in activity. The authors should test whether the same (lack of) effects are also found in other WM models. A silent WM model seems to be the ideal candidate for this, as the authors already have the key dynamics of that model incorporated in their computational framework (namely, short-term synaptic facilitation in excitatory synapses).

      We fully agree that considering the effects of demyelination in networks with alternative mechanisms would strengthen our manuscript. As suggested by the reviewer, we have simulated demyelination effects (AP failures and changes in CV) in an activity silent working memory model. The results are described in detail above in our response to the public review of the same reviewer.

      We would also like to mention that we have now tested larger conduction delays in the bump attractor model, which revealed additional working memory errors. This is shown in the revised version of Supplementary Figure 6 (see below). However, those delays are unrealistically large, and thus the main effect in both the bump attractor and the activity-silent model is due to AP failures.

      Author response image 6.

      Effect of propagation delays on control and perturbed networks. (A) Memory strength (left panels) and diffusion (right panels) for the young, control networks with zero propagation delays (blue solid line), as in Figure 5, and with propagation delays from a uniform distribution ranging between 0 and 100 ms (yellow dashed line). (B) Memory strength and diffusion for perturbed networks when demyelinating 50% of the segments along the axons of model neurons by removing 60% of the myelin lamellae, without delays (red solid line), and with delays from a uniform distribution ranging between 0 and 40 ms (gray dashed line) and between 0 and 85 ms (black dash-dotted line). The measures of working memory performance were calculated by averaging across 20 networks and 280 trials for each network. Shaded areas indicate SEM for each case. For the young, control networks, there was no difference with and without propagation delays, even though the delays used in the network simulations were much larger than the delays quantified in the single neuron model (the longest delays, found for the most extreme perturbation condition, i.e., demyelination of 75% of the segments by removing 100% of the myelin lamellae, were 49.9 ms on average; A). Working memory performance was also unaffected in the perturbed network with AP failures for delays ranging between 0 and 40 ms, which are also larger than those quantified in the single neuron model (for the case of 50% of the segments demyelinated by removing 60% of the myelin lamellae, the average delay in the cohort was 4.6 ms and the maximum delay was 15.7 ms; B). However, including extremely long delays of up to 85 ms did further impair memory compared to the impairment level introduced by AP failures alone (B).

      (5) Impact of demyelination and remyelination on working memory: Could the authors explain here how these biologically detailed alterations are implemented in the bump attractor model? Is the CV and AP failure rate adjusted to the values produced by the multicompartment neuron model with these myelin alterations?

      Yes, the reviewer is right: the CV and AP failure rate were adjusted to the values produced by the multicompartment neuron model. To clarify this in the manuscript, we have revised the text as follows:

      Lines 243 - 249 (Results):

      To investigate how myelin alterations affect working memory maintenance, we explored in the network model the same demyelination and remyelination conditions as we did in the single neuron model. Because our network model consists of point neurons (i.e., without detailed axons), we incorporated CV slowing as an effective increase in synaptic transmission delays (see Methods). To simulate AP failures, we adjusted the AP failure rate to the values given by the single neuron model, by creating a probabilistic model of spike transmission from the excitatory presynaptic neurons to both the excitatory and inhibitory postsynaptic neurons (see Methods).

      Lines 722 - 747 (Methods):

      Modeling action potential propagation failures in the network. The network model is composed of point neurons without an explicit model of the axon. To effectively model the action potential failures at the distal end of the axons quantified with the single neuron model under the different demyelination and remyelination conditions, the AP failure rate was adjusted to the values produced by the single neuron model. To do this, we perturbed the 10 control networks by designing a probabilistic model of spike transmission from the excitatory presynaptic neurons to both the excitatory and inhibitory postsynaptic neurons. From the single neuron model, for each demyelination/remyelination condition, we quantified the probability of AP failure for each of the neurons in the control cohort, as well as the percentage of those neurons that shared the same probability of failure (that is, the percentage of neurons with a probability of failure of 0, of 1, or of any other value). We then computed the corresponding probability of transmission (one minus the probability of failure) and assigned it to the same percentages of excitatory neurons in the networks. Thus, in the network model, we took into account the heterogeneity observed in the single neuron model under each demyelination/remyelination condition.
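      A minimal sketch of this probabilistic transmission scheme is shown below, assuming the single-neuron results for one condition are summarized as (probability of failure, fraction of cohort) pairs. The function names, the random assignment of failure classes to network neurons, and the example distribution are hypothetical; the sketch also assumes one Bernoulli draw per presynaptic spike that then applies to all of that neuron's E and I targets, which may differ from the authors' exact implementation.

```python
import random


def assign_transmission_probs(n_exc, failure_distribution, rng):
    """Give each excitatory neuron a transmission probability.

    failure_distribution: list of (p_fail, fraction) pairs taken from the
    single-neuron cohort for one condition; fractions should sum to 1.
    Counts come from cumulative rounding, so they add up to n_exc when the
    fractions sum to 1.
    """
    probs = []
    cumulative = 0.0
    for p_fail, fraction in failure_distribution:
        cumulative += fraction
        count = round(cumulative * n_exc) - len(probs)
        probs.extend([1.0 - p_fail] * count)  # transmission = 1 - failure
    rng.shuffle(probs)  # assumption: failure classes are spread randomly over the network
    return probs


def spike_transmitted(p_transmit, rng):
    """Bernoulli draw deciding whether one presynaptic spike reaches its targets."""
    return rng.random() < p_transmit


rng = random.Random(42)
# Hypothetical condition: 50% never fail, 30% always fail, 20% fail 40% of the time.
p_transmit = assign_transmission_probs(1000, [(0.0, 0.5), (1.0, 0.3), (0.4, 0.2)], rng)
delivered = spike_transmitted(p_transmit[0], rng)
```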

      Modeling conduction velocity slowing in the network. To explore the effect of CV slowing along the axons of model neurons, we simulated 20 young, control networks and 20 perturbed networks with AP failure rates adjusted for the case of single model neurons with 50% of the segments demyelinated along the axons by removing 60% of the myelin lamellae (we ran 280 trials for each network). We then added random delays, uniformly distributed with a minimum value of 0 ms in both cases and a maximum value of 100 ms in the control networks and of 40 ms and 85 ms in the perturbed networks, to both the AMPA and NMDA excitatory connections onto both E and I neurons (Supplementary Figure 6). These large values were chosen to illustrate the potential effect of CV slowing in our network, because smaller, more realistic values had no effect.
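      One way to implement such uniformly distributed conduction delays in an event-driven simulation is sketched below. The helper names are hypothetical, and two choices are assumptions rather than details from the text: each connection receives one fixed delay drawn once at network construction, and the same delay is used for the AMPA and NMDA components of that connection.

```python
import heapq
import random


def draw_delays(n_pre, n_post, max_delay_ms, rng):
    """One fixed delay per connection, drawn uniformly from [0, max_delay_ms]."""
    return [[rng.uniform(0.0, max_delay_ms) for _ in range(n_post)]
            for _ in range(n_pre)]


def schedule_spike(queue, t_spike_ms, pre, delays):
    """Push one delivery event per postsynaptic target onto a time-ordered heap."""
    for post, delay in enumerate(delays[pre]):
        heapq.heappush(queue, (t_spike_ms + delay, pre, post))


rng = random.Random(1)
delays = draw_delays(n_pre=4, n_post=3, max_delay_ms=40.0, rng=rng)
queue = []
schedule_spike(queue, 100.0, pre=2, delays=delays)
t_deliver, pre, post = heapq.heappop(queue)  # earliest arrival among neuron 2's targets
```

      Drawing delays once per connection keeps each synapse's latency stable across trials; redrawing per spike would instead add temporal jitter, a different modeling choice.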

      (6) "We also sought to reveal the effect on working memory performance of more biologically realistic network models with AP transmission probabilities matched to both axons with intact and with altered myelin sheaths, as likely occurs in the aging brain (Figure 1). Thus, we ran network model simulations combining AP failure probabilities corresponding to groups of neurons containing intact axons and axons presenting different degrees of demyelination." I fail to see the difference with respect to the results in previous sections. Is it that now we have subnetworks in which axons are intact and subnetworks with significant AP failures, while before there was no topological separation between both cases? Please clarify.

      In Figures 5 and 6, the AP failure rate of the neural population in the network simulations was matched to the AP failure rate of the cohort of single model neurons for each demyelination/remyelination condition. Since not all model neurons have identical features, a given condition produces different levels of impairment in different neurons. Thus, we quantified the probability of AP failure for each neuron in the control cohort, as well as the percentage of those neurons that shared the same probabilities of failure. Then, we computed the probability of AP transmission for the corresponding percentages of excitatory neurons in the networks. Thus, in the network model, we took into account the heterogeneity observed in the single neuron model under each demyelination/remyelination condition.

      However, in Figures 7 and 8, we consider additional heterogeneity due to different degrees of demyelination/remyelination across neurons. Here, excitatory neurons in the network model are not perturbed according to a single demyelination/remyelination condition. Instead, we allowed different percentages of excitatory neurons to have AP failure rates corresponding to different demyelination/remyelination conditions: some were unperturbed, while others had different degrees of demyelination (Figure 7) and different degrees of remyelination (Figure 8). We have modified the text for clarification in several places.

      First, when we describe the impact of demyelination on working memory, we already mention that (line 271): “In each of the 10 networks, we set the AP failure rate of the excitatory neurons according to the distribution of failure probabilities of the neurons in the single neuron cohort for the given demyelination or remyelination condition. Thus, we took into account the heterogeneity of demyelination and remyelination effects from our single neuron cohort (Figure 3A; Supplementary Figure 3). Note that this heterogeneity originates from differences in axon properties, but probabilities of failure for all neurons in the network correspond to the same degree of demyelination (Figure 6). We will also consider networks that contain different combinations of axons with either intact or perturbed myelin (Figure 7 and Figure 8).”

      Second, we have combined the text describing Figures 7 and 8 under a single section title, which reads “Simulated heterogenous myelin alterations match empirical data” (line 334) and start this section with (line 337): “Up to this point we have studied network models with AP failure probabilities corresponding to a single degree of myelin alterations (i.e., with all excitatory neurons in the network having AP failure rates matched to those of the single neuron cohort for one particular demyelination or remyelination condition). Next, we sought to reveal the effect on working memory performance of more biologically realistic network models, where excitatory neurons in the networks were perturbed according to a combination of different demyelination or remyelination conditions. That is, we simulated networks with excitatory neurons having AP failure probabilities matched to both neuronal axons with intact and with altered myelin sheaths in different degrees, as likely occurs in the aging brain (Figure 1).”

      (7) "Unexpectedly, our model indicates that compared to the performance of networks composed of neurons possessing axons with intact myelin sheaths, both demyelination and remyelination leads to an impaired performance." This conclusion is quite interesting, but I lack intuition from the paper as of why it is happening. In fact, the authors say in the Discussion that "complete remyelination of all the previously demyelinated segments with sufficient myelin, with fewer transitions between long and short segments, recovered working memory function." Would we then see a minimum and then an increase in memory duration in Figure 9B if we extended the X-axis until we hit 100% of new myelin sheaths?

      This is a very important question, which we have now carefully addressed in the Results and Discussion. We distinguish between two remyelination cases in the models: complete remyelination, in which all (100%) of the previously demyelinated segments have subsequently been remyelinated, and incomplete remyelination, in which less than 100% (25%, 50% or 75%) of the demyelinated segments have been remyelinated. Figure 6 (middle and right columns) shows the two cases (black lines for any percentage of lamellae added vs. colored lines): when 100% of the segments are remyelinated, network performance recovers nearly or completely (when enough lamellae are added) to that of the young networks. Consistently, with the single neuron model we observe that (lines 192 - 193 in Results): “Remyelinating all affected segments with 75% of lamellae (the maximal amount of remyelination) nearly eliminated AP failures (1.8 ± 1.1%)”. Incomplete remyelination, in contrast, improves performance relative to demyelination (middle and right columns vs. left column in Figure 6), but performance remains worse than that of the young networks. The single neuron model shows that (lines 194 - 197 in Results): “Incomplete remyelination, where some segments were still demyelinated, still had relatively high AP failure rates. For example, when one eighth of segments were remyelinated with the maximal amount of lamellae and one eighth were left bare, 25.7 ± 11.5% of APs failed across the cohort (Figure 4C, red dashed line and arrow).”

      In Figure 9B (now Figure 8B), we combine intact axons with axons that are only partially remyelinated (i.e., incomplete remyelination). Extending the X-axis in Figure 8B to 100% of new myelin sheaths would not produce a minimum followed by an increase, but a continuous impairment: the more axons we perturb (remyelinate), the greater the impairment compared to the young networks, in which all axons are intact.

      The sentence "Unexpectedly, our model indicates that compared to the performance of networks composed of neurons possessing axons with intact myelin sheaths, both demyelination and remyelination leads to an impaired performance." now reads (lines 379 - 380 in Results): “Therefore, both demyelination and incomplete remyelination lead to impaired performance in our networks, compared to networks with intact myelin sheaths”. We have also rewritten the corresponding section in the Discussion (lines 486 - 489) as follows: “Therefore, it is reasonable to assume that ineffective remyelination may lead to working memory impairment. In fact, complete remyelination of all previously demyelinated segments with sufficient myelin, with fewer transitions between long and short segments, led to full recovery of working memory function.”

      (8) [minor] "Our recent network model found that age-related changes in firing rates and synapse numbers in individual neurons can lead to working memory impairment (Ibañez et al., 2020), but did not consider myelin dystrophy." Could you be more precise about which age-related changes were studied in Ibanez et al. 2020? From the paper it seems like it was mostly cellular excitability and synaptic density, so this should be added here for more context.

      To clarify this, we have added the following sentences to the Introduction (line 105):

      “Our recent network model revealed that the empirically observed age-related increase in AP firing rates in prefrontal pyramidal neurons (modeled through an increased slope of the f-I curve) and loss of up to 30% of both excitatory and inhibitory synapses (modeled as a decrease in connectivity strength) can lead to working memory impairment (Ibañez et al., 2020), but this model did not incorporate the known changes to myelin structure that occur during normal aging.”

      (9) [minor] "Recurrent excitatory synapses are facilitating, which promotes robust and reliable persistent activity despite spatial heterogeneities in the connectivity or in the intrinsic properties of the neurons." It would be great to add a reference here to justify the inclusion of this type of plasticity in the excitatory circuit (for example Wang, Markram et al. Nat Neuro 2006).

      We have added the references suggested by the reviewer and a further one in the Results (line 216):

      “Recurrent excitatory synapses are facilitating, as has been empirically observed in PFC (Hempel et al., 2000; Wang et al., 2006), which promotes robust and reliable persistent activity despite spatial heterogeneities in the connectivity or in the intrinsic properties of the neurons.”

      References:

      Hempel, C. M., Hartman, K. H., Wang, X. J., Turrigiano, G. G., and Nelson, S. B. (2000). Multiple forms of short-term plasticity at excitatory synapses in rat medial prefrontal cortex. J. Neurophysiol. 83, 3031–3041. doi: 10.1152/jn.2000.83.5.3031

      Wang, Y., Markram, H., Goodman, P. H., Berger, T. K., Ma, J., and Goldman-Rakic, P. S. (2006). Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat. Neurosci. 9, 534–542. doi: 10.1038/nn1670

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewers’ Comments:

      Reviewer #1 (Remarks to the Author):

      Summary:

      Fang Huang et al. found that RBM7 deficiency promotes metastasis in breast cancer by coordinating an MFGE8 splicing switch and the NF-κB pathway, using clinical samples as well as cell-based and tail vein injection models.

      Strengths:

      This study uncovers a previously uncharacterized role of MFGE8 splicing alteration in breast cancer metastasis, and provides evidence supporting RBM7 function in splicing regulation. These findings facilitate the mechanistic understanding of how splicing dysregulation contributes to metastasis in cancer, a direction that has increasingly drawn attention recently, and provides a potentially new prognostic and therapeutic target for breast cancer.

      We thank the reviewer for appreciating the novelty and importance of this study, and have provided new data to address the following concerns raised by the reviewer.

      Weaknesses:

      This study can be strengthened in several aspects by additional experiments or at least by further discussion. First, how RBM7 regulates NF-κB, and how it coordinates its splicing function with its canonical function as a component of the NEXT complex, should be clarified. Second, although the roles of MFGE8 splicing isoforms in cell migration and invasion have been demonstrated in transwell and wound healing assays, it would be more convincing to explore their roles in vivo, such as in the tail vein injection model. Third, the clinical significance would be considerably improved if the therapeutic value of targeting MFGE8 splicing could be demonstrated.

      We are grateful for the constructive suggestions. A preliminary study of the mechanism by which RBM7 regulates the NF-κB pathway is already underway. We found that RBM7 depletion markedly increased IL-1β expression, as judged by qPCR and ELISA assays (new Figure S5G-S5I, also see below). IL-1β, commonly known as a pro-inflammatory cytokine, binds to IL-1R and initiates a multistage enzymatic cascade that triggers activation of the NF-κB pathway (Axel Weber, 2010; Qing Guo, 2024). We therefore speculated that IL-1β upregulation might be a causal factor in the activation of NF-κB signaling induced by RBM7 depletion. It will be interesting to determine the complete molecular mechanism in future studies. In addition, we performed a co-IP experiment and found that RBM7 interacts with the RNA splicing factor SF3B2, a component of the spliceosomal U2 snRNP complex (new Figure S6B, also see below). Consistent with the regulation of MFGE8 alternative splicing (AS) by RBM7, depletion of SF3B2 also promoted exon 7 skipping, implying that the two proteins cooperate in regulating MFGE8 splicing (new Figure S6C-S6D, also see below). This agrees with a previous study showing that the RRM domain of RBM7 binds a proline-rich segment within SF3B2 (Falk, Finogenova et al., 2016). The strong similarity of this interaction mode to the interaction between the RBM7 RRM and the ZCCHC8 proline-rich segment in the NEXT complex indicated that SF3B2 and ZCCHC8 bind RBM7 in a mutually exclusive manner. Thus, RBM7 appears to play dual, but not conflicting, roles in RNA processing, depending on whether it interacts with the spliceosome or the exosome (see lines 427-437 in the new manuscript).

      Author response image 1.

      The mRNA levels of IL-1β in MDA-MB-231 or BT549 cells with stable RBM7 knockdown or control vector were examined by qRT-PCR.

      Author response image 2.

      Supernatants from RBM7-knockdown MDA-MB-231 or BT549 cells were collected, and IL-1β protein levels were measured by ELISA.

      Author response image 3.

      The knockdown efficiency of RBM7 in two breast cancer cell lines was determined by qRT-PCR.

      Author response image 4.

      Immunoprecipitation assay was performed in breast cancer cells expressing HA-RBM7 and Flag-SF3B2 or empty vector. The Flag-tagged precipitated complexes and lysates were analyzed through western blotting.

      Author response image 5.

      The splicing shift of MFGE8 upon SF3B2 knockdown in breast cancer cells was examined by RT-PCR. The mean ± SD of PSI values derived from three independent replicates is shown.

      Author response image 6.

      The SF3B2 knockdown efficiency was examined by qRT-PCR.
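      The PSI (percent spliced in) summary reported in these captions can be illustrated with a short sketch. It assumes PSI is computed as the exon 7-included (MFGE8-L) signal divided by the total of included plus skipped (MFGE8-S) signal for each replicate, then summarized as mean ± sample SD. The function names and band intensities are hypothetical, and any background subtraction or product-length correction is assumed to have been applied beforehand.

```python
from statistics import mean, stdev


def percent_spliced_in(inclusion, skipping):
    """PSI = inclusion / (inclusion + skipping), returned as a fraction (0 to 1)."""
    total = inclusion + skipping
    if total <= 0:
        raise ValueError("no signal for either isoform")
    return inclusion / total


def summarize_psi(replicates):
    """replicates: list of (inclusion, skipping) signal pairs, one per replicate."""
    values = [percent_spliced_in(inc, skip) for inc, skip in replicates]
    return mean(values), stdev(values)  # sample SD, as for n = 3 replicates


# Hypothetical band intensities from three independent replicates.
psi_mean, psi_sd = summarize_psi([(1.0, 1.0), (7.0, 3.0), (3.0, 2.0)])
```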

      To further corroborate the roles of the two MFGE8 isoforms in cell invasion, we performed fluorescent gelatin degradation assays to investigate invadopodia formation. Consistent with the transwell assay results, MFGE8-L up-regulation suppressed the invasion of breast cancer cells through a layer of extracellular matrix, whereas breast cancer cells with ectopic expression of MFGE8-S acquired an enhanced ability to degrade matrix and invade (new Figure 5B, also see below). In addition, to determine the therapeutic value of targeting MFGE8 splicing, we transfected triple-negative breast cancer cells with ASOs targeting the RBM7-binding motif and examined the potential impact on cell aggressiveness. Compared with the scrambled negative control ASOs, the splice-targeting ASOs markedly increased the exon 7-skipped variant of MFGE8 and significantly enhanced the migratory and invasive ability of breast cancer cells (new Figure 6B and S5B, also see below), further suggesting that the increased aggressiveness of breast cancer cells upon RBM7 knockdown relies, at least partially, on the MFGE8 splicing switch.

      Author response image 7.

      A gelatin degradation assay was performed to test the effect of RBM7 knockdown on invadopodia function. Cells (10,000) were plated onto FITC-gelatin substrates (green) and cultured for 48 h. Representative images are shown (red, Cy3-phalloidin; blue, DAPI), and the degraded areas were quantified with ImageJ software. Scale bars = 50 μm. P values were determined by one-way ANOVA with Tukey's multiple comparison test (n = 3).

      Author response image 8.

      Representative transwell analysis of the migratory/invasive capability of breast cancer cells transfected with 500 nM ASO directed against the RBM7-binding region in MFGE8 pre-mRNA. P values were determined by one-way ANOVA with Tukey's multiple comparison test.

      Author response image 9.

      RT-PCR quantification of two MFGE8 isoforms after transfecting breast cancer cells with 500 nM ASO directed against RBM7-binding region in MFGE8 pre-mRNA. P values were calculated by one-way ANOVA with Tukey's multiple comparison test.

      Minor concerns:

      (1) Several figure legends do not match the images, for example, Figure 2K, Figure 4, Figure 7D, and 7E, and the description of Figure 7F is missing in the text.

      As suggested by the reviewer, we have checked all of the figure legends carefully and corrected all mismatches.

      (2) The statistical methods for Figure 1A and Figure 1B should be indicated.

      As suggested by the reviewer, we have included the statistical methods for Figure 1A and 1B in the Figure 1 legend. Data in Figure 1A and 1B are presented as means ± SD, and P values were obtained by the Mantel-Cox log-rank test.

      (3) The molecular weight of the proteins in the Western Blot images should be marked.

      As suggested by the reviewer, we have added the molecular weight of proteins in all of the western blot images.

      (4) The sequences where RBM7 binds on MFGE8 RNA should be clearly indicated.

      We thank the reviewer for this question. We analyzed the sequence of alternative exon 7 and the motifs near its 5’ and 3’ splice sites, and found two potential RBM7-binding motifs positioned proximal to the pseudo 3’ splice site. Subsequent RT-PCR of the RIP precipitates confirmed that RBM7 binds the upstream sequence containing 5’-UUUCUU-3’ motifs adjacent to the intron 6/exon 7 junction of the MFGE8 cassette exon, but not another nearby region. To pinpoint the potential cis-element for AS regulation by RBM7, we designed antisense oligonucleotides (ASOs) to block the potential RBM7-binding sites (UUUCUU). As shown in revised Figure 4F, the targeting ASOs promoted exon 7 exclusion compared with the scrambled ASO. Additionally, we constructed an exogenous MFGE8 splicing reporter containing exons 6-8 and partial intron sequences to determine the binding site for AS regulation by RBM7. Depletion of RBM7 still induced a splicing shift in the minigene reporter, increasing the MFGE8-S variant. When the UUUCUU binding motif was removed or mutated, RBM7 no longer affected the splicing outcome of MFGE8 (new Figure S3C, also see below). Given its close proximity to the 3’ splice site, the RBM7-bound UUUCUU motif is very likely to participate in spliceosome assembly at the upstream 3’ splice site of exon 7, which may explain why disrupting the motif led to almost complete exon 7 skipping. Together, these data suggest that RBM7 regulates MFGE8 exon skipping by binding to the UUUCUU motif located six nucleotides upstream of the 3’ splice site of exon 7.
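      The positional part of this motif analysis (locating UUUCUU occurrences and measuring their distance to the intron 6/exon 7 junction) can be sketched in a few lines. This is an illustrative helper, not the authors' pipeline: the function name, the 50-nt search window and the toy sequence are hypothetical, and distance is defined here as the number of nucleotides between the motif's 3’ end and the first exon nucleotide.

```python
def motif_gaps_upstream(seq, splice_site, motif="UUUCUU", window=50):
    """Gaps (nt) between each motif's 3' end and the 3' splice site.

    seq: pre-mRNA written 5'->3'; splice_site: index of the first exon nucleotide.
    Only occurrences lying entirely within `window` nt upstream are reported.
    """
    start = max(0, splice_site - window)
    gaps = []
    i = seq.find(motif, start)
    while i != -1 and i + len(motif) <= splice_site:
        gaps.append(splice_site - (i + len(motif)))
        i = seq.find(motif, i + 1)  # i + 1 also catches overlapping occurrences
    return gaps


# Toy intron tail + exon start (hypothetical sequence); the exon begins at index 16.
toy = "GGGG" + "UUUCUU" + "ACCUAG" + "GAAUU"
gaps = motif_gaps_upstream(toy, splice_site=16)
```

      In this toy sequence the single motif ends six nucleotides before the exon, matching the spacing described above; motif copies inside the exon are ignored.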

      Author response image 10.

      Top: the red line in the diagram indicates the region targeted by the ASOs, which contains UUUCUU; bottom: MCF7 and MDA-MB-231 cells were transfected with ASOs targeting MFGE8 pre-mRNA for 48 h and then analyzed by RT-PCR. P values were determined by one-way ANOVA with Tukey's multiple comparison test.

      Author response image 11.

      Top: MFGE8 minigene splicing reporters carrying a mutation in the RBM7 binding site or in a non-specific site were generated and are shown schematically; bottom: RT-PCR assays were performed to identify the splicing outcomes of the MFGE8 reporters when RBM7 was depleted in breast cancer cells.

      (5) Some typos, graphic errors, and sentences are hard to understand and need to be corrected, such as lines 80-81, 249-250, line 221 "motfs", line 319 "RBM4". Please carefully proofread and revise the entire manuscript.

      As suggested by the reviewer, we have corrected the typos and graphic errors mentioned above. In addition, the manuscript was extensively edited to improve grammar and sentence structure.

      (6) Define the abbreviations when they first appear, such as MFGE8-L, RBM, etc.

      We thank the reviewer for raising this point. We now define each abbreviation at its first appearance in the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors reported the biological role of RBM7 deficiency in promoting metastasis of breast cancer. They further used a combination of genomic and molecular biology approaches to discover a novel role of RBM7 in controlling alternative splicing of many genes in cell migration and invasion, which is responsible for the RBM7 activity in suppressing metastasis. They conducted an in-depth mechanistic study on one of the main targets of RBM7, MFGE8, and established a regulatory pathway between RBM7, MFGE8-L/MFGE8-S splicing switch, and NF-κB signaling cascade. This link between RBM7 and cancer pathology was further supported by analysis of clinical data.

      Strengths:

      Overall, this is a very comprehensive study with lots of data, and the evidence is consistent and convincing. Their main conclusion was supported by many lines of evidence, and the results in animal models are pretty impressive.

      Weaknesses:

      However, there are some controls missing, and the data presentation needs to be improved. The writing of the manuscript needs some grammatical improvements because some of the wording might be confusing.

      We thank the reviewer for the positive comments on this work, and have addressed all the concerns raised by the reviewer.

      Specific comments:

      (1) Figure 2. The figure legend is missing for Figure 2C, which caused many mislabels in the rest of the panels. The labels in the main text are correct, but the authors should check the figure legend more carefully. Also in Figure 2C, it is not clear why the authors choose to examine the expression of this subset of genes. The authors only refer to them as "a series of metastasis-related genes", but it is not clear what criteria they used to select these genes for expression analysis.

      We thank the reviewer for raising this question. We have included the figure legend for Figure 2C and improved the other figure legends throughout the article. Regarding the second question, gene ontology analysis of RNA-seq data from RBM7-depleted breast cancer cells showed that a series of differentially expressed genes were enriched in metastasis-associated processes; we therefore examined the expression of this subset of genes in breast cancer cells with or without RBM7 depletion by qRT-PCR and displayed the differences as a heatmap. To clarify this point and address the reviewer’s concern, we have improved the relevant description (see lines 174-180 in the revised manuscript).

      (2) Line 218-220. The comparison of PSI changes in different types of AS events is misleading. Because these AS events are regulated in different mechanisms, they cannot draw the conclusion that "the presence of RBM7 may promote the usage of alternative splice sites". For example, the regulators of SE and IR may even be opposite, and thus they should discuss this in different contexts. If they want to conclude this point, they should specifically discuss the SE and A5SS rather than draw an overall conclusion.

      We thank the reviewer for this valuable comment. As suggested, we have removed the overall conclusion and now discuss SE and A5SS events separately.

      (3) In the section starting at line 243, they first referred to the gene and isoforms as "EFG-E8" or "EFG-E8-L", but later used "EFGE8" and "EFGE8-L". Please be consistent here. In addition, it will be more informative if the authors add a diagram of the difference between two EFGE8 isoforms in terms of protein structure or domain configuration.

      As suggested by the reviewer, we now consistently use the name “MFGE8-L” for the canonical MFGE8 isoform and “MFGE8-S” for the truncated isoform throughout the manuscript. In addition, to clarify the structural basis for the different tumor invasion-related functions of the two MFGE8 isoforms, we have included a diagram of their domain configuration in new Figure S4F and their predicted protein structures in new Figure S4G. The details in the revised manuscript are given below:

      Author response image 12.

      Schematic diagram of the domain composition of the two MFGE8 isoforms. Upper: the full-length variant, with exon7 indicated by a yellow square; lower: the truncated variant with exon7 skipping.

      Author response image 13.

      Structural models of the two MFGE8 isoforms were generated using SWISS-MODEL. The F5/8 type C2 protein domain excluded from the MFGE8-S variant is marked in red.

      (4) Figure 7B and 7C. The figures need quantification of the inclusion of MFGE exon7 (PSI value) in addition to the RT-PCR gel. The difference seems to be small for some patients.

      As suggested by the reviewer, we have included the relative quantification of PSI for endogenous MFGE8 in breast cancer patients and found an increased proportion of exon7 exclusion in most tumor samples compared with normal tissues (case#1: 86:94; case#2: 84:86; case#3: 79:85; case#4: 63:75; case#5: 69:93; case#6: 71:80) (new Figure 7B, also see below). In addition, we have expanded the number of metastatic breast cancer cases and quantified the AS events within MFGE8 by analyzing PSI values. Lymph node metastases contain a higher proportion of the MFGE8 variant with skipped exon7 than paired primary tumor tissues (case#1: 80:95; case#2: 86:97; case#3: 84:90; case#4: 70:78; case#5: 83:89) (Figure 7C). This is consistent with the decreased RBM7 expression levels found in breast cancer with lymph node metastasis.

      Author response image 14.

      The splicing alteration of MFGE8 in 6 pairs of primary breast cancer tissues and adjacent normal tissues was examined using RT-PCR. PSI values were quantified from relative band intensities using ImageJ software.

      Author response image 15.

      The splicing alteration of MFGE8 in primary breast cancer tissues and corresponding lymph node metastases was identified by RT-PCR assays. PSI values were quantified using ImageJ software.
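      For reference, PSI (percent spliced in) for a cassette exon such as MFGE8 exon 7 is the share of transcripts that include the exon: PSI = inclusion / (inclusion + skipping) × 100. The sketch below shows that calculation from RT-PCR band intensities (for example, values exported from ImageJ). The function name and the intensities are hypothetical, and the optional division by amplicon length is an assumption sometimes applied because intercalating stains give more signal per molecule for longer products; the exact normalization used for the figures above is not stated.

```python
def percent_spliced_in(incl_intensity, skip_intensity, incl_len_bp=None, skip_len_bp=None):
    """Return PSI (%) = inclusion / (inclusion + skipping) * 100.

    incl_intensity / skip_intensity: background-subtracted band intensities
    of the exon-included and exon-skipped RT-PCR products.
    incl_len_bp / skip_len_bp: optional amplicon lengths; if both are given,
    intensities are divided by length as a rough molar correction (assumption).
    """
    if incl_len_bp is not None and skip_len_bp is not None:
        incl = incl_intensity / incl_len_bp
        skip = skip_intensity / skip_len_bp
    else:
        incl, skip = incl_intensity, skip_intensity
    total = incl + skip
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * incl / total


# Hypothetical band intensities for one tumor/normal pair.
psi_normal = percent_spliced_in(3000, 1000)  # 75.0
psi_tumor = percent_spliced_in(2000, 2000)   # 50.0
delta_psi = psi_tumor - psi_normal           # negative = more exon skipping in the tumor
print(psi_normal, psi_tumor, delta_psi)
```

      Reporting ΔPSI per patient pair, rather than only the gel image, makes small shifts such as those in some of the cases above easier to compare.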

      Minor comments:

      The writing in many places is a little odd or somewhat confusing, I am listing some examples, but the authors need to polish the whole manuscript more to improve the writing. 1. Line 169-170, "...followed by profiling high-throughput transcriptome by RNA sequencing", should be "followed by high-throughput transcriptome profiling with RNA sequencing". 2. Line 170, "displayed a wide of RBM7-regulated genes were enriched...", they should add a "that" after the "displayed" as the sentence is very long. 3. Line 213, "PSI (percent splicing inclusion)" is not correct, PSI stands for "percent spliced in". 4. Line 216-217, the sentence is long and fragmented, they should break it into two sentences. 5. Line 224, the "tethering" should be changed to "recognizing". There is a subtle difference in the mechanistic implication between these two words. 6. Line 250, should be changed to "...in the ratio of two MFGE8 isoforms".

      We thank the reviewer for these detailed comments. Each of the points above has been addressed, and the manuscript was also extensively edited to improve grammar and sentence structure for better understanding.

      References

      Weber A, Wasiliew P, Kracht M (2010) Interleukin-1 (IL-1) pathway. Science Signaling.

      Guo Q, Jin Y, Chen X, Ye X, Shen X, Lin M, Zeng C, Zhou T, Zhang J (2024) NF-κB in biology and targeted therapy: new insights and translational implications. Signal Transduction and Targeted Therapy.

      Falk S, Finogenova K, Melko M, Benda C, Lykke-Andersen S, Jensen TH, Conti E (2016) Structure of the RBM7–ZCCHC8 core of the NEXT complex reveals connections to splicing factors. Nature Communications.

    1. Author response:

      eLife assessment

      This useful study shows how genetic variation is associated with fecundity following a period of reproductive diapause in female Drosophila. The work identifies the olfactory system as central to successful diapause with associated changes in longevity and fecundity. While the genetic screening and methods used are solid, the approach to assessing diapause is incomplete and could benefit from additional orthogonal experiments.

      Response: We agree that, as with most studies, additional follow-up work will be informative.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper begins with phenotyping the DGRP for post-diapause fecundity, which is used to map genes and variants associated with fecundity. There are overlaps with genes mapped in other studies and also functional enrichment of pathways including most surprisingly neuronal pathways. This somewhat explains the strong overlap with traits such as olfactory behaviors and circadian rhythm. The authors then go on to test genes by knocking them down effectively at 10 degrees. Two genes, Dip-gamma and sbb, are identified as significantly associated with post-diapause fecundity, and they also find the effects to be specific to neurons. They further show that the neurons in the antenna but not the arista are required for the effects of Dip-gamma and sbb. They show that removing the antenna has a diapause-specific lifespan-extending effect, which is quite interesting. Finally, ionotropic receptor neurons are shown to be required for the diapause-associated effects.

      Strengths and Weaknesses:

      Overall I find the experiments rigorously done and interpretations sound. I have no further suggestions except an ANOVA to estimate the heritability of the post-diapause fecundity trait, which is routinely done in the DGRP and offers a global parameter regarding how reliable phenotyping is. A minor point is I cannot find how many DGRP lines are used.

      Response: Thank you for the suggestions. We screened 193 lines and we will add that information to the methods.

      Additionally, we will add the heritability estimate of the post-diapause fecundity trait.
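      The ANOVA-based estimate the reviewer suggests can be sketched as a one-way random-effects ANOVA with DGRP line as the random factor. Because the lines are inbred, the among-line variance component estimates the genetic variance, giving broad-sense heritability H² = σ²_L / (σ²_L + σ²_E). This is a minimal sketch under stated assumptions: a balanced design (the same number of replicate measurements per line) and no block or batch effects. The function name and data are hypothetical, and the analysis used in the revision may differ (for example, a REML mixed model for unbalanced data).

```python
from statistics import mean


def broad_sense_heritability(lines):
    """Estimate H^2 from a balanced one-way random-effects ANOVA.

    lines: list of per-line replicate measurements (e.g. post-diapause
    fecundity), with the same number (>= 2) of replicates per line.
    """
    k = len(lines)
    n = len(lines[0]) if lines else 0
    if k < 2 or n < 2 or any(len(reps) != n for reps in lines):
        raise ValueError("need >= 2 lines, each with the same number (>= 2) of replicates")
    line_means = [mean(reps) for reps in lines]
    grand_mean = mean(line_means)  # equals the overall mean in a balanced design
    ss_line = n * sum((m - grand_mean) ** 2 for m in line_means)
    ss_error = sum((x - m) ** 2 for reps, m in zip(lines, line_means) for x in reps)
    ms_line = ss_line / (k - 1)
    ms_error = ss_error / (k * (n - 1))
    var_line = max((ms_line - ms_error) / n, 0.0)  # negative estimates truncated to 0
    var_error = ms_error
    if var_line + var_error == 0:
        raise ValueError("no variance in the data")
    return var_line / (var_line + var_error)


# Hypothetical data: three lines, two replicate measurements each.
print(round(broad_sense_heritability([[10, 12], [20, 22], [30, 32]]), 3))  # 0.98
```

      A high H² indicates that most of the variation in the phenotype is attributable to line (genotype) rather than to replicate-to-replicate noise, which is the sense in which it reflects how reliable the phenotyping is.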

      Reviewer #2 (Public Review):

      Summary

      In this study, Easwaran and Montell investigated the molecular, cellular, and genetic basis of adult reproductive diapause in Drosophila using the Drosophila Genetic Reference Panel (DGRP). Their GWAS revealed genes associated with variation in post-diapause fecundity across the DGRP and performed RNAi screens on these candidate genes. They also analyzed the functional implications of these genes, highlighting the role of genes involved in neural and germline development. In addition, in conjunction with other GWAS results, they noted the importance of the olfactory system within the nervous system, which was supported by genetic experiments. Overall, their solid research uncovered new aspects of adult diapause regulation and provided a useful reference for future studies in this field.

      Strengths:

      The authors used whole-genome sequenced DGRP to identify genes and regulatory mechanisms involved in adult diapause. The first Drosophila GWAS of diapause successfully uncovered many QTL underlying post-diapause fecundity variations across DGRP lines. Gene network analysis and comparative GWAS led them to reveal a key role for the olfactory system in diapause lifespan extension and post-diapause fecundity.

      Weaknesses:

      (1) I suspect that there may be variation in survivorship after long-term exposure to cold conditions (10ºC, 35 days), which could also be quantified and mapped using genome-wide association studies (GWAS). Since blocking Ir21a neuronal transmission prevented flies from exiting diapause, it is possible that natural genetic variation could have a similar effect, influencing the success rate of exiting diapause and post-diapause mortality. If there is variation in this trait, could it affect post-diapause fecundity? I am concerned that this could be a confounding factor in the analysis of post-diapause fecundity. However, I also believe that understanding phenotypic variation in this trait itself could be significant in regulating adult diapause.

      Response: We agree that it is possible that the ability to endure cool temperatures per se may influence post-diapause fecundity. However, cool temperature is the essential diapause-inducing condition in Drosophila, so it is not obvious how to separate those effects experimentally, and we agree that phenotypic variation in the cool-sensitivity trait itself could be significant in regulating diapause.

      (2) On p.10, the authors conclude that "Dip-𝛾 and sbb are required in neurons for successful diapause, consistent with the enrichment of this gene class in the diapause GWAS." While I acknowledge that the results support their neuronal functions, I remain unconvinced that these genes are required for "successful diapause". According to the RNAi scheme (Figure 4I), Dip-γ and sbb are downregulated only during the post-diapause period, but still show a significant effect, comparable to that seen in the nSyb Gal4 RNAi lines (Figure 4K).

      Response: Our definition of successful diapause is the ability to produce viable adult progeny post-diapause, which requires that the flies enter, maintain, and exit diapause, alive and fertile. We will restate our conclusion to say that Dip-γ and sbb are required for post-diapause fecundity.

      In addition, two other RNAi lines (SH330386, 80461) that did not show lethality did not affect post-diapause fecundity.

      Response: We interpret those results to mean that those RNAi lines were not effective since Dip-γ and sbb are known to be essential.

      Notably, RNAi (27049, KK104056) substantially reduced non-diapause fecundity, suggesting impairment of these genes affects fecundity in general regardless of diapause experience. Therefore, the reduced post-diapause fecundity observed may be a result of this broader effect on fecundity, particularly in a more "sensitized" state during the post-diapause period, rather than a direct regulation of adult diapause by these genes.

      Response: Ubiquitous expression of RNAi lines #27049 or #KK104056 was lethal, so we included the tubGAL80ts repressor to prevent RNAi from taking effect during development. Flies had to be shifted to 30 °C to inactivate the repressor and thereby activate the RNAi. At 30 °C, the fecundity of the controls (GFP RNAi lines #9331 and KK60102) was also lower (average non-diapause fecundity = 12 and 19, respectively) and similar to that of #27049 or #KK104056. We also assessed the knockdown using Repo GAL4 and nSyb GAL4 and found no significant decline in non-diapause fecundity for #27049 and #KK104056 compared with a nonspecific RNAi control (#54037).

      (3) The authors characterized 546 genetic variants and 291 genes associated with phenotypic variation across DGRP lines but did not prioritize them by significance. They did prioritize candidate genes with multiple associated variants (p.9 "Genes with multiple SNPs are good candidates for influencing diapause traits."), but this is not a valid argument, likely due to a misunderstanding of LD among variants in the same gene. A gene with one highly significantly associated variant may be more likely to be the causal gene in a QTL than a gene with many weakly associated variants in LD. I recommend taking significance into account in the analysis.

      We agree with the reviewer, and in Supplemental Table S3 we list the top-associated SNPs in order of increasing p-value (most significant first). Most of the top-associated genes from this analysis were uncharacterized CG numbers for which insufficient tools were available for validation. Nevertheless, there is overlap between the genes ranked highest by p-value and those with multiple SNPs. Among the top 15 genes with multiple associated SNPs, CG18636 and CR15280 ranked 3rd by p-value, CG7759 ranked 4th, CG42732 ranked 10th, and Drip ranked 30th (all passing the conservative Bonferroni threshold of 4.8e-8), while three sbb-associated SNPs also appear in Table 3 above the standard 1e-5 threshold.
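      The two prioritization schemes discussed in this exchange (ranking genes by their single most significant SNP versus by the number of associated SNPs) can be compared with a short sketch. It assumes a simple list of SNP-to-gene annotations with p-values; the gene names, p-values, and function name are hypothetical, and the 1e-5 default mirrors the standard nominal threshold referred to in the response. Because SNPs within one gene are often in linkage disequilibrium, a high SNP count should not be read as independent replication.

```python
from collections import defaultdict


def summarize_genes(snp_hits, threshold=1e-5):
    """Collapse SNP-level association results to genes.

    snp_hits: iterable of (gene, p_value) pairs, one per SNP-gene annotation.
    Returns two rankings of (gene, min_p, n_snps), using only SNPs with
    p <= threshold: by smallest p-value, and by number of associated SNPs
    (ties broken by smallest p-value).
    """
    by_gene = defaultdict(list)
    for gene, p in snp_hits:
        if p <= threshold:
            by_gene[gene].append(p)
    summary = [(gene, min(ps), len(ps)) for gene, ps in by_gene.items()]
    by_min_p = sorted(summary, key=lambda row: row[1])
    by_snp_count = sorted(summary, key=lambda row: (-row[2], row[1]))
    return by_min_p, by_snp_count


# Hypothetical hits: geneA has one strong SNP; geneB has three weaker SNPs
# that may be in LD with one another; geneC does not pass the threshold.
hits = [("geneA", 1e-9), ("geneB", 2e-6), ("geneB", 3e-6),
        ("geneB", 8e-6), ("geneC", 5e-4)]
by_p, by_count = summarize_genes(hits)
print([g for g, _, _ in by_p])      # ['geneA', 'geneB']
print([g for g, _, _ in by_count])  # ['geneB', 'geneA']
```

      The two orderings disagree in this toy case, which is the situation the reviewer raises; genes that rank highly under both schemes are the more conservative candidates.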

      Reviewer #3 (Public Review):

      Summary:

      Drosophila melanogaster of North America overwinters in a state of reproductive diapause. The authors aimed to measure 'successful' D. melanogaster reproductive diapause and reveal loci that impact this quantitative trait. In practice, the authors quantified the number of eggs produced by a female after she exited 35 days of diapause. The authors claim that genes involved with olfaction in part contribute to some of the variation in this trait.

      Strengths:

      The work used the power platform of the fly DRGP/GWAS. The work tried to verify some of the candidate loci with targeted gene manipulations.

      Weaknesses:

      Some context is needed. Previous work from 2001 established that D. melanogaster reproductive diapause in the laboratory suspends adult aging but reduces post-diapause fecundity. The work from 2001 showed the extent fecundity is reduced is proportional to diapause duration. As well, the 2001 data showed short diapause periods used in the current submission reduce fecundity only in the first days following diapause termination; after this time fecundity is greater in the post-diapause females than in the non-diapause controls.

      Response: The 2001 paper by Tatar et al. reports the number of eggs laid after 3, 6, or 9 weeks in diapause conditions. Thus the diapause conditions used in this study (35 days or 5 weeks) are neither short nor long, rather intermediate. Does the reviewer have a specific concern?

      In this context, the submission fails to offer a meaningful concept for what constitutes 'successful diapause'. There is no biological rationale or relationship to the known patterns of post-diapause fecundity. The phenotype is biologically ambiguous.

      Response: We have unambiguously defined successful diapause as the ability to produce viable adult progeny post-diapause. Other groups have measured % of flies that arrest ovarian development or % of post-diapause flies with mature eggs in the ovary, or # eggs laid post-diapause; however we suggest that # of viable adult progeny produced post-diapause is more meaningful than the other measurements from the point of view of perpetuating the species.

      I have a serious concern about the antenna-removal design. These flies were placed on cool/short days two weeks after surgery. Adults at this time will not enter diapause, which must be induced soon after eclosion. Two-week-old adults will respond to cool temperatures by 'slowing down', but they will continue to age on a time scale of day-degrees. This is why the control group shows age-dependent mortality, which would not be seen in truly diapaused adults. Loss of antennae increases the age-dependent mortality of these cold adults, but this result does not reflect an impact on diapause.

      Response: The reviewer has a point. We carried out the lifespan study under two different conditions: either by removing the antenna and moving the flies directly to 10 °C or by removing the antenna and allowing a “wound healing” period prior to moving the flies to 10 °C (out of concern that the flies might have died quickly because wound healing may be impaired at 10 °C). In both cases, lifespan was shortened. We will add a discussion of the technical limitations of this experiment.

      • Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The work falls well short of its aim because the concept of 'successful diapause' is not biologically established. The paper studies post-diapause fecundity, and we don't know what that means. The loci identified in this analysis segregate for a minimally constructed phenotype. The results and conclusions are orthogonal.

      Response: It is unclear to us why the reviewer has such a negative opinion of measuring post-diapause fecundity, specifically the ability to produce viable progeny post-diapause. The value of this measurement seems obvious from the point of view of perpetuating the species.

      • The likely impact of the work on the field, and the utility of the methods and data to the community.

      The work will have little likely impact. Its phenotype and operational methods are weakly developed. It lacks insight based on the primary literature on post-diapause. The community of insect diapause investigators are not likely to use the data or conclusions to understand beneficial or pest insects, or the impact of a changing climate on how they over-winter.

      Response: The reviewer has not explained why his/her opinion is so negative.

    1. Author response:

      To Reviewer #1:

      Thank you for your thorough review and comments on our work, which you described as “the role of neuritin in T cell biology studied here is new and interesting.” We have summarized your comments into two categories: (1) biology and investigation approach and (2) experimental rigor and data presentation.

      Biology and Investigation approach comments:

      (1) Questions regarding the T cell anergy model:

      Major point “(4) Figure 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      T cell anergy is a well-established concept first described by Schwartz’s group. It refers to the hyporesponsive functional state of antigen-experienced CD4 T cells (Chappert and Schwartz, 2010; Fathman and Lineberry, 2007; Jenkins and Schwartz, 1987; Quill and Schwartz, 1987). Anergic T cells are characterized by their inability to expand and to produce IL2 upon subsequent antigen re-challenge. In this paper, we adopted the established in vivo T cell anergy induction model used by Mueller’s group (Vanasek et al., 2006). Specifically, Thy1.1+ Ctrl or Nrn1-/- TCR transgenic OTII cells were co-transferred with congenically marked Thy1.2+ WT polyclonal Treg cells into TCR-/- mice. After anergy induction, the TCR transgenic T cells were recovered by sorting on the Thy1.1+ congenic marker and subsequently restimulated ex vivo with OVA323-339 peptide. We evaluated the anergic state based on OTII cell expansion in vivo and IL2 production upon OVA323-339 restimulation ex vivo.

      “The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this.”

      Because the anergy model by Mueller's group is well established (Vanasek et al., 2006), we did not feel that additional effort was required to validate this model as the reviewer suggested. Moreover, the limited IL2 production among the control cells upon restimulation confirms the validity of this model.

      “The lack of IL-2 production by Cltr cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status”.

      Cells from Ctrl and Nrn1-/- mice on a homogeneous TCR transgenic (OTII) background were used in these experiments. It is therefore unlikely that substantial variability in, or different expression levels of, the transgenic TCR, rather than anergy induction, accounts for the difference in IL2 production.

      Overall, having found Nrn1 to be expressed particularly in anergic T cells, we used this in vivo anergy model to compare the functional state of Nrn1-/- and Ctrl T cells under anergy-inducing conditions. Through studies using this anergy model, we observed a significant change in Treg induction among OTII cells. As reflected in the subsequent experiments, we therefore chose to pursue the role of Nrn1 in Treg cell development and function rather than the biology of T cell anergy.

      Minor points “(6) On which markers are anergic cells sorted for RNAseq analysis?”

      Cells were sorted based on the congenic marker identifying the Ctrl or Nrn1-/- OTII cells transferred into the host mice. We did not specifically isolate anergic cells for sequencing.

      (2) Question regarding the validity of iTreg differentiation model.

      Major point: “(5) Figure 2A-C and Figure 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, that is why I would suggest generating data with purified nTreg. Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript. Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”.

      We thank Reviewer #1 for their feedback. While it is true that iTregs generated in vitro and pTregs generated in vivo differ in several respects (e.g., Foxp3 expression stability), we strongly disagree with Reviewer #1’s statement that “The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance.” The induced Treg cell (iTreg) model was established over 20 years ago (Chen et al., 2003; Zheng et al., 2002) and has been widely adopted, with over 2000 citations. Further, it has been instrumental in understanding different aspects of regulatory T cell biology (Hurrell et al., 2022; John et al., 2022; Schmitt and Williams, 2013; Sugiura et al., 2022).

      Because we observed reduced pTreg generation in vivo, we chose to use the in vitro iTreg model system to understand the mechanistic changes involved in Treg cell differentiation and function, specifically neuritin’s role in this process. We have made no claim that iTreg cell biology is identical to that of pTreg cells generated in vivo or of nTreg cells. However, the iTreg culture system has proved to be a good in vitro system for deciphering molecular events involved in complex processes, and it remains commonly used by many research groups in the Treg cell field (Hurrell et al., 2022; John et al., 2022; Sugiura et al., 2022). Moreover, the iTreg in vitro culture system was instrumental in helping us identify the change in cell electrical state in Nrn1-/- CD4 cells and revealed the biological link between Nrn1 and the ionotropic AMPA receptor (AMPAR), which we discuss below. It is technically challenging to use nTreg cells for T cell electrical state studies because both their heterogeneity, which arises from development in an in vivo environment, and the manipulation involved in nTreg cell isolation can affect the T cell electrical state.

      “Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript.” 

      We have also carried out nTreg studies in vitro in addition to iTreg cells. Similar to Gonzalez-Figueroa et al.'s findings, we did not observe differences in suppression function between Nrn1-/- and WT nTreg using the in vitro suppression assay. However, Nrn1-/- nTreg cells revealed reduced suppression function in vivo (Fig. 2D-L). In fact, Gonzalez-Figueroa et al. observed reduced plasma cell formation after OVA immunization in Treg-specific Nrn1-/- mice, implicating reduced suppression from Nrn1-/- follicular regulatory T (Tfr) cells. Thus, our observation of the reduced suppression function of Nrn1-/- nTreg toward effector T cell expansion, as presented in Fig. 2D-L, does not contradict the results from Gonzalez-Figueroa et al. Rather, the conclusions of these two studies agree that Nrn1 can play important roles in immune suppression observable in vivo that are not captured readily by the in vitro suppression assay.

      “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctlr and Nrn1 KO cells.”

      We have stated in the manuscript on page 7 line 208 that “Similar proportions of Foxp3+ cells were observed in Nrn1-/- and Ctrl cells under the iTreg culture condition, suggesting that Nrn1 deficiency does not significantly impact Foxp3+ cell differentiation”. In the revised manuscript, we will include the data on the proportion of Foxp3+ cells before iTreg restimulation.

      (3) Confirmation of transcriptomic data regarding amino acids or electrolytes transport change

      Minor point “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion to “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have in fact already performed such experiments, which corroborate the transcriptomic data on differential amino acid and nutrient transporter expression. Specifically, we loaded iTreg or Th0 cells with a membrane potential (MP) dye and measured the MP change after adding the complete set of amino acids (complete AA). Upon entry, the charge carried by AAs may transiently affect the cell membrane potential, so different AA transporter expression patterns may produce different patterns of MP change upon AA entry, as shown in Author response image 1. We observed a smaller MP change in Nrn1-/- iTreg cells than in Ctrl cells, whereas among Th0 cells, Nrn1-/- cells showed a greater MP change than Ctrl cells. We can include these data in the revised manuscript.

      Author response image 1.

      Membrane potential change induced by amino acid entry. a. Nrn1-/- or WT iTreg cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs. b. Nrn1-/- or WT Th0 cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs.

      (4) EAE experiment data assessment

      Minor point “(5) Figure 5F. How are cells re-stimulated? If polyclonal stimulation is used, the experiment is not interesting because the analysis is done with lymph node cells. This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.”

      In the EAE study, Nrn1-/- mice exhibited similar disease onset but a protracted, non-resolving disease phenotype compared with WT control mice. Several factors may contribute to this phenotype: 1. enhanced T effector cell infiltration/persistence in the central nervous system (CNS); 2. reduced Treg cell-mediated suppression of T effector cells in the CNS; 3. protracted, non-resolving inflammation at the immunization site, which could continue to send T effector cells into the CNS and sustain inflammation. Based on this reasoning, we examined the number of infiltrating T effector cells and the Treg cell proportion in the CNS. We also restimulated cells from draining lymph nodes close to the inflammation site, looking for evidence of persistent inflammation. When mice were harvested around day 16 after immunization, inflammation in the local draining lymph node should have been at the contraction stage. We stimulated cells with PMA and ionomycin to capture all potential T effector cells in the draining lymph node rather than only MOG antigen-specific cells. We therefore disagree with Reviewer #1’s suggestion that “This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.” We think our experimental approach was appropriately tailored to the biological questions we intended to answer.

      Experimental rigor and data presentation.

      (1) Data labeling and additional supporting data

      Major points (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Figure 1A-D. Since the authors are using the Nrn1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figures 1A-C to have single-cell and quantitative data as well.

      Minor points  

      (1) Line 119, 120 of the text. It is said that one of the most up-regulated genes in anergic cells is Nrn1 but the data is not shown.

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We can adapt the labeling and provide additional data, including Nrn1 staining on Treg cells and flow cytometry plots for pmTOR and pS6 staining (Fig. 3H), as requested by Reviewer #1.

      (2) Experimental rigor:

      General comments:

      “However, it is disappointing that reading this manuscript leaves an impression of incomplete work done too quickly.”

      We were discouraged to receive the comment, “this manuscript leaves an impression of incomplete work done too quickly.” Our study of this novel molecule began without any existing biological tools such as antibodies or knockout mice. Over the past several years, we have established our own antibodies for Nrn1 detection, obtained and characterized Nrn1 knockout mice, and used multiple approaches to identify the molecular mechanism of Nrn1 function. Using the in vitro iTreg system described in this manuscript, we identified an association between Nrn1 deficiency and a change in the cell electrical state, potentially connected to AMPAR function. We have further corroborated our findings by generating Nrn1-/- mice with T cell-specific AMPAR deletion and confirmed that T cell-specific AMPAR deletion could abrogate the phenotype caused by Nrn1 deficiency (see Author response image 2). We did not include the double knockout data in the current manuscript because AMPAR function has not yet been studied thoroughly in T cell biology, and we feel this topic warrants examination in its own right. However, the unpublished data support the finding that Nrn1 modulates the T cell electrical state and, consequently, metabolism, ultimately influencing tolerance and immunity. In its current form, the manuscript represents the first characterization of the novel molecule Nrn1 in anergic cells, Tregs, and effector T cells. While this work has led to several exciting additional questions, we disagree that the novel characterization we have presented is incomplete. We feel that our present data set, which squarely highlights Nrn1’s role as an important immune regulator while shedding unprecedented light on the molecular events involved, will be of considerable interest to a broad range of researchers.

      “Multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms.”

      We have indeed used multiple in vivo models to reveal Nrn1's function in Treg differentiation, Treg suppression function, T effector cell differentiation and function, and the overall impact on autoimmune disease. Because the impact of ion channel function is often context-dependent, we examined the biological outcome of Nrn1 deficiency in several in vivo contexts. We would appreciate it if Reviewer #1 could provide a specific example, given the Nrn1 phenotype, of how we might investigate the electrical change in greater depth in the in vivo models.

      “Major points (1) A real weakness of this work is the fact that in most of the results shown, there are few biological replicates with differences that are often small between Ctrl and Nrn1 -/-. The systematic use of Student's t-test may lead to thinking that the differences are significant, which is often misleading given the small number of samples, which makes it impossible to know whether the distributions are Gaussian and whether a parametric test can be used. RNAseq bulk data are based on biological duplicates, which is open to criticism.”

      We respectfully disagree with Reviewer #1 regarding statistical power and significance in our work. We have used 5-8 mice/group for each in vivo model and 3-4 technical replicates for the in vitro studies, with a minimum of 2-3 replicate experiments. These group sizes and replication numbers are in line with those seen in high-impact publications. While some differences between Ctrl and Nrn1-/- appear small, they have significant biological consequences, as evidenced by the various Nrn1-/- in vivo phenotypes. Furthermore, we believe we have subjected our data to the appropriate statistical tests to ensure rigorous analysis and representation of our findings.
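      For context on the small-sample point raised above, the sketch below computes an exact (permutation) two-sided Mann-Whitney U p-value, a common nonparametric alternative to Student's t-test that does not assume Gaussian distributions. It is an illustration only: the function names and data values are hypothetical and are not taken from the manuscript. With three observations per group, even complete separation of the groups cannot give a p-value below 0.1; with four per group the floor is 2/70.

```python
from itertools import combinations


def mann_whitney_u(x, y):
    """U statistic: number of pairs (a, b) with a > b, counting ties as 0.5."""
    return sum((a > b) + 0.5 * (a == b) for a in x for b in y)


def exact_mann_whitney_p(x, y):
    """Exact two-sided p-value by enumerating every relabeling of the pooled data."""
    pooled = list(x) + list(y)
    n, n1 = len(pooled), len(x)
    center = n1 * (n - n1) / 2  # mean of U under the null hypothesis
    observed = abs(mann_whitney_u(x, y) - center)
    extreme = total = 0
    for idx in combinations(range(n), n1):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count relabelings at least as far from the null mean as the observed one.
        if abs(mann_whitney_u(gx, gy) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical triplicates with complete separation between groups.
print(exact_mann_whitney_p([4.1, 5.0, 6.2], [1.0, 2.3, 3.1]))  # 0.1
# Hypothetical quadruplicates with complete separation: 2/70, about 0.029.
print(exact_mann_whitney_p([5, 6, 7, 8], [1, 2, 3, 4]))
```

      Full enumeration is only practical for small groups, which is exactly the regime discussed here; for larger samples a normal approximation is usually used instead.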

      To Reviewer #2.

      We thank Reviewer #2 for the careful review of the manuscript. We especially appreciate the comments that “The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well done to strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.”

      “The major weakness of this study stems from a lack of a clear molecular mechanism involving Nrn1.”

      We fully understand this comment from Reviewer #2. The main mechanism we identified as contributing to the functional defect of Nrn1-/- T cells involves novel effects on the electric and metabolic state of the cells. Although we referenced neuronal studies indicating that Nrn1 is an auxiliary protein for the ionotropic AMPA-type glutamate receptor (AMPAR) and may affect AMPAR function, we did not provide evidence for this link in the present manuscript, as the topic requires further in-depth study.

      For the benefit of this discussion, we include our preliminary Nrn1 and AMPAR double knockout data (Author response image 2), which indicates that abrogating AMPAR expression can compensate for the defect caused by Nrn1 deficiency in vitro and in vivo. This preliminary data supports the notion that Nrn1 modulates AMPAR function, which causes changes in T cell electric and metabolic state, influencing T cell differentiation and function.  

      Author response image 2.

      Deletion of AMPAR expression in T cells compensates for the defect caused by Nrn1 deficiency. Nrn1-/- mice were crossed with T cell-specific AMPAR knockout (AMPARfl/flCD4Cre+) mice. The following mice were generated and used in the experiment: T cell-specific AMPAR knockout and Nrn1 knockout mice (AKONKO), Nrn1 knockout mice (AWTNKO), and Ctrl mice (AWTNWT). a. Deletion of AMPAR compensates for the iTreg cell defect observed in Nrn1-/- CD4 cells: iTreg live cell proportion, cell number, and Ki67 expression among Foxp3+ cells 3 days after anti-CD3 (aCD3) restimulation. b. Deletion of AMPAR in T cells abrogates the enhanced autoimmune response of Nrn1-/- mice in the EAE disease model: relative weight change and disease score progression after EAE induction.

      Ion channels can influence cell metabolism through multiple means (Vaeth and Feske, 2018; Wang et al., 2020). First, ion channels are involved in maintaining cell resting membrane potential. This electrical potential difference across the cell membrane is essential for various cellular processes, including metabolism (Abdul Kadir et al., 2018; Blackiston et al., 2009; Nagy et al., 2018; Yu et al., 2022). Second, ion channels facilitate the movement of ions across cell membranes. These ions are essential for various metabolic processes. For example, ions like calcium (Ca2+), potassium (K+), and sodium (Na+) play crucial roles in signaling pathways that regulate metabolism (Kahlfuss et al., 2020). Third, ion channel activity can influence cellular energy balance due to ATP consumption associated with ion transport to maintain ion balances (Erecińska and Dagani, 1990; Gerkau et al., 2019). This, in turn, can impact processes like ATP production, which is central to cellular metabolism. Thus, ion channel expression and function determine the cell’s bioelectric state and contribute to cell metabolism (Levin, 2021).
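      As a textbook illustration of the first point, how channel permeabilities set the resting membrane potential, the Goldman-Hodgkin-Katz voltage equation can be evaluated directly. This sketch is not from the manuscript: the ion concentrations and relative permeabilities below are generic textbook-style values assumed for illustration, not measurements from T cells, and the function name is hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def ghk_potential_mv(p, c_in, c_out, temp_k=310.15):
    """Goldman-Hodgkin-Katz resting potential (mV) for K+, Na+ and Cl-.

    p, c_in and c_out are dicts keyed by "K", "Na", "Cl"; concentrations in mM.
    Cl- is an anion, so its inside and outside concentrations swap places.
    """
    num = p["K"] * c_out["K"] + p["Na"] * c_out["Na"] + p["Cl"] * c_in["Cl"]
    den = p["K"] * c_in["K"] + p["Na"] * c_in["Na"] + p["Cl"] * c_out["Cl"]
    return 1000.0 * (R * temp_k / F) * math.log(num / den)


# Assumed illustrative values (relative permeabilities; concentrations in mM).
perm = {"K": 1.0, "Na": 0.05, "Cl": 0.45}
inside = {"K": 140.0, "Na": 15.0, "Cl": 10.0}
outside = {"K": 5.0, "Na": 145.0, "Cl": 110.0}

v_rest = ghk_potential_mv(perm, inside, outside)
# Raising relative Na+ permeability (e.g. more open cation channels) depolarizes the cell.
v_depol = ghk_potential_mv({**perm, "Na": 0.2}, inside, outside)
print(round(v_rest, 1), round(v_depol, 1))
```

      With these assumed inputs the potential is roughly -65 mV and moves to roughly -43 mV when the relative Na+ permeability is raised, showing how a change in channel activity alone can shift the cell's electrical state.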

      Because AMPAR function has not been thoroughly studied using a genetic approach in T cells, we do not intend to include the double knockout data in this manuscript until the T cell-specific AMPAR knockout mice have been fully characterized.

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We appreciate the reviewer’s comments. This comment reflects two concerns in data interpretation:

      (1) Are Nrn1-/- naïve T cells fundamentally different from WT cells? Does this fundamental difference contribute to the observed electrical and metabolic phenotype in iTreg or Th0 cells? This is a very good question, and we will perform the experiments the reviewer suggested. Although Nrn1 is expressed only at a basal (low) level in naïve T cells, its deletion may nonetheless alter the naïve T cell phenotype.

      (2) Is the Nrn1-/- phenotype caused by Nrn1 functional deficiency or due to the secondary effect of Nrn1 deletion, such as non-physiological cell membrane structure changes?

      We have done the following experiment to address this concern. We cultured WT T cells in the presence of an anti-Nrn1 antibody and compared the outcome with Nrn1-/- iTreg cells (Author response image 3). WT iTreg cells under antibody blockade exhibited changes similar to those of Nrn1-/- iTreg cells, confirming the physiological relevance of the Nrn1-/- phenotype.

      Author response image 3.

      Nrn1 antibody blockade in WT iTreg cell culture caused phenotypic changes similar to those in Nrn1-/- iTreg cells. Nrn1-/- and WT CD4 cells were differentiated under iTreg conditions in the presence of anti-Nrn1 (aNrn1) antibody or isotype control for 3 days. Cells were then restimulated with anti-CD3 in the presence of aNrn1 or isotype control. a. MP measured 18 hr after anti-CD3 restimulation. b. Live CD4 cell number and proportion of Ki67+ cells among live cells three days after restimulation. c. Proportion of Foxp3+ cells among live cells three days after restimulation.

      References:

      Abdul Kadir, L., M. Stacey, and R. Barrett-Jolley. 2018. Emerging Roles of the Membrane Potential: Action Beyond the Action Potential. Front Physiol 9:1661.

      Blackiston, D.J., K.A. McLaughlin, and M. Levin. 2009. Bioelectric controls of cell proliferation: ion channels, membrane voltage and the cell cycle. Cell Cycle 8:3527-3536.

      Chappert, P., and R.H. Schwartz. 2010. Induction of T cell anergy: integration of environmental cues and infectious tolerance. Current opinion in immunology 22:552-559.

      Chen, W., W. Jin, N. Hardegen, K.J. Lei, L. Li, N. Marinos, G. McGrady, and S.M. Wahl. 2003. Conversion of peripheral CD4+CD25- naive T cells to CD4+CD25+ regulatory T cells by TGF-beta induction of transcription factor Foxp3. The Journal of experimental medicine 198:1875-1886.

      Erecińska, M., and F. Dagani. 1990. Relationships between the neuronal sodium/potassium pump and energy metabolism. Effects of K+, Na+, and adenosine triphosphate in isolated brain synaptosomes. J Gen Physiol 95:591-616.

      Fathman, C.G., and N.B. Lineberry. 2007. Molecular mechanisms of CD4+ T-cell anergy. Nat Rev Immunol 7:599-609.

      Gerkau, N.J., R. Lerchundi, J.S.E. Nelson, M. Lantermann, J. Meyer, J. Hirrlinger, and C.R. Rose. 2019. Relation between activity-induced intracellular sodium transients and ATP dynamics in mouse hippocampal neurons. The Journal of physiology 597:5687-5705.

      Hurrell, B.P., D.G. Helou, E. Howard, J.D. Painter, P. Shafiei-Jahani, A.H. Sharpe, and O. Akbari. 2022. PD-L2 controls peripherally induced regulatory T cells by maintaining metabolic activity and Foxp3 stability. Nature communications 13:5118.

      Jenkins, M.K., and R.H. Schwartz. 1987. Antigen presentation by chemically modified splenocytes induces antigen-specific T cell unresponsiveness in vitro and in vivo. The Journal of experimental medicine 165:302-319.

      John, P., M.C. Pulanco, P.M. Galbo, Jr., Y. Wei, K.C. Ohaegbulam, D. Zheng, and X. Zang. 2022. The immune checkpoint B7x expands tumor-infiltrating Tregs and promotes resistance to anti-CTLA-4 therapy. Nature communications 13:2506.

      Kahlfuss, S., U. Kaufmann, A.R. Concepcion, L. Noyer, D. Raphael, M. Vaeth, J. Yang, P. Pancholi, M. Maus, J. Muller, L. Kozhaya, A. Khodadadi-Jamayran, Z. Sun, P. Shaw, D. Unutmaz, P.B. Stathopulos, C. Feist, S.B. Cameron, S.E. Turvey, and S. Feske. 2020. STIM1-mediated calcium influx controls antifungal immunity and the metabolic function of nonpathogenic Th17 cells. EMBO molecular medicine 12:e11592.

      Levin, M. 2021. Bioelectric signaling: Reprogrammable circuits underlying embryogenesis, regeneration, and cancer. Cell 184:1971-1989.

      Nagy, E., G. Mocsar, V. Sebestyen, J. Volko, F. Papp, K. Toth, S. Damjanovich, G. Panyi, T.A. Waldmann, A. Bodnar, and G. Vamosi. 2018. Membrane Potential Distinctly Modulates Mobility and Signaling of IL-2 and IL-15 Receptors in T Cells. Biophys J 114:2473-2482.

      Quill, H., and R.H. Schwartz. 1987. Stimulation of normal inducer T cell clones with antigen presented by purified Ia molecules in planar lipid membranes: specific induction of a long-lived state of proliferative nonresponsiveness. Journal of immunology (Baltimore, Md. : 1950) 138:3704-3712.

      Schmitt, E.G., and C.B. Williams. 2013. Generation and function of induced regulatory T cells. Frontiers in immunology 4:152.

      Sugiura, A., G. Andrejeva, K. Voss, D.R. Heintzman, X. Xu, M.Z. Madden, X. Ye, K.L. Beier, N.U. Chowdhury, M.M. Wolf, A.C. Young, D.L. Greenwood, A.E. Sewell, S.K. Shahi, S.N. Freedman, A.M. Cameron, P. Foerch, T. Bourne, J.C. Garcia-Canaveras, J. Karijolich, D.C. Newcomb, A.K. Mangalam, J.D. Rabinowitz, and J.C. Rathmell. 2022. MTHFD2 is a metabolic checkpoint controlling effector and regulatory T cell fate and function. Immunity 55:65-81.e69.

      Vaeth, M., and S. Feske. 2018. Ion channelopathies of the immune system. Current opinion in immunology 52:39-50.

      Vanasek, T.L., S.L. Nandiwada, M.K. Jenkins, and D.L. Mueller. 2006. CD25+Foxp3+ regulatory T cells facilitate CD4+ T cell clonal anergy induction during the recovery from lymphopenia. Journal of immunology (Baltimore, Md. : 1950) 176:5880-5889.

      Wang, Y., A. Tao, M. Vaeth, and S. Feske. 2020. Calcium regulation of T cell metabolism. Current opinion in physiology 17:207-223.

      Yu, W., Z. Wang, X. Yu, Y. Zhao, Z. Xie, K. Zhang, Z. Chi, S. Chen, T. Xu, D. Jiang, X. Guo, M. Li, J. Zhang, H. Fang, D. Yang, Y. Guo, X. Yang, X. Zhang, Y. Wu, W. Yang, and D. Wang. 2022. Kir2.1-mediated membrane potential promotes nutrient acquisition and inflammation through regulation of nutrient transporters. Nature communications 13:3544.

      Zheng, S.G., J.D. Gray, K. Ohtsuka, S. Yamagiwa, and D.A. Horwitz. 2002. Generation ex vivo of TGF-beta-producing regulatory T cells from CD4+CD25- precursors. Journal of immunology (Baltimore, Md. : 1950) 169:4183-4189.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We responded to reviewer comments on a previous iteration of this draft, edited the manuscript accordingly, and have no further comments on most of them. However, we performed additional analyses, mainly in response to the weakness highlighted by Reviewer 1, namely “one shortcoming [being] the lack of a conceptual model explaining the results”, and to the eLife assessment stating that “the study falls short of providing a cogent interpretation of key findings, which could be of great interest and utility”. We provide a conceptual explanation that ties together many of our results, which we demonstrate using real data and further explore using simulated data. These analyses are in a new section titled “Increase in PGS effect for increasing percentiles of BMI itself, and its relation to R² differences when stratifying by covariates”, and the Discussion has been updated accordingly.

      Essentially, we demonstrate that the effect of PGSBMI increases as BMI itself increases (using quantile regression; newly created Figure 5). This finding helps explain the correlation between covariate main effects, interaction effects, and maximum R² differences when stratifying on different covariates, and also why no single covariate, or combination of covariates, stood out as being of unusual interest. While this result readily explains why covariates with larger main effects have larger interaction effects, it does not by itself seem to explain the differences in R² across covariate-stratified bins; however, using portions of the real data and simulated data, we show that in this study the two are closely related.
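      One standard way to see how a predictor's effect can grow across the quantiles of the outcome is a linear location-scale model. The derivation below is a generic illustration assumed for exposition, not the model fitted in the manuscript: if the residual spread itself grows with the PGS, the conditional quantile slopes estimated by quantile regression increase with the quantile level τ.

```latex
$$
\begin{aligned}
\text{BMI} &= \beta_0 + \beta_1\,\text{PGS} + (\gamma_0 + \gamma_1\,\text{PGS})\,\varepsilon,
  \qquad \varepsilon \sim F,\ \ \varepsilon \perp \text{PGS},\ \ \gamma_0 + \gamma_1\,\text{PGS} > 0,\\
Q_\tau(\text{BMI} \mid \text{PGS}) &= \beta_0 + \gamma_0 F^{-1}(\tau) + \bigl(\beta_1 + \gamma_1 F^{-1}(\tau)\bigr)\,\text{PGS},\\
\frac{\partial\, Q_\tau(\text{BMI} \mid \text{PGS})}{\partial\, \text{PGS}} &= \beta_1 + \gamma_1 F^{-1}(\tau).
\end{aligned}
$$
```

      Because F^{-1}(τ) is increasing in τ, the PGS slope rises at higher BMI quantiles whenever γ₁ > 0, and the same term makes Var(BMI | PGS) = (γ₀ + γ₁ PGS)² Var(ε) grow with the PGS, i.e., heteroskedasticity.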

      In effect, as the effect of PGSBMI increases, variance in the phenotype also increases. As long as the residual variance does not increase proportionately, this causes R² to increase as well, because R² depends directly on outcome variance. We demonstrate this using simulated data (newly created S Figure 2) and real data (newly created S Figure 3). Thus, the largest R² differences between certain covariate-stratified bins seem to be a direct consequence of those covariates also having the largest PGSBMI*covariate interaction effects. These results tie into our previous response to Reviewer 1: there is not only heteroskedasticity in the relationship between PGSBMI and BMI, but one cause of that heteroskedasticity is an increasing PGSBMI effect as BMI itself increases.

      In the Discussion, we highlight several broad implications of these findings. First, these results may, in part, provide a generalizable explanation for epistasis: the effect of a PGS (or any individual SNP) seems to depend on the phenotype, and because the phenotype depends on many SNPs, the effects of the PGS and of individual SNPs depend on other SNPs. Second, these results may also provide a generalizable explanation for GxE because, as demonstrated in this paper, interaction effects for SNPs (or a PGS) may largely depend on the phenotypic value itself rather than on any specific environment or combination of environments. Finally, related to our previous response to Reviewer 2, modeling SNP effects as dependent on the phenotype itself would almost certainly result in gains in PGS performance (and locus discovery), which should also be larger than the gains from modeling, e.g., GxAge effects alone, as we demonstrated in this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful reading of our manuscript and their constructive comments. We have significantly improved the writing, consolidated figures, and included new experiments. We now center the manuscript on the methods used and have updated the title to reflect this new emphasis. We have also added quantification with statistics. A detailed description of our improvements is provided below.

      New data figures:   

      • Fig 3 – fig supp 2 – new experiment with insulin-triggered endocytosis of InsR

      • Fig 3 – fig supp 3 – new experiments, all using the same protein construct

      • Fig 3 – movie – new experiment with insulin-triggered endocytosis of InsR

      • Fig 4 – added new vehicle-only negative control experiments

      • Fig 5 – fig supp 1 – new negative control experiments with sequential exposures to 750 nm light

      Added figure panels with quantification/statistics for: Fig. 1F; Fig. 1 – fig supp 2B; Fig. 2B, D; Fig. 2 – fig supp 1B, D; Fig. 2 – fig supp 2B; Fig. 2 – fig supp 3B.

      Reviewer #1:

      (1) The paper might benefit from a more streamlined structure and a clearer emphasis on its findings. A possible way to enhance its impact might be to focus more on its methodological aspects. The methodological facets stand out as both innovative and impactful.

      We thank the reviewer for this suggestion and have rewritten the manuscript to center the methods, with our applications to TRPV1 and the InsR serving as examples.

      (2) Line 243: Please provide a reference for Tet3-Bu or clarify its origin in this study. A concise description would be helpful.

      The studies of Jang et al., 2020 and Jana et al., 2023 are cited, and the structure of Tet3-Bu is given in Figure 3A.

      (3) Consider merging Figures 1 and 2 for clarity.  

      Because the cell types and constructs expressed differ for the figures, we did not merge them. However, we moved Figure 1 to the supplement because it repeats previously published data.

      (4) Lines 281 and 293 should refer to Figure 5C, not 5B.  

      This is now corrected.

      (5) Should the paper pivot towards methodology, combining Figures 6 and 7 might be more coherent. 

      The experiments in Figures 6 and 7 are different, making it difficult to merge them. However, Figures 7 and 8 describe the same experimental approach applied to two different membrane proteins. To align with our new focus on the methods and deemphasis of the biological system, we have merged Figures 7 and 8.

      (6) A brief discussion comparing the cell surface labeling techniques and the merits of the presented system would offer valuable context.

      We agree that additional discussion here would be helpful but were also trying to satisfy Reviewer #3’s request to reduce review-like content that disrupts the flow of the primary results. We therefore did not add a discussion of cell-surface labeling techniques.

      Reviewer #2:

      (1) To monitor the phosphatidylinositol-3,4,5-trisphosphates, the pleckstrin homology (PH) domain from Akt was used. This PH domain is not specific for just PI(3,4,5)P3 as stated by the authors. The Akt PH domain also binds PI(3,4)P2. The observed PI3K localization increase will also increase PI(3,4)P2 concentrations so the observed responses may not be solely because of PI(3,4,5)P3…

      …Repeating the PH domain experiments with a PH domain that is specific for just PI(3,4,5)P3, like GRP1 or Btk, would be useful to separate out any contributions from PI(3,4)P2.

      We have repeated key experiments demonstrating optogenetic activation of PI3K with the Grp1-PH domain and included these data in Figure 1-figure supplement 2.

      (2) The data in Figure 4 supplement was confusing to interpret since it is unclear whether a membrane protein with the Tet3 is being expressed at the same time as the ncAA for labeling or if the observed labeling is endogenous. If the observed labeling in Figure 4 supplement D is endogenous, then significant concerns come up regarding the background labeling of the sTCO-sulfo-Cy5 used in the rest of the experiments.

      We have updated the data in this figure using the same protein (InsR-Tet3-Bu-GFP) for every sTCO-conjugated dye tested. The protein is also labeled with GFP, making it clear which cells in the field were transfected and which were not. The new panels showing the bright-field images for each field further aid readers in identifying untransfected cells. We believe the new presentation addresses the reviewer’s concerns about distinguishing sTCO labeling of Tet3-Bu-incorporating protein from labeling of endogenous proteins.

      (3) I recommend reorganizing the article to be more linear. For example, Figure 4 is not fully explained until after Figure 4 supplement and Figure 5. This non-linear organization required a lot of back and forth reading to fully understand the logic of the experiments as well as the conclusions. 

      We have improved the presentation along the lines suggested by the reviewer.

      (4) The InsR data is interesting as a proof of concept however the writing around the InsR looks like an afterthought. The explanation for why InsR is chosen, what is known and unknown about its trafficking is given secondary importance in the writing but not in the figures. This difference weakens the article.  

      We have improved the presentation along the lines suggested by the reviewer.

      (5) Line 244 should read Figure 4A.  

      This is now corrected.

      (6) Line 281 should read Figure 5C.  

      This is now corrected.

      (7) Line 645. Fig 4, says C and E were shown as inverted b&w images when they aren't.  

      This is now corrected.

      (8) Fig 8. Line 702. States that these are TRPV1 positive cells but the figure is about InsR.

      This is now corrected.

      Reviewer #3:

      (1) The Results section is lengthy and disorganized. Consider revising it for better clarity and conciseness. For instance, moving lines 157 and 166-170 to the Discussion or Methods section can streamline the Results section.  

      We have improved the presentation along the lines suggested by the reviewer.

      (2) Provide more specificity in reporting: In lines 139-170, clarify why you chose to use PhyB and this particular technique. Eliminate extraneous details and maintain a more concise narrative.

      We have improved the presentation along the lines suggested by the reviewer.

      (3) Avoid excessive review-like content, and keep the Results section focused on presenting novel findings. Simplify lines 173-185 to provide a straightforward presentation of results rather than extensive references to previous work.

      We have improved the presentation along the lines suggested by the reviewer.

      (4) Reevaluate lines 196-204 to determine if they are best suited for the Results section or if they could be moved to the Discussion or Methods for improved focus.

      We have improved the presentation along the lines suggested by the reviewer.

      (5) Lines 231-238: revise the content to be more concise and directly to the point.

      We have improved the presentation along the lines suggested by the reviewer.

      (6) Limit the number of figures to a maximum of five and restructure them to enhance readability. Consider consolidating panels from Figures 1 (which replicates previously published work), 2, and 3 into a single figure to improve organization and information flow.

      See response to Reviewer #1, Comment #3. Although we did not merge Figures 2 and 3, we have consolidated the text to improve its flow.

      (7) Move Fig 5, which depicts control experiments, to supplementary information to improve the overall flow of the paper. Also, Figure 5 comes in the text before Figure 4 C-F and before Figure 4- supp1, so placing it in supplementary information would fix this issue. 

      We have moved this figure to the supplement as Figure 3 – figure supplement 1.

      (8) Merge Figures 6, 7, and 8 (or at least 7 and 8) to facilitate the comparison of data obtained with different proteins or conditions.  

      We have merged Figures 7 and 8.

      (9) Line 303: when referring to the chemical structure of sTCO-sulfo-Cy5, refer to Figure 4 Supp 1 and not Figure 9. Alternatively, consider moving Fig 9 to supplementary information or placing it earlier in the figure list.  

      We now refer to the earlier supplemental figure when describing the structure of sTCO-sulfo-Cy5.

      (10) Ensure proper referencing of Figure 4E in the text, particularly since it's vital to understanding the selection of mutation sites for the Insulin receptor, as discussed in lines 392-400. 

      We have made this correction.

      (11) Maintain citation consistency by verifying that all references cited in the text, including those in the Introduction, Results, and Discussion sections, are included in the References list at the end of the paper.

      We have reviewed all our citations for consistency.

      The reviewer is also concerned by the lack of any statistical analyses, and of appropriate control experiments:

      (1) The trapping of PI3K at the plasma membrane, shown in Figure 3 supplementary 1, is not very convincing. It is unclear whether PI3K is trapped at the membrane, as claimed by the authors, or whether PI3K slowly accumulates at the membrane independently of the light stimulation. Indeed, the baseline fluorescence isn't flat to start with (especially in F-11 cells), and the change in fluorescence under 650 nm light is very modest, much weaker, in fact, than in control experiments without TRPV1 (Figure 2C). Do the authors observe a similar drift in fluorescence in absence of photostimulation at 650 nm? Such control experiment needs to be performed and discussed. More importantly, authors need to provide quantitative (and not just qualitative) measures of the changes in fluorescence observed in the different conditions, and run adequate statistical analyses to compare the different conditions (for all the figures of the manuscript where this applies).  

      We can see that the language of “trapped at the membrane” is more of an interpretation than a description. We now describe this result as a lack of dissociation of PIF-iSH2 from the membrane in response to 750 nm light. We more clearly explain our interpretation and label it as speculative.

      (2) Consider moving Figure 3 Supplementary 1 from supplementary information to the main document due to its importance. It seems like an important finding to me, and I believe also to the authors, who wrote a whole paragraph on PI3K trapping in the discussion section (lines 361-380).  

      We agree that the results from this figure are important. To better align with the request of all reviewers to shorten the manuscript and reduce the number of figures in the main text, however, we have left the figure in the supplement.

      (3) Figure 3: why is the increase in IP3 levels not reversible as in Figure 2? Is this because IP3 is detected only at the membrane level (TIRF experiment) and not the entire cell? Authors should comment on this aspect. 

      As described in our response to Comment #1, we now better explain our interpretation. Briefly, we speculate that the PIF-iSH2 that encounters TRPV1 in the plasma membrane binds to the ankyrin repeat domain of TRPV1 and therefore does not readily dissociate from the membrane in response to 750 nm light.

      (4) Figure 4E: Verify the functionality of the Insulin receptor mutants, as was done for TRPV1.  

      We have added new experiments to demonstrate that the insulin receptor incorporating Tet3-Bu is functional. Because the insulin receptor is not electrogenic, we could not use electrophysiology to validate its function. Instead, we measured the insulin-dependent endocytosis of the receptor. These data are now presented in Figure 3 – figure supplement 2 and Figure 3 – supplemental movie.

      (5) Figures 6 to 8: The authors quantify the change in plasma membrane expression of TRPV1 and insulin receptors after NGF treatment (or photoactivation), but an important control experiment is missing. They first label cells with sulfo-Cy5, then treat them with NGF (or photoactivate them with 650 nm light), and then label them again with sulfo-Cy5, supposedly to label only the TRPV1 receptors that newly arrived at the membrane. However, we have no evidence that the first sulfo-Cy5 labeling (1 uM, 5 min) was complete. In fact, labeling with sulfo-Cy5 (200 nM) in Figure 4 never reaches saturation, not even after 20 min. The authors need to control for this, by comparing the change in fluorescence with and without NGF treatment. The GFP control is simply not sufficient. Also, include Figure 8 in the text, as it is missing from the results section, and discuss the results in more detail. Indeed, the current data is appealing as it suggests that what was observed with TRPV1 is also true for the Insulin receptor, but without a proper control this could just be an artefact.  

      We have performed several new control experiments to address the reviewer’s concerns. (1) For the NGF-induced increase in TRPV1 at the plasma membrane, we repeated the experiment using a vehicle instead of NGF. These data, added to Figure 4E, demonstrate that the increase in plasma membrane TRPV1 depends on NGF. (2) For the light-activated increase in plasma membrane TRPV1, we repeated the experiment using a second exposure to the deactivating 750 nm light instead of the activating 650 nm light and added the data as Figure 5, figure supplement 1A-E. These new data demonstrate that the increase in plasma membrane TRPV1 occurred only in response to the activating wavelength of light. (3) To address the same concern for the insulin receptor, we repeated the insulin receptor experiments, also using a second exposure to the deactivating wavelength of light. These data are now shown in Figure 5, figure supplement 1F-I and demonstrate that the increase in insulin receptor levels in the plasma membrane required the activating wavelength of light.

      (6) Line 313: "Importantly, sTCO-sulfo-Cy5 did not appear to equilibrate across the cell membrane and did not label untransfected cells (i.e., those without GFP; Figure 4 - figure supplement 1)". I don't see where the absence of labeling of untransfected cells is shown. The authors should show fluorescence changes on the surface of both transfected and untransfected cells and, as discussed above, quantify the data and provide statistical analyses.

      See response to Reviewer #2, Comment #2.

      Minor Comments:

      (1) Define "PM" and "RTK" in the abstract.

      We have made the requested changes.

      (2) Consider presenting the signaling pathways defined in the introduction in a scheme to improve readability.  

      We have added the signaling pathways defined in the introduction to Figure 1A.

      (3) In Figure 1A, include the CAAX lipidation signal in the schematic representation.  

      We had already shown the lipidation itself, but we have added the lipidation signal as a magenta star, with its meaning explained in the figure legend. We hope the reviewer finds this useful.

      (4) Terminology clarification: Given the broad readership of Elife, provide clearer explanations for terms and techniques used, such as the function of PIF (line 144).  

      We define the acronym PIF in the text, but do not further elaborate on the biological function of PIF to align with other reviewers’ requests that we reduce the review-type material in the manuscript.

      (5) Correct "m⁻¹s⁻¹" to "M⁻¹s⁻¹" in line 119.

      This is now corrected.

      (6) Replace "activate" with "activation" in line 122.  

      This is now corrected.

      (7) Indicate 650 nm and 750 nm next to the arrows in Figure 2B for reader clarity.  

      We have added the requested arrow labels.

      (8) Correct Figure 5A to Figure 4A in line 244.  

      This is now corrected.

      (9) Correct Figure 5B to Figure 5C in line 293.  

      This is now corrected.

      (10) In lines 274, 293, 312 and 329, clearly specify which panels of the referenced figures are being discussed to avoid confusion. 

      We have now clearly specified which panels are being referenced.

      (11) Figure 1B: it is unclear how long after 650 nm light switching the image is taken. The red bar indicating 650 nm light makes it look like the image is taken right after light switching, which would suggest that PIF-YFP trafficking to the membrane takes milliseconds in response to 650 nm light. However, the legend says that photoactivation kinetics are in the range of 10 seconds. Please accurately position the red bar in Figure 1B to reflect the time between light switching and imaging, and specify the time between light switching and imaging in the figure legend.  

      We have more accurately shown the timing of image acquisition in what is now Figure 1, figure supplement 1.

      (12) Please add a merged image for all the immune data figures.

      We are uncertain about which figures the reviewer is referring to. We do not have any immunohistochemistry in the manuscript.  

      (13) Line 205: "we found that expression of TRPV1 trapped PIF-iSH2 at the PM upon stimulation with 650 nm light, so that it no longer translocated to the cytoplasm in response to 750 nm light (Figure 3B and Figure 3 - figure supplement 1A)." This is shown in the supplementary figure but not in Figure 3B. Same issue with the following sentence.  

      We have corrected the figure references in the text.

      (14) For Figures 7 and 8, the authors state "We next asked whether click chemistry labeling could be executed in cells in which we also used the PhyB/PIF machinery for activating PI3K." Is this really the main motivation for conducting these experiments?

      Good point. We have improved the writing around this issue.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies differential Orsay virus infection of C. elegans when animals are fed on different bacteria. The evidence for this is, however, incomplete, as experiments to control for feeding rate and bacterial pathogenicity are needed, as well as direct quantification of viral load.

      We appreciate that the editors and reviewers felt that our manuscript addressed an important problem. We appreciate the constructive critiques provided by the reviewers and have worked to address all of the concerns, including a number of additional experiments as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This manuscript explores the importance of food type on virus infection dynamics using a nematode virus as a model system. The authors demonstrate that susceptibility to viral infection can change by several orders of magnitude based on the type of bacterial food that potential hosts consume. They go on to show that, for the bacterial food source that reduces susceptibility, the effect is modulated by quorum sensing molecules that the bacteria produce. 

      Strengths: 

      This manuscript shows convincingly that nematode susceptibility to viral infection changes by several orders of magnitude (i.e. doses must be increased by several orders of magnitude to infect the same fraction of the population) depending on the bacterial food source on which hosts are reared. The authors then focus on the bacteria that reduce host susceptibility to viral infection and demonstrate that certain bacterial quorum-sensing compounds are required to see this effect of reduced susceptibility. Overall, sample sizes are large, methods are generally rigorous, experiments are repeated, and patterns are clear. 

      Weaknesses: 

      Although the molecular correlate of reduced susceptibility is identified (i.e. quorum-sensing compounds), the mechanisms underlying this effect are missing. For example, changes in susceptibility could arise from altered nutrition, host condition, the microbiome, feeding rate, mortality of infected hosts, etc. In addition, the authors focus almost entirely on the reduction in susceptibility, even though I personally find the increased susceptibility generated when reared on Ochrobactrum to be much more exciting.

      I was a bit surprised that there was no data on basic factors that could have led to reductions in susceptibility. In particular, data on feeding rates and mortality rates seem really important. I would expect that feeding rates are reduced in the presence of Pseudomonas. Reduced feeding rates would translate to lower consumed doses, and so even though the same concentration of virus is on a plate, it doesn't mean that the same quantity of virus is consumed. Likewise, if Pseudomonas is causing mortality of virus-infected hosts, it could give the impression of lower infection rates. Perhaps mortality rates are too small in the experimental setup to explain this pattern, but that isn't clear in the current version of the manuscript. Is mortality greatly impacted by knocking out quorum-sensing genes? Also, the authors explored susceptibility to infection, but completely ignored variation in virus shedding. 

      We have added data on feeding rates (Line numbers 141-148 and 176-182, Supplementary Figure 4). After six hours of exposure, no differences in feeding rate were observed. After 24 hours, minor differences emerged between O. vermis MYb71 and each Pseudomonas species; however, feeding rate was inversely correlated with susceptibility to Orsay virus, in that O. vermis MYb71 displayed the lowest feeding rate while P. aeruginosa PA14 displayed the highest feeding rate.

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is temperature-dependent, and because our assays are done at 20°C rather than 25°C, this may account for the reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions, and these two bacteria led to the shortest lifespans of all the tested bacteria. Interestingly, P. lurida MYb11 was observed to be more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although they do not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.

      The reviewer is correct to assert that differences in viral shedding could exist. However, our susceptibility assays using exogenous Orsay virus remove this source of variation, and yet we still observe the same trends, such that O. vermis MYb71 promotes infection while P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 attenuate infection. Further, we measured the amount of virus shed into the lawns in the presence of different bacteria and did not observe differences in shed virus that could account for the differences we observe in incidence proportion (Line numbers 241-254, Fig. 3F). Viral stability could be an issue in both the transmission and susceptibility assays. We therefore tested viral stability in the presence of E. coli, P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 and successfully recovered virus from all lawns, suggesting virus is not rapidly degraded in the presence of any bacterium (Fig. 3D and 3E). However, we noted that the recovery of Orsay virus from lawns of E. coli OP50 and P. lurida MYb11 within 30 minutes was decreased compared to a spike-in control, suggesting recovery from each lawn is not equivalent. This complicates a comparison of viral stability and shedding rates between different bacteria, but our ability to recover substantial amounts of virus in the shedding assay from the three Pseudomonas strains we examined precludes a substantial decrease in shedding rates as an explanation for the robust attenuation of Orsay virus observed in transmission assays.

      I was also curious why the authors did not further explore the mechanism behind the quorum-sensing effect. Not sure whether this is possible, but would it be possible to add spent media to the infection plates, where the spent media was from Pseudomonas that produce the quorum-sensing compound but the plates contain OP50, Pseudomonas, or the quorum-sensing knockout of Pseudomonas? That would reveal whether it is the compound itself vs. something that the compound does.

      We observed that quorum-sensing mutants suppressed the attenuation of Orsay virus infection, and we agree that this could be a consequence of the compounds themselves or, more likely, an effect of the downstream consequences of quorum signaling. We added culture supernatant from each bacterium to lawns of E. coli OP50 to assess the effect on host susceptibility and did not observe any potent effect (Line numbers 311-318, Supplementary Figure 9). This supports an interpretation that it is not the compound itself that is responsible; however, we cannot rule out that the compounds themselves may be responsible if provided at a higher concentration.

      In addition, I was surprised by how much focus there was on the attenuation of infection and how little there was on the enhancement of infection. To me, enhancement seems like the more obvious thing to find a mechanism for -- is the bacteria suppressing immunity, preventing entry to gut cells, etc? 

      We are also intrigued by the enhancement of infection by Ochrobactrum spp.; however, we chose to focus on attenuation given the availability of Pseudomonas aeruginosa genetic mutants for study. We have added data (Line numbers 371-402, Figure 7, and Supplemental Figure 12) that inform our current hypothesis regarding Ochrobactrum-mediated enhancement of Orsay virus infection.

      I was a bit concerned about the "arbitrary units", which were used without any effort to normalize them. David Wang and Hongbing Jiang have developed a method based on tissue culture infectious dose 50 (TCID50) that can be used to measure infectious doses in a somewhat repeatable way. Without some type of normalization, it is hard to imagine how this study could be repeated. The 24-hour time period between exposure and glowing suggests very high doses, but it is still unclear precisely how high. Also, it is clear that multiple batches of virus were used in this study, but it is entirely unclear how variable these batches were. 

      We have clarified that we also measured the (TC)ID50 for every batch of virus used, similar to the methods suggested by the Wang laboratory (Line numbers 107-119 and 499-506). We have added a figure showing the virus batch variability for all batches used in this study (Supp. Fig. 2). We have further clarified that the arbitrary units correspond to the actual microliters of viral filtrate used during infection and provided clear methods to replicate our viral batch production to assist with issues of reproducibility (Line numbers 107-119 and 499-506).

      The authors in several places discuss high variability or low variability in incidence as though it is a feature of the virus or a feature of the host. It isn't. For infection data (or any type of binomial data) results are highly variable in the middle (close to 50% infection) and lowly variable at the ends (close to 0% or 100% infection). This is a result that is derived from a binomial distribution and it should not be taken as evidence that the bacteria or the host affect randomness. If you were to conduct dose-response experiments, on any of your bacterial food source treatments, you would find that variability is lowest at the extremely high and extremely low doses and it is most variable in the middle when you are at doses where about 50% of hosts are infected. 
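      The reviewer's point can be made concrete with the standard binomial variance, assuming (as an idealization that ignores plate-to-plate and batch effects) that each of the n animals scored on a plate is independently infected with probability p. The variance of the observed fraction infected is largest at p = 1/2 and shrinks toward zero at the extremes, so low variability near 0% or 100% infection is expected regardless of the bacterial food source:

$$
\operatorname{Var}(\hat{p}) = \frac{p(1-p)}{n}, \qquad \frac{d}{dp}\,p(1-p) = 1 - 2p = 0 \;\Rightarrow\; p = \tfrac{1}{2}
$$

$$
n = 100:\quad \mathrm{SD}(\hat{p}) = \sqrt{\tfrac{0.5 \times 0.5}{100}} = 0.05 \ \ (p = 0.5) \qquad \text{vs.} \qquad \sqrt{\tfrac{0.05 \times 0.95}{100}} \approx 0.022 \ \ (p = 0.05)
$$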

      Thank you for pointing this out, we have removed all reference to this throughout the manuscript.

      Reviewer #2 (Public Review):

      Summary and Major Findings/Strengths:

      Across diverse hosts, microbiota can influence viral infection and transmission. C. elegans is naturally infected by the Orsay virus, which infects intestinal cells and is transmitted via the fecal-oral route. Previous work has demonstrated that host immune defense pathways, such as antiviral RNAi and the intracellular pathogen response (IPR), can influence host susceptibility to virus infection. However, little is known about how bacteria modulate viral transmission and host susceptibility. 

      In this study, the authors investigate how diverse bacterial species influence Orsay virus transmission and host susceptibility in C. elegans. When C. elegans is grown in the presence of two Ochrobactrum species, the authors find that animals exhibit increased viral transmission, as measured by the increased proportion of newly infected worms (relative to growth on E. coli OP50). The presence of the two Ochrobactrum species also resulted in increased host susceptibility to the virus, which is reflected by the increased fraction of infected animals following exposure to the exogenous Orsay virus. In contrast, the presence of Pseudomonas lurida MYb11, as well as Pseudomonas PA01 or PA14, attenuates viral transmission and host susceptibility relative to E. coli OP50. For growth in the presence of P. aeruginosa PA01 and PA14, the attenuated transmission and susceptibility are suppressed by mutations in regulators of quorum sensing and the gacA two-component system. The authors also identify six virulence genes in P. aeruginosa PA14 that modulate host susceptibility to virus and viral transmission, albeit to a lesser extent. Based on the findings in P. aeruginosa, the authors further demonstrate that deletion of the gacA ortholog in P. lurida results in loss of the attenuation of viral transmission and host susceptibility. 

      Taken together, these findings provide important insights into the species-specific effects that bacteria can have on viral infection in C. elegans. The authors also describe a role for Pseudomonas quorum sensing and virulence genes in influencing viral transmission and host susceptibility. 

      Major weaknesses: 

      The manuscript has several issues that need to be addressed, such as insufficient rigor of the experiments performed and questions about the reproducibility of the data presented in some places. In addition, confounding variables complicate the interpretations that can be made from the authors' findings and weaken some of the conclusions that are stated in the manuscript. 

      (1) The authors sometimes use pals-5p::GFP expression to indicate infection, however, this is not necessarily an accurate measure of the infection rate. Specifically, in Figures 4-6, the authors should include measurements of viral RNA, either by FISH staining or qRT-PCR, to support the claims related to differences in infection rate. 

      Following the reviewer's comment, we have corroborated our pals-5::GFP data using FISH staining (Line numbers 291-292 and 357-359, Figure 4D & 4E, and Figure 6C).

      (2) In several instances, the experimental setup and presentation of data lack sufficient rigor. For example, Fig 1D and Fig 2B only display data from one experimental replicate. The authors should include information from all 3 experimental replicates for more transparency. In Fig 3B, the authors should include a control that demonstrates how RNA1 levels change in the presence of E. coli OP50 for comparison with the results showing replication in the presence of PA14. In order to support the claim that "P. aeruginosa and P. lurida MYb11 do not eliminate Orsay virus infection", the authors should also measure RNA1 fold change in the presence of PA01 and P. lurida in the context of exogenous Orsay virus. Additionally, the authors should standardize the amount of bacteria added to the plate and specify how this was done in the Methods, as differing concentrations of bacteria could be the reason for species-specific effects on infection. 

      All experimental replicates are now included within the supplementary information. 

      We have also measured RNA1 fold change following infection in the presence of P. aeruginosa PA01 and P. lurida MYb11 (Fig 3B and 3C) and found that these bacteria also do not eliminate Orsay virus replication.

      We thank the reviewer for their comment on controlling the amount of bacteria and have clarified our methods section to more clearly explain that we seed our plates with equivalent amounts (based on volume) of overnight bacterial culture before allowing the bacteria to grow on the plates for 48 hours.  

      (3) The authors should be more careful about conclusions that are made from experiments involving PA14, which is a P. aeruginosa strain (isolated from humans), that can rapidly kill C. elegans. To eliminate confounding factors that are introduced by the pathogenicity of PA14, the authors should address how PA14 affects the health of the worms in their assays. For example, the authors should perform bead-feeding assays to demonstrate that feeding rates are unaffected when worms are grown in the presence of PA14. Because Orsay virus infection occurs through feeding, a decrease in C. elegans feeding rates can influence the outcome of viral infection. The authors should also address whether or not the presence of PA14 affects the stability of viral particles because that could be another trivial reason for the attenuation of viral infection that occurs in the presence of PA14. 

      We have added data on feeding rates (Line numbers 141-148 and 176-182, Supplementary Figure 4). After six hours of exposure, no differences in feeding rate were observed. After 24 hours, minor differences emerged between O. vermis MYb71 and each Pseudomonas species; however, feeding rate was inversely correlated with susceptibility to Orsay virus, in that O. vermis MYb71 displayed the lowest feeding rate while P. aeruginosa PA14 displayed the highest feeding rate.

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is temperature-dependent, and because our assays are done at 20°C rather than 25°C, this may account for the reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions, and these two bacteria led to the shortest lifespans of all the tested bacteria. Interestingly, P. lurida MYb11 was observed to be more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although they do not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.

      We tested viral stability in the presence of E. coli OP50 and Pseudomonas spp. and successfully recovered virus from all lawns, suggesting virus is not rapidly degraded in the presence of P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 (Line numbers 241-249, Fig 3D and Fig 3E). However, we noted that the recovery of Orsay virus from lawns of E. coli OP50 and P. lurida MYb11 within 30 minutes was decreased compared to a spike-in control, suggesting recovery from each lawn is not equivalent. This complicates a comparison of viral stability and shedding rates between different bacteria, but our ability to recover substantial amounts of virus in the shedding assay from each Pseudomonas species precludes a substantial decrease in shedding rates as an explanation for the robust attenuation of Orsay virus observed in transmission assays.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, I really liked this manuscript, I do think there are areas for improvement though. 

      Some smaller things: 

      Line 84: "can be observed spreading from a single animal" -- this isn't really great wording because the virus itself can't be observed (at least not very easily) -- even infection is hard to see. 

      The wording in line 84-85 has now been adjusted to read “can spread from a single animal”.

      Fig 1C: which groups are statistically significantly different from each other? 

      Statistics have now been added to Figure 1C. 

      Line 154: not necessary to do for this paper, but this sentence made me curious whether the effect would have been seen with mixtures of bacteria (i.e. what if 50% were OP50 and 50% were Pseudomonas?) 

      These data have now been added in Line numbers 372-378, Figure 7A, and Supp. Fig. 12A and 12B.

      Line 262-264: I don't find this interesting at all for the reasons mentioned earlier about binomial data being the most variable in the middle. 

      These lines have been removed.

      Figure 4 B: The labels for the first two tick marks on the x-axis are switched I suspect. Otherwise, the controls did not behave as expected. 

      Figure 4B has been corrected.

      Line 288, 297 and several other places: "Orsay Virus" should be "Orsay virus". 

      We have corrected these instances.

      Supplemental Figure 2: Labels in the figure legend are B and C instead of A and B. 

      These labels have been adjusted for their placement within Figure 6.

      Line 411: I suspect this was supposed to be 13,200 × g rather than 13.2 × g.

      This error has been corrected.

      Line 416-417: This sentence is very hard to interpret. More details are needed. This is the ID50 in which host strain? Is this averaged over all batches of virus? How variable are the batches? 

      This sentence (line number 114) has been amended to clarify that all ID50 values referred to here were calculated for ZD2611 populations in the presence of E. coli OP50. Further, Supplementary Figure 2 now shows all the ID50 values measured for each batch of virus used in this manuscript resulting in an average ID50 of 3.6.

      Lines 467-469: Why exclude these instead of counting them as zeros in the analysis? How many plates fit this description -- were there lots or only a few over the course of all experiments? 

      We have chosen to exclude these plates because these samples lost spreaders at some point during the course of the assay, potentially skewing the eventual number of new infections counted depending on when the infected spreader animal crawled off the plate. We have detailed the number of plates that fit this description in lines 559-562.

      Line 476: A critical detail that is missing here is what number of worms were counted to score infection. Please say here or in the figure legends. 

      We have added the total number of worms counted and the minimum number counted per plate for each assay in the figure legends.

      Line 546: Why was only a single representative experiment shown? I'm asking for a justification, not necessarily for you to show all the data. 

      We chose to show a single representative experiment for two reasons. First, we noted variability between susceptibility assays even when using the same batch of virus, such that we could not combine experiments into a single plot as we did for transmission assays. Second, while we could normalize to a control within each experiment and expect to see similar relative differences across experiments, we believe this makes it more difficult to interpret the underlying data. For example, an increase in the infection rate of 80% compared to 10% within a population has only a single interpretation, while a relative increase in the infection rate by 8x within a population could have several underlying meanings (e.g. 80% vs 10%, 64% vs 8%, 24% vs 3%). We have now included all experimental replicates in the supplementary material.

      Reviewer #2 (Recommendations For The Authors):

      Minor concerns: 

      (1) Lines 86-87: "utilized a collection of bacteria isolated from the environment with wild C. elegans". The authors should provide more context on the source of these bacterial strains. 

      More references for the sources of these bacteria have been added to Supplementary Table 2.

      (2) The presentation of data in Fig 1 could be improved. The authors should include the text "pals-5p::GFP" on the images shown in Fig 1B. The red dashed line in Fig. 1D should intersect the dose-response curve at y = 0.5. The column heading for Fig 1E states "ID50 +/- SD (a.u.)", but should read "ID50 ratio" and should not have units. It also might be more intuitive to normalize the ID50 value for O. vermis to E. coli OP50. This way, having an ID50 ratio >1 indicates decreased transmission relative to E. coli, and ID50 ratio <1 indicates increased transmission relative to E. coli. To increase the transparency and rigor of 1E, the authors should plot the ratios from all 3 experimental replicates. The authors should also briefly explain why different viral doses were used in Fig 1D and 1F. 

      The text “pals-5p::GFP” has now been added to Figure 1B and throughout the text. The red dashed line in Figure 1D has been corrected. Figure 1E has been converted to a plotted figure as suggested, and the y-axis label is “ID50 Ratio Compared to E. coli OP50”. The ID50 replicates have been plotted in Supplementary Figure 2. We have clarified that the doses used are the same. Briefly, the technical replicates of individual doses from Figure 1D and Supplementary Figure 3A and 3B were pooled and processed for FISH staining to provide each experimental replicate of Figure 1F.

      (3) Line 110: The claim is that Ochrobactrum and P. lurida MYb11 reduce the variability of infection levels. However, another possibility is that there's simply less dynamic range in the assay because the infection levels have been compressed to 100% and 0% under these conditions. 

      This line has been removed.

      (4) There are discrepancies between what is shown in Fig 2C and what is described in the text. Lines 163-164: "P. aeruginosa PA01 and P. lurida MYb11 attenuated average infection to 33% and 62% of the population respectively". In Fig 2C, the mean for PA01 is ~25% whereas the mean for P. lurida appears to be less than 62%. 

      These values have been corrected.

      (5) Line 196: Provide more context for why rde-1 mutants were tested. This is the first time rde-1 is mentioned in the text (i.e. why show results in rde-1 mutants when the results are in Fig 2). 

      More context has been provided for why rde-1 mutants were tested (Line numbers 228-232). Briefly, using the rde-1 mutant, which has defective antiviral immunity and therefore supports higher viral replication levels than the wild type (Félix et al. 2011), allows us to potentiate our infection assay in Figure 3B and 3C such that we maximize our chances of detecting viral replication in the presence of the Pseudomonas species, and especially P. aeruginosa PA14, where fewer animals might be expected to become infected based upon Figure 2B and Supplementary Figure 5.

      (6) Lines 228-229: "Mutations of any of the regulators of the las, rhl, or pqs quorum sensing systems suppressed the attenuation of Orsay virus infection caused by the presence of wild-type P. aeruginosa PA01". Based on this description, PA01 should have a lower fraction of GFP positive relative to the quorum sensing mutants in Fig 4B. It seems that the x-axis labels OP50 and PA01 are swapped.

      The x-axis labels of Figure 4B have been corrected. 

      (7) To improve clarity, for any figures that have data showing the "fraction of individuals GFP positive", the authors should include "pals-5p::GFP" in the y-axis title and legend. 

      The y-axis labels, legends, and text have been corrected throughout.  

      (8) To improve overall clarity and flow, the order in which the data is presented could be reordered. In particular, Fig. 6 could be better positioned instead of being the last figure, as no further characterization is performed on the mutants, and the findings are not conserved in strains that are more relevant to the C. elegans microbiota, such as P. lurida. The overall story could be strengthened if the authors ended the manuscript with more details related to the mechanism by which regulators of quorum sensing modulate the outcome of viral infection. 

      Figure 5 and Figure 6 have now been swapped.

      (9) Fig 5A: Make arrow sizes consistent across diagrams (i.e. the diagram for gacA deletion). 

      This figure (now Figure 6A) has been adjusted to make arrow sizes consistent across diagrams.  

      (10) Lines 280-282: "These data suggest that gacA has a conserved role across distant Pseudomonas species..." Here, the authors can provide more context on how well-conserved gacA is across Pseudomonas species (i.e. phylogenetic analysis of gacA sequences across different Pseudomonas species/strains). Furthermore, the data in Fig 5 does not provide strong enough support for the conclusion that gacA has a conserved role broadly across Pseudomonas species, as the authors only assess the effects of a gacA deletion in two species, P. aeruginosa and P. lurida. 

      To reflect that we only assessed the effects of the gacA deletion in P. aeruginosa and P. lurida MYb11, we have changed lines 361-362 to: “These data suggest that gacA has a conserved role between P. aeruginosa and P. lurida MYb11 in the attenuation of Orsay virus transmission and infection of C. elegans.”

      (11) The manuscript can be strengthened by performing additional experiments to elucidate the mechanism by which Pseudomonas modulates viral infection. Does the attenuation of viral transmission and host susceptibility by P. lurida and P. aeruginosa require C. elegans to be in the presence of live bacteria? For example, the authors could measure viral transmission and susceptibility of C. elegans grown on heat-killed Pseudomonas. Additionally, it would be interesting to determine if modulation of viral infection is dependent on a secreted molecule. To assess this, the authors could perform viral infections in the context of Pseudomonas culture supernatant. 

      We added bacterial culture supernatant from each bacterium to lawns of E. coli OP50 to assess the effect on host susceptibility and did not observe a substantial effect (Line numbers 311-318, Supplementary Figure 9). This supports the interpretation that attenuation is not mediated by a secreted molecule; however, we cannot rule out that attenuating activity would become apparent if the supernatant were provided at a higher concentration.

      We encountered substantial challenges in appropriately controlling live versus heat-killed comparisons, particularly given the specifics of our susceptibility experiments. With regard to the underlying question of mechanism, we believe that the genetic mutants (e.g. rhlR/gacA) are equally informative, and that further comparing how these mutants and the wild type interact with the C. elegans host may yield mechanistic insight.

      (12) The authors should include a discussion on the relative virulence potential of PA01, PA14, and P. lurida and the relationship between bacterial virulence potential and the outcome of viral infection. 

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is dependent upon temperature and as our assays are done at 20°C rather than 25°C this may account for reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions and these two bacteria led to the shortest lifespan compared to the other tested bacteria. Interestingly, P. lurida MYb11 was observed to be more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although it does not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.  

      (13) More information is needed on strains listed in Supplementary Table 2, particularly when there is no reference listed and the strain is "Gift of XXX lab". For example, the Troemel lab previously published about an Ochrobactrum strain in Troemel et al PLOS Biology 2008 PMID: 19071962 - is this the same strain? Please ensure that there is adequate information about each strain with as many published references as possible so that the work can be more easily reproduced. 

      We have added additional information and references to the strain table in Supplementary Table 2. The strain listed as Ochrobactrum sp. has been amended to Ochrobactrum BH3 as it is the strain described in Troemel et al. 2008.

    1. Author Response

      We appreciate the thoughtful comments provided by the editor and reviewers. We were pleased to hear that they appreciated our work's contribution to the field of motor learning as well as our use of state-of-the-art analysis techniques.

      We are currently preparing a comprehensive revision of our manuscript to address several of the recommendations of the reviewers. It is our belief that this revision will not only strengthen our paper but also help clarify several areas that were highlighted by the reviewers.

      To address the concerns regarding potential confounds in our experimental design, we will provide a more detailed justification and rationale for the experimental design and analysis choices made during our study. Some of the reviewers’ comments may stem from misunderstandings concerning certain details of our task, and we will carefully revise these sections to ensure that the design and purpose of the study are unambiguous. We will also improve our characterization of subjects’ learning behavior, which we believe will address some of the reviewers’ comments and enhance the overall rigor of our analyses. Lastly, we will address all concerns related to the statistical quantification of our results.

      We appreciate the opportunity to improve our manuscript for eLife and are eager to provide a revision that satisfies the majority of the reviewers’ recommendations.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      While I acknowledge the authors' effort in conducting Southern blot analysis to address my prior concern regarding the presence of dual copies of torA and tapA, I find their current resolution inadequate. Specifically, the simple deletion of the respective result sections for torA and tapA significantly impacts the overall significance of this study. The repeated unsuccessful attempts to generate correct mutants only offer circumstantial evidence, as technical issues may have been a contributing factor. Therefore, instead of merely removing these sections, it is essential for the authors to present more compelling experimental data demonstrating that torA and tapA are indeed vital for the viability of A. flavus. Such data would enhance the overall significance of this study.

      We agree and appreciate the reviewer's important comments on our manuscript. In this version, we address this issue with additional experiments that provide more compelling evidence for the essential role of torA and tapA in the viability, growth, and development of A. flavus. For torA, we constructed a mutant strain (xylPtorA) in which torA is driven by a xylose-inducible promoter, allowing its expression to be conditionally induced by adding xylose (Lines 204-238, page 10).

      Because we were unable to construct either tapA knockout strains or xylose-promoter replacement strains, we used homologous recombination to replace the native tapA promoter with the strong gpdA promoter, generating a tapA overexpression strain (OE::tapA) (Lines 277-297, page 13). We thank the reviewer for highlighting this important aspect and have revised our manuscript accordingly. We believe these revisions provide a more comprehensive understanding of the roles of torA and tapA in A. flavus.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Lines 421-423 and 465-466: these sentences are grammatically awkward. Please rephrase them.

      Thank you for your feedback on our manuscript. Because these sections were rewritten to incorporate our additional experiments, the sentences in question have been removed to maintain coherence and avoid redundancy.

      Reviewer #2 (Public Review):

      In this study, authors identified TOR, HOG and CWI signaling network genes as modulators of the development, aflatoxin biosynthesis and pathogenicity of A. flavus by gene deletions combined with phenotypic observation. They also analyzed the specific regulatory process and proposed that the TOR signaling pathway interacts with other signaling pathways (MAPK, CWI, calcineurin-CrzA pathway) to regulate the responses to various environmental stresses. Notably, they found that FKBP3 is involved in sclerotia and aflatoxin biosynthesis and rapamycin resistance in A. flavus, especially that the conserved site K19 of FKBP3 plays a key role in regulating aflatoxin biosynthesis. In general, the study involved a heavy workload and the findings are potentially interesting and important for understanding or controlling the aflatoxin biosynthesis. However, the findings have not been deeply explored and the conclusions mostly are based on parallel phenotypic observations.

      Thank you for your constructive comments on our manuscript. In response, we have conducted additional experiments, including the construction of a xylose-inducible promoter mutant strain and an overexpression strain. We have also expanded the Discussion to analyze our findings more comprehensively in the context of the existing literature (Lines 464-469, page 22). Thank you again for your insightful feedback, which has been instrumental in improving the quality of our work.

      Reviewer #2 (Recommendations For The Authors):

      Point 1: “Our findings revealed that both the tor and tapA genes are present in double copies in our strains, which guided our decision to construct single-copy deletion strains using homologous recombination. However, the tor gene in A. flavus exhibited varying copy numbers, as was confirmed by absolute quantification PCR at the genome level (Table S1).” However, Table S1 is hard to understand: “Estimation of copy number of tor gene in A. flavus. tor₀ and sumo₀ stand for the initial copy number, and the data are graphed as the mean ± 95% confidence limit. CN is copy number.” As indicated in the Methods, “Using sumo gene as reference, the tor and tapA gene copy number was calculated by standard curve.” In Table S1, for WT the CN value of the tor gene is 1412537, compared to 1698243 in tor+/-; for the reference gene sumo, it is 794328 compared to 1584893. How can these data support the copy number of tor?

      Thank you for your insightful comments. We understand the confusion with the data presented in Table S1 regarding the copy number estimation of the torA gene in A. flavus. We apologize for not providing a clear explanation for the data in the table. Quantitative real-time PCR (qPCR) is widely used to determine the copy number of a specific gene. It involves amplifying the gene of interest and a reference gene simultaneously using specific primers and probes. By comparing the amplification curves of the gene of interest and the reference gene, we can estimate the relative copy number of the gene.
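      For readers less familiar with the standard-curve approach described above, the calculation can be sketched as follows. This is a minimal illustration, not the pipeline used for Table S1: it assumes a log-linear standard curve (Ct = slope × log10(N0) + intercept) fitted to a dilution series, a reference gene (here sumo) present at one copy per genome, and made-up Ct values; the function names are hypothetical. A slope near -3.32 corresponds to roughly 100% amplification efficiency.

```python
import math


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(N0) + intercept."""
    n = len(log10_copies)
    mean_x = sum(log10_copies) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_copies)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_copies, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def initial_quantity(ct, slope, intercept):
    """Invert the standard curve to estimate the starting template amount N0."""
    return 10 ** ((ct - intercept) / slope)


def relative_copy_number(ct_target, target_curve, ct_ref, ref_curve, ref_copies=1):
    """Target-gene copies per genome, assuming the reference gene has ref_copies copies."""
    n_target = initial_quantity(ct_target, *target_curve)
    n_ref = initial_quantity(ct_ref, *ref_curve)
    return ref_copies * n_target / n_ref


# Illustrative (made-up) dilution series of 10^3 to 10^7 template copies.
dilutions = [3, 4, 5, 6, 7]
curve = fit_standard_curve(dilutions, [38.0 - 3.32 * x for x in dilutions])

# Hypothetical Ct values: the target amplifies from twice as much template as sumo.
ct_tor = 38.0 - 3.32 * math.log10(2e5)
ct_sumo = 38.0 - 3.32 * 5
ratio = relative_copy_number(ct_tor, curve, ct_sumo, curve)
print(round(ratio, 3))  # ~2.0, i.e. two copies per single-copy reference
```

      The key point is that absolute N0 values (such as those in Table S1) are only interpretable as a ratio to the reference gene measured in the same sample, which is why a directly hybridizing method such as Southern blotting gives a cleaner answer.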

      To address your concern and provide more accurate information, we have re-performed the copy number analysis using Southern blotting. Southern blot analysis estimates gene copy number directly by hybridizing genomic DNA with a probe specific to the gene, and therefore gives more reliable results than qPCR-based estimation. We found that the A. flavus genome contains a single copy of the torA gene. Consequently, we conducted additional experiments to elucidate its function; specifically, we generated strains in which torA expression is modulated by a xylose-inducible promoter (Lines 204-238, page 10).

      Point 2: In the response: “For the knockout of the FRB domain, we used the homologous recombination method, but because tor genes are double-copy genes, there are also double copies in the FRB domain. Despite our efforts, we encountered challenges in precisely determining the location of the other copy of the tor gene.” I cannot make sense of these data; why was sequencing not used?

      Thank you for your valuable feedback. We re-examined the copy number and confirmed that torA is a single-copy gene, so we have removed this part of the results to avoid any ambiguity or potential misinterpretation.

      Point 3: In the response: “Due to the large number of genes involved, we did not perform a complementation experiment.” Without complementation data, how can the data be shown to be solid?

      Thank you for your important suggestion. We understand that complementation experiments are commonly used to validate gene deletions. Therefore, to ensure the reliability of our data, we have conducted complementation experiments for specific gene deletions, such as ΔsitA-C and Δppg1-C (Lines 320-322, page 15). Thank you again for your positive comments and valuable suggestions, which have significantly enhanced the quality of our manuscript.

      Point 4: “Acknowledge the confusion”? “We acknowledge the confusion in our presentation and will ensure that accurate genetic nomenclature is used consistently.”

      Thank you for your comments on our manuscript. We recognize the importance of precise and consistent use of genetic nomenclature, as it is critical for the clarity and integrity of our research findings. We have carefully reviewed the sections of our manuscript where genetic terms were used and have made the necessary corrections to ensure that all nomenclature is accurate and used consistently throughout the text.

      Point 5: In the new revised manuscript, Southern blotting was carried out and ultimately found only one copy of each tested gene. Thus, the conclusions throughout the manuscript should be changed. In addition, following Reviewer 1's suggestion of an Illumina sequencing strategy, could the tor and tapA mutants be checked for aneuploidy?

      We would like to express our gratitude for your insightful comments and suggestions. The new Southern blot data show that only one copy of each tested gene exists, and we have revised our conclusions throughout the manuscript accordingly. This has led to a significant reinterpretation of our results and a reassessment of the implications of our study. Based on this result, we designed and constructed strains in which the tor gene is under the control of a xylose-inducible promoter, allowing its conditional expression (Lines 204-238, page 10). Thank you once again for your meticulous review.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study investigates parafoveal processing during natural reading, combining eye-tracking and MEG techniques, building upon the RIFT paradigm previously introduced by Pan et al. (2021). Overall, the manuscript is well-written with a clear structure, and the data analysis and experimental results are presented in a lucid manner.

      The authors have addressed the issues I raised in the previous round of review to my satisfaction. However, I still have two concerns that require the authors' consideration.

      Firstly, the similarity between the RIFT analysis process in this study and traditional ERP analysis could lead readers to equate RIFT with components like N400, potentially influencing their interpretation of the results. Although the author's response has somewhat clarified my queries, I seek confirmation: does RIFT itself signify "visual attention" or the "allocation of attentional resources to the flickering target words" (line 208) in this study? While this may not be pivotal, as it primarily serves as an indicator to evaluate whether contextual congruity can indeed modulate the RIFT response rather than indicating early parafoveal semantic integration, I recommend that the authors explicitly address this point in the manuscript, maybe in the discussion section, to enhance reader comprehension of the article's rationale.

      Secondly, regarding the study's conclusions, there appears to be an overemphasis in stating that "semantic information ... can also be integrated with the sentence context ..." (line 21-22). As raised by Reviewer 2 (Major Point 1) and acknowledged by the authors in the limitations of the revised manuscript (lines 403-412), the RIFT effect observed likely stems from local congruency. Therefore, adjusting the conclusion to "integrated with previous context" may offer a more precise reflection of the findings.

      We appreciate the positive comments from the Reviewer.

      In response to the first concern, we have rephrased the sentence (Lines 207-209 in the revised manuscript) to clarify that RIFT measures visual attention: “Moreover, as RIFT directly measures visual attention, the left-skewed RIFT response curve suggests that more visual attention is allocated towards the flickering target words before fixating on them, aligning with the left-to-right order of reading English.”

      Regarding the second concern, we have addressed the issue by modifying “sentence context” to “previous context” in both the Abstract (Line 18 and Line 22) and the Discussion section (Line 314 and Line 361) of the revised manuscript.

    1. Author response:

      We appreciate the comprehensive reviews and would like to address the critiques and suggestions provided by both reviewers. We will make significant revisions to the manuscript to address these concerns. These include a more cautious interpretation of our results, an expanded discussion of key findings, additional analyses for TRM characterization, and a clearer outline of future validation efforts. We believe these changes will enhance the clarity and robustness of our study, and we hope they meet the reviewers’ expectations.

      Reviewer 1:

      Weaknesses:

      (1) Heterogeneous and small cohort:

      Increasing the cohort size is not feasible due to resource constraints. We acknowledge the challenges posed by the heterogeneous and small cohort, which complicate adjustments for confounding. We will apply multiple testing corrections to transparently assess and accurately report the robustness of our findings in the revision.

      (2) Influence of tissue of origin on RNAseq:

      We agree that RNAseq results can be heavily influenced by the tissue of origin. While immune cell composition differs considerably between normal lung tissue and lymph nodes, we found that in tumor tissues and metastatic lymph nodes these differences diminish and common features dominate. Although we depicted these data in Supplementary Figure 1, we did not provide a quantitative test in the original submission. In the revision, we will perform additional quantitative tests to compare immune cell composition across different tissues of origin. These tests will provide a more precise understanding of the cellular composition and support our argument regarding the similarity of the tumor-sculpted microenvironment. We will include these results and detailed methodologies in the revision.

      (3) Accuracy performance and overfitting:

      We acknowledge the concern regarding the high “accuracy” performance potentially indicating overfitting. We will clarify the evaluation methods used and moderate our claims regarding accuracy in the revision.

      (4) Specificity of the tumor cell program/state analysis to the setting of ICIs:

      The comment suggests that the tumor programs in our study may not be specific to the ICI group but rather prognostic in lung cancer. We acknowledge this possibility as we performed comparisons between responders and non-responders (with different cut-offs) to find common trends and interpreted them in terms of their association with ICI. In the revision, we will test the prognostic association of the tumor programs using public lung cancer data.

      (5) More external validation needed:

      We recognize the importance of external validation for reproducibility. While increasing the cohort size is not feasible, we will propose future directions for validation using larger, independent cohorts and potential experimental validations.

      Reviewer 2:

      Weaknesses:

      (1) Small sample size and heterogeneous populations:

      Increasing the cohort size is not feasible due to resource constraints. We acknowledge the challenges posed by the heterogeneous and small cohort, which complicate adjustments for confounding. We will apply multiple testing corrections to transparently assess and accurately report the robustness of our findings in the revision.

      (2) Limited validation of signatures/ methods in independent cohorts:

      We recognize the importance of external validation for reproducibility. While increasing the cohort size is not feasible, we will propose future directions for validation using larger, independent cohorts and potential experimental validations.

      (3) Lack of functional characterization and discussion on key findings:

      We appreciate the feedback regarding the need for functional characterization and a more thorough discussion of key findings on the roles of specific cell populations and genes. In the revised manuscript, we will expand the discussion section to include in-depth analysis of these findings and their relevance to the study. This includes a detailed interpretation of how these factors contribute to the immune response and potential implications for therapy.

      (4) TRM findings and marker selection:

      We understand the concern that the association we report between TRM involvement and response to IO therapy appears to run counter to previous reports. It is indeed important to note that we employed alternative markers for TRM characterization. Our choice of markers was based on transcriptional references relevant to our study. However, we agree that classical TRM markers such as CD69 and CD103, which were absent from our definition, are critical for accurate TRM identification. To address this, we will include a detailed rationale for our marker selection and acknowledge the limitations of our TRM characterization. We will also include additional analyses using classical TRM markers where possible and incorporate these findings into the revision. This will provide a clearer understanding of our TRM population and its role in the immune response to IO therapy.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study reveals the RelA/Stat3-dependent gene program in the liver influences intestinal homeostasis. The evidence supporting the conclusions is compelling, although some additional experiments will strengthen the study. The work will be of interest to scientists in gastrointestinal research fields.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors showed that activation of RelA and Stat3 in hepatocytes of DSS-treated mice induced CYPs and thereby produced primary bile acids, particularly CDCA, which exacerbated intestinal inflammation.

      Strengths:

      This study reveals the RelA/Stat3-dependent gene program in the liver influences intestinal homeostasis.

      Our reply: We thank the reviewer for the positive feedback and for appreciating the strength of our study.

      Weaknesses:

      Additional evidence will strengthen the conclusion.

      (1) In Fig. 1C, photos show that phosphorylation of RelA and Stat3 was induced in only a few hepatocytes. The authors conclude that activation of both RelA and Stat3 induces inflammatory pathways. Therefore, the authors should show that phosphorylation of RelA and Stat3 is induced in the same hepatocytes during DSS treatment.

      Our reply: The reviewer has raised a pertinent issue regarding Figure 1, as later in our study we suggest that the combined activation of Rela and Stat3 is critical for aggravating the colitogenic phenotype in the murine model.

      To address this issue, we have co-stained the fixed liver tissue of untreated and DSS-treated wild type mice with p-RelA (Ser536) and p-Stat3(Ser727) antibodies. Author response image 1 below shows the single staining for p-Rela (Ser536), pStat3 (Ser727), DAPI (to demarcate the nuclei) and merged image (p-Rela + pStat3).

      Author response image 1.

      Further, the signal intensity of p-RelA (Ser536) and p-Stat3 (Ser727) per nucleus was calculated and plotted as a box plot. The median p-Rela and p-Stat3 signal intensities in DSS-treated samples are higher than those of the control samples, suggesting that the majority of treated hepatocytes contain both p-Rela and p-Stat3 in the nucleus.

      Author response image 2.

      Further, we calculated the number of nuclei in the DSS-treated samples whose signal lies above the 90th percentile of the control samples, as well as the percentage overlap of p-Rela-positive with p-Stat3-positive nuclei and vice versa (Author response table 1).

      Author response table 1.

      Together, our analysis indicates that Rela and Stat3 are indeed activated in the same hepatocytes after DSS treatment, generating the downstream effects that we observe in our study.
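      The thresholding and overlap calculation described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' exact script: it assumes each marker gets its own threshold at the 90th percentile of control nuclei, that p-Rela and p-Stat3 intensities are paired per nucleus, and that "overlap" means the percentage of marker-high nuclei that are also high for the other marker. Function names and the toy intensities are hypothetical.

```python
from statistics import quantiles


def p90(values):
    """90th percentile with linear interpolation (the 'inclusive' method)."""
    return quantiles(values, n=10, method="inclusive")[-1]


def coactivation_summary(ctrl_rela, ctrl_stat3, dss_rela, dss_stat3):
    """Count DSS nuclei above the control 90th percentile per marker, plus overlap.

    dss_rela[i] and dss_stat3[i] must be intensities measured in the same nucleus i.
    """
    t_rela, t_stat3 = p90(ctrl_rela), p90(ctrl_stat3)
    rela_high = {i for i, v in enumerate(dss_rela) if v > t_rela}
    stat3_high = {i for i, v in enumerate(dss_stat3) if v > t_stat3}
    both = rela_high & stat3_high
    return {
        "rela_high": len(rela_high),
        "stat3_high": len(stat3_high),
        # Percentage of p-Rela-high nuclei that are also p-Stat3-high, and vice versa.
        "pct_rela_with_stat3": 100 * len(both) / len(rela_high) if rela_high else 0.0,
        "pct_stat3_with_rela": 100 * len(both) / len(stat3_high) if stat3_high else 0.0,
    }


# Toy intensities (arbitrary units); control nuclei 1..10 give a threshold of 9.1.
ctrl = list(range(1, 11))
summary = coactivation_summary(ctrl, ctrl, [12, 15, 5, 11], [20, 3, 4, 10])
print(summary)
```

      Reporting the overlap in both directions matters because the two percentages can differ substantially when one marker is activated in more nuclei than the other.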

      (2) In Fig. 5, the authors treated mice with CDCA intraperitoneally. In this experiment, the concentration of CDCA in the colon of CDCA-treated mice should be shown.

      Our reply: We have experimentally examined whether CDCA supplemented intraperitoneally, at the dose used in our study, reaches the colon. To quantify colonic CDCA, we performed targeted mass spectrometry, and the data are provided as a bar plot below.

      Author response image 3.

      It is evident from the plot that the CDCA levels are significantly higher in mice supplemented with CDCA as compared to their corresponding control (where only the vehicle was supplemented). The data has been added to the supplementary section S5b and the main text has been modified accordingly.

      Reviewer #2 (Public Review):

      Singh and colleagues employ a methodical approach to reveal the function of the transcription factors Rela and Stat3 in the regulation of the inflammatory response in the intestine.

      Strengths of the manuscript include the focus on the function of these transcription factors in hepatocytes and the discovery of their role in the systemic response to experimental colitis. While the systemic response to induce colitis is appreciated, the cellular and molecular mechanisms that drive such systemic response, especially those involving other organs beyond the intestine are an active area of research. As such, this study contributes to this conceptual advance. Additional strengths are the complementary biochemical and metabolomics approaches to describe the activation of these transcription factors in the liver and their requirement - specifically in hepatocytes - for the production of bile acids in response to colitis.

      Our reply: We express our gratitude to the reviewer for recognizing and appreciating the mechanistic insight provided by our work, and for considering it valuable in advancing conceptual understanding in the relevant field.

      Some weaknesses are noted in the presentation of the data, including a comprehensive representation of findings in all conditions and genotypes tested.

      Our reply: We thank the reviewer for the query and we have suitably modified the figures for a comprehensive representation of the findings, as described below:

      ● In Figure 2C, we have added the control alcian blue stained samples to clarify that there were no qualitative differences in the mucin levels observed in the relaΔhepstat3Δhep as compared to the wild type mice.

      ● We have also modified Figure 2D for a better presentation of the data.

      ● We have included histopathological analysis for the relaΔhepstat3Δhep mice in Figures S3a and S3b, following a format similar to the wild-type data previously provided as Figure S1a and S1b.

      ● For Figure 5C, the corresponding untreated samples with and without CDCA supplementation have been provided in the supplementary section Figure S5e.

      ● For Figure 2E, 3E, and 4C - the RT-qPCR data of the DSS-treated samples is plotted relative to their corresponding control samples, hence we only display two conditions in the bar plot. We have accordingly modified the figure legend for better clarity.

      Reviewer #3 (Public Review):

      Summary:

      The authors try to elucidate the molecular mechanisms underlying the intra-organ crosstalks that perpetuate intestinal permeability and inflammation.

      Strengths:

      This study identifies a hepatocyte-specific rela/stat3 network as a potential therapeutic target for intestinal diseases via the gut-liver axis using both murine models and human samples.

      Our reply: We thank the reviewer for appreciating the therapeutic potential of our work.

      Weaknesses:

      (1) The mechanism by which DSS administration induces the activation of the Rela and Stat3 pathways and the subsequent modification of the bile acid pathway remains unclear. As the authors state, intestinal bacteria are one candidate, and this needs to be clarified. I recommend the authors investigate whether gut sterilization by administration of antibiotics or germ-free conditions affects (1) the activation of the Rela and Stat3 pathways in the liver of DSS-treated WT mice and (2) the reduction of colitis in DSS-treated relaΔhepstat3Δhep mice.

      Our reply: We thank the reviewer for raising the role of the gut microbiota in colitis in our mouse model. In accordance with the reviewer's recommendation, we sterilized the gut by administering antibiotics to evaluate whether intestinal bacteria are required for the activation of the Rela and Stat3 pathways in the liver of DSS-treated WT mice.

      (a) A brief schematic of the experimental design is provided below, and the methods are described in detail in the supplementary methods.

      Author response image 4.

      Extracts of liver tissue from mice treated with DSS for 6 days, with or without prior antibiotic treatment, were probed for p-Stat3 (Ser727) to examine the activation status of the hepatic Stat3 pathway. As the blot below shows, p-Stat3 (Ser727) signals are reduced after antibiotic treatment. As shown in Figure 1D,E, p-Stat3 (Ser727) is a prominent activation signal at day 6 of DSS treatment.

      Author response image 5.

      These studies suggest that Stat3 activation is hampered by antibiotic treatment; since Rela and Stat3 act in coordination, downstream activation is presumably also modulated upon gut sterilization. However, a sterilized gut is not likely to be physiologically relevant, and intestinal bacteria together with bile acid levels would modulate the Rela/Stat3 pathways.

      (b) It is likely that the hepatic deficiency of Rela and Stat3 has modified the gut microbiome in relaΔhepstat3Δhep mice through the altered bile composition. Moreover, the gut microbiota is a key determinant of the outcome of colitis. Future studies will therefore be important to examine the role of the gut microbiome in conferring resistance to colitogenic insults in relaΔhepstat3Δhep mice.

      (2) It has not been shown whether DSS administration causes an increase in primary bile acids, represented by CDCA, in the colon of WT mice following activation of the Rela and Stat3 pathways, as demonstrated in Figure 6.

      Our reply: We kindly refer the reviewer to Figure 4B, which shows an increase in CDCA levels in colonic tissue that mirrors the CDCA levels in liver tissue (Figure 4A), indicating that this increase may be driven by the hepatic Rela and Stat3 pathways.

      (3) The implications of these results for IBD treatment, especially in what ways they may lead to therapeutic intervention, need to be discussed.

      Our reply: We are grateful to the reviewer for bringing this topic for discussion.

      Until now, only immunosuppressive agents and immunomodulators have conventionally been considered as therapeutic measures to manage IBD. However, with increasing research on the role of hepatic bile acid metabolism during experimental colitis, its clinical potential should not be overlooked. Bile acids have been harnessed as a therapeutic target before: bile acid sequestrants have been used to treat hyperlipidemia 46. Remedies such as fecal microbial transplantation, which serve to normalize bile acid ratios in the gut, have emerged over the last decade as potential IBD therapeutics 47, 40. However, altering hepatic bile acid metabolism has remained unexplored for IBD, possibly due to a lack of mechanistic insight. Towards this, our work demonstrates the pro-inflammatory potential of CDCA during colitis following activation of the Rela/Stat3 pathway. Suppressing Rela/Stat3-induced CDCA could benefit IBD patients while preserving basal bile acid levels (through FXR signaling). Thus, our studies identify a hepatocyte-specific rela/stat3 network as a potential therapeutic target for intestinal diseases. Another approach could be to use bile acid sequestrants, as a combinatorial therapy alongside existing treatments, to temporarily decrease primary bile acid levels in the colon until the pro-inflammatory pathways are dampened.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor:

      Fig. 4C should be Fig. 4D and vice versa.

      Our reply: We have swapped Fig. 4C and Fig. 4D and corresponding changes have been incorporated in the main text.

      Reviewer #2 (Recommendations For The Authors):

      Please make note of the following specific comments:

      The immunostainings for phosphorylated p-Rela and STAT3 are unclear. Is there nuclear translocation of these phosphorylated transcription factors? Can the authors enumerate the percentage of cells in which nuclear translocation (presumably in hepatocytes) is detected?

      Our reply: We apologize that the immunostainings for phosphorylated p-Rela and STAT3 were unclear. To make the data clearer, we have quantified the nuclear signal in the stained sections and plotted the results.

      To start with, we co-stained fixed liver tissue from untreated and DSS-treated wild-type mice with p-RelA (Ser536) and p-Stat3 (Ser727) antibodies; a representative image used for the analysis is provided below. DAPI was used to demarcate the nuclear boundaries of the hepatocytes, and the signal intensities for p-RelA (Ser536) and p-Stat3 (Ser727) were quantified using ZenBlue software.

      Author response image 6.

      Below we provide box plots of the calculated nuclear intensities of p-Rela and p-Stat3 in control (untreated) and DSS-treated samples. The median p-Rela and p-Stat3 signal intensities in DSS-treated samples are higher than in control samples, suggesting that the majority of hepatocytes in treated mice show nuclear translocation of p-Rela and p-Stat3.
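      As a rough illustration of this kind of quantification, the sketch below computes a mean intensity for each DAPI-defined nucleus and the share of treated nuclei brighter than the median control nucleus. It is a NumPy sketch under stated assumptions, not the ZenBlue procedure used here: it assumes the nuclei have already been segmented into an integer label image (0 = background), and the function names are hypothetical.

```python
import numpy as np


def nuclear_mean_intensities(labels: np.ndarray, channel: np.ndarray) -> np.ndarray:
    """Mean channel intensity inside each labelled nucleus.

    labels: integer image, 0 = background, 1..N = nucleus IDs (assumed to come
            from a DAPI segmentation done beforehand).
    channel: intensity image of the same shape (e.g. p-Rela or p-Stat3).
    """
    flat_labels = labels.ravel()
    flat_signal = channel.ravel().astype(float)
    sums = np.bincount(flat_labels, weights=flat_signal)  # summed signal per label
    counts = np.bincount(flat_labels)                      # pixel count per label
    ids = np.nonzero(counts[1:])[0] + 1                    # skip background and empty IDs
    return sums[ids] / counts[ids]


def fraction_above_control_median(treated: np.ndarray, control: np.ndarray) -> float:
    """Share of treated nuclei whose mean intensity exceeds the control median."""
    return float(np.mean(treated > np.median(control)))
```

      Comparing medians, as in the box plots, only says that the typical treated nucleus is brighter; the second function makes the "majority of hepatocytes" reading explicit by reporting what fraction of treated nuclei lie above the control median.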

      Author response image 7.

      The figure legends for Figures 2C and D are flipped. Please correct.

      Our reply: Thank you for pointing this out; we apologize for the error and have corrected Figure 2 accordingly.

      For all H&E stainings, the authors should include histological scoring of disease severity.

      Our reply: Thank you for raising this point. Histological scoring to quantify the qualitative microscopy data is given below as a dot plot for untreated and DSS-treated colon samples; the sections were scored using the scale described by Ren Y et al. 2019 (doi: 10.1038/s41598-019-53305-z).

      Author response image 8.

      We have added the dot plot to Supplementary Figure 2d, and the method used for this analysis is described in the supplementary methods section.

      Please include Alcian Blue Staining in non-DSS treated WT and rel/stat3 double cKO mice.

      Our reply: Thank you for pointing this out. We have added Alcian Blue staining of non-DSS-treated WT and rel/stat3 double KO mice to Figure 2C.

      For Figure 3C, can the authors indicate in the figure itself which bile acid is being represented (not only in the Figure legend)?

      Our reply: Thank you for the suggestion. We have indicated the respective bile acid in Figure 3C for clarity.

      As these data are from untargeted metabolomics, were other bile acids detected?

      Our reply: This is a part of a separate study conducted by our collaborator, and will form a part of a new manuscript which will be focussed on human studies.

      Can the authors validate the downregulation of key enzymes shown in Figure 3D, E at the protein level?

      Our reply: We agree with the reviewer that mRNA levels are not definitive determinants of pathway activation but rather an indicator of probable activation, and that protein levels would be more conclusive. However, the metabolomic data in subsequent figures (Figure 4A, B) support our findings in Figure 3D, E, which in our view makes the RT-qPCR data a more robust indicator of an activated hepatic bile acid biosynthesis machinery.

      The figure legends for Figures 4C and D are flipped. Please correct.

      Our reply: Following the suggestion of Reviewer 1, we have swapped Fig. 4C and Fig. 4D and corrected the legend placement accordingly. Thank you for pointing this out.

      Also, please include representative images for the data represented in 4C.

      Our reply: Thank you for the query. The representative confocal microscopy images have already been added as Figure S4.

      Figure 5B should indicate that the data presented is from double cKO mice.

      Our reply: We have indicated in the figure that the colon length data are from double KO animals, to make the visual representation clear for readers. Thank you for raising this.

      Please correct typos: "entrocytic" and "Untread" in Figure Legend 5.

      Our reply: Thank you for pointing out these errors in the legend; we apologize and have corrected the legend of Figure 5.

      Figure S4 includes a dataset (qPCR for Mmp3) that is not described. Neither Figure S4 nor S5 are described in the text.

      Our reply: Thank you for the query. First, Figures S4 and S5 are already referred to in the text; we apologize that this was not properly highlighted.

      Second, the RT-qPCR data for Mmp3 have been removed from the supplementary figures, as they may not be very relevant to the study.

      Overall, the manuscript should be edited to ensure the correct use of English. Please also note that the last name of the first author seems to be missing in the main text.

      Our reply: Thank you for the suggestion. We have re-checked the manuscript for errors and rectified them. The first author has a single name (with no surname), and we would like to address this in the final print of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors need to show if DSS treatment affects the serological or histological changes in the liver of relaΔhepstat3Δhep mice.

      Our reply: To address this, we have analyzed key serological markers of liver damage and examined tissue histology.

      The pathophysiological parameters of the liver of DSS-treated relaΔhepstat3Δhep mice have been added to the revised manuscript as Figures S3a and S3b. The serological parameters remain within the physiological range upon DSS treatment (Author response image 9a), and the histological parameters remain unaltered compared with control tissue (Author response image 9b).

      Taken together, DSS treatment has little effect on the liver of relaΔhepstat3Δhep mice at either the tissue or the functional level.

      Author response image 9.

      (2) It is recommended to use a second model to verify if this phenomenon is applicable to colitic status in general.

      Our reply: We appreciate this suggestion. This is an ongoing study, and we hope to further examine the role of hepatic RelA and Stat3 in the TNBS-induced colitis model and in the T cell transfer model of colitis.

    1. Author response

      The following is the authors’ response to the previous reviews

      eLife assessment 

      This work is an attempt to establish conditions that accurately and efficiently mimic a drought response in Arabidopsis grown on defined agar-solidified media - an admirable goal, as a reliable experimental system is key to conducting successful low water potential experiments and would enable high-throughput genetic screening (and GWAS) to assess the impacts of environmental perturbations on various genetic backgrounds. The authors compare transcriptome patterns of plants subjected to water limitation imposed with different experimental systems. The work is valuable in that it lays out the challenges of such an endeavor and points out shortcomings of previous attempts. There was concern, however, that a purely gene expression-based approach may not provide sufficient physiologically relevant information about plant responses to drought, and therefore, despite improvements from a previous version, the new methodology championed by this work remains inadequate.

      Molecular biologists who study drought stress must make choices about which assays to use in their investigation. Serious resources and effort are put into their endeavor, and choice of assay matters. Our manuscript’s goal was largely practical: to guide molecular biologists employing transcriptomics in their choice of drought stress assay, and thus help ensure their work will discover transcriptional signatures of importance, and not those that may be an artifact from lowering water potential using chemical agents on agar plates.  

      We examine how different approaches to reducing water potential impact the Arabidopsis root and shoot transcriptome. Our manuscript shows that each method of reducing water potential has a different effect on Arabidopsis root transcriptome responses. We acknowledge that drought stress induces a complex physiological response that can vary depending on the method used. However, by comparing across assays, we find instances where a gene is downregulated by low water potential in one assay and upregulated by low water potential in another. We feel it is only natural to question why this could be, and to hypothesize that it may be caused by secondary effects of the way low water potential is imposed. We note that comparative transcriptomics has been a standard approach for decades. We take it to be the reviewer’s opinion that this approach may not be insightful, but that opinion does not factually affect our findings.
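      The cross-assay comparison described above reduces to a simple set operation. A minimal sketch, assuming per-assay log2 fold changes and sets of significant genes are already available (the function and gene names are hypothetical):

```python
from typing import Dict, Set


def discordant_genes(lfc_a: Dict[str, float], lfc_b: Dict[str, float],
                     sig_a: Set[str], sig_b: Set[str]) -> Set[str]:
    """Genes significant in both assays but changing in opposite directions.

    lfc_a, lfc_b: log2 fold change (low water potential vs control) per gene.
    sig_a, sig_b: genes called significant in each assay.
    """
    shared = sig_a & sig_b
    # Opposite signs give a negative product: down in one assay, up in the other.
    return {g for g in shared if lfc_a[g] * lfc_b[g] < 0}
```

      Requiring significance in both assays keeps noisy, near-zero fold changes from being counted as sign flips.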

      Reviewer #2 (Public Review): 

      This manuscript purports to develop a new system to study low water potential (drought) stress responses in agar plates. They make numerous problematic comparisons among transcriptome datasets, particularly to transcriptome data from a vermiculite drying experiment which they inappropriately present as representing an authentic "drought response" to the exclusion of all other data. For some reason, which the reviewer cannot fully understand, the authors seem intent on asserting the superiority of their experimental system to all others. They do not succeed in this and such an effort is ultimately a disservice to the field of drought research as a whole. 

      While they devote considerable effort to comparing transcriptome data among various experimental systems, the potentially more informative experiment at the end of the manuscript, testing growth responses of a number of Arabidopsis accessions, is only done for their "LW" system. The focus of this manuscript on transcriptome data, to the almost complete exclusion of other types of data, is a symptom of a broader over-emphasis on the transcriptome that unfortunately is quite prevalent in plant science now. It is worth reminding that for protein coding genes, which constitute the vast majority of genes, transcriptome data is a proxy measurement. The really important thing is protein amount, and even more so protein activity/function, which we know has an imperfect, at best, correlation with transcript level. We measure transcriptomes because we can, not because it is inherently the most informative thing to do. The authors' quixotic quest to see if the transcriptomes of different stress treatments match is of limited value and further diminished by their misleading presentation of one particular transcriptome data set (from their vermiculite drying experiments) as somehow a special data set that everything else must be evaluated against. This study sheds no new light on how to do relevant drought (low water potential) experiments in the lab.

      Although the reviewer acknowledges that the authors have made some effort to respond to previous comments, the fundamental flaws remain and the present version of this study is little improved from the first submission. 

      One challenge faced by the drought community is establishing consensus regarding the definition of drought itself. According to the criteria followed by the reviewer, any method leading to a reduction in water potential qualifies as drought stress. However, the findings presented in this manuscript demonstrate that transcriptional responses in roots vary considerably across five different methods of reducing water potential. This indicates that beyond responding to a change in water potential itself, root transcriptomes will also respond to the specific way low water potential is introduced. We believe this variability is of interest to the drought research community. 

      Of the five methods we explore, we consider the gene expression changes induced by vermiculite drying to be the most analogous to the expression signatures Arabidopsis would exhibit in response to low water potential in the natural environment. In contrast, we posit that Arabidopsis grown on agar plates, where the root system is exposed to air and light and where water potential is lowered using chemical agents, may exhibit gene expression signatures that plant molecular biologists may not find particularly relevant. However, we acknowledge that this is our opinion, and will make this more explicit in our revised text.

      More broadly, we believe that the reviewer’s observation regarding the ‘over-emphasis’ on transcriptomics that is prevalent within the plant science community justifies, rather than diminishes, the work presented here. If transcriptomics is a commonly employed method, then we anticipate that the outcomes of this study will hold value for a broad audience. Such researchers are likely not only using transcriptomics as a proxy measure for protein abundance, as the reviewer suggests, but also because it is one of the more straightforward genomic techniques biologists can use to identify candidate genes that may be chosen for further scrutiny. 

      Reviewer #3 (Public Review): 

      Comments on revised version: 

      Specific previous criticisms that were addressed are: 

      (1) that gene expression changes were only compared between the highest dose of each stress assay. In the revised version, the authors changed their framework and are now using linear modelling to detect genes that display a dose response to each specific treatment. I agree that this might be a more robust approach to selecting genes that are specific to a certain treatment. 
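      For readers unfamiliar with the dose-response framing, the simplest version is a per-gene least-squares regression of expression on treatment level, flagging genes with a large slope t-statistic. This is a plain OLS sketch assuming log-scale expression values and numeric dose levels; it is not the authors' pipeline, and count-based RNA-seq tools would model the data differently.

```python
import numpy as np


def dose_slopes(expr: np.ndarray, dose: np.ndarray):
    """Per-gene OLS slope of expression on dose, with t-statistics.

    expr: genes x samples matrix of (assumed log-scale) expression values.
    dose: one numeric treatment level per sample.
    """
    x = dose.astype(float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    yc = expr - expr.mean(axis=1, keepdims=True)
    slope = yc @ xc / sxx                         # one slope per gene
    resid = yc - np.outer(slope, xc)
    dof = x.size - 2                              # intercept + slope
    se = np.sqrt(np.sum(resid ** 2, axis=1) / dof / sxx)
    with np.errstate(divide="ignore"):
        t = slope / se
    return slope, t
```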

      (2) that concentrations of PEG, mannitol, NaCl, and the "low water" agar which were chosen are not comparable in regards to their specific osmotic component. I appreciate that the authors measured the osmotic potential of each treatment. It revealed that both PEG and NaCl at their highest concentration had a much more negative osmotic potential compared to the other treatments. The authors claim that using ANCOVA they did not detect any significant differences between the treatments (lines 113, 114). I do believe that ANCOVA is not the appropriate test in this case. ANCOVA has an assumption of linearity, while the dose response between concentration and osmotic potential is non-linear. This is particularly evident for PEG (Steuter AA. Water potential of aqueous polyethylene glycol. Plant Physiol. 1981 Jan;67(1):64-7. doi: 10.1104/pp.67.1.64.). Since the treatments are not the same at the highest level, I think this could affect the validity of the linear-model comparisons. One approach could be to remove the treatment level with the highest concentration and compare the results, or to adjust the treatments to the same osmolarity.
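      One way to check the linearity concern is to ask whether adding a quadratic term significantly improves the fit of osmotic potential against concentration for a given medium (an extra-sum-of-squares F-test). The sketch below assumes paired concentration and osmotic potential measurements; the function names and the example values in the test are hypothetical.

```python
import numpy as np


def rss_poly(x: np.ndarray, y: np.ndarray, degree: int) -> float:
    """Residual sum of squares of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    return float(np.sum(resid ** 2))


def linear_vs_quadratic_F(x, y) -> float:
    """F statistic for adding a quadratic term to a linear fit (1 extra parameter)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rss_lin = rss_poly(x, y, 1)    # 2 parameters
    rss_quad = rss_poly(x, y, 2)   # 3 parameters
    df_quad = x.size - 3
    return (rss_lin - rss_quad) / (rss_quad / df_quad)
```

      A p-value can be obtained with scipy.stats.f.sf(F, 1, n - 3). A large F for a medium such as PEG would support the view that a linear covariate is a poor description of that treatment.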

      (3) that only two biological replicates were collected for RNA sequencing which makes it impossible to know how much variance exists between samples. The authors added a third replicate in the revised version for most treatments. However, some treatments still have only two replicates, which cannot be easily seen from the text or the figure. I would prefer that those differences are pointed out. 

      (4) that the original manuscript did not explore what effect the increase of agar and nutrient concentration in the "low water" agar had on water potentials. The authors conducted additional experiments showing that changes in water potential were exclusively caused by changes in the nutrient concentration (Figure 2-figure supplement 5; lines 222-224). However, the increase in agar strength had also some effect on gene expression. While this is not further discussed in the text, I believe this effect of agar on gene expression could be similar to root responses to soil compaction. 

      (5) That the lower volume of media in the "low water" agar could have an effect on plants. The authors compared these effects in Figure 2-figure supplement 7. They claim that "different volumes of LW agar media do not play a significant part in modulating gene expression". While I can see that they detected 313 overlapping DEGs, there were still 146 and 412 non-overlapping DEGs. The heatmap in subpanel E also shows that there were differences in particular in the up-regulated genes. My conclusion would be that the change in volume does play a role and this should be a consideration in the manuscript. 
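      To put this point in numbers, the overlap counts quoted above can be summarised with a Jaccard index and the fraction of each DEG set that is shared. Which volume corresponds to 146 and which to 412 non-overlapping DEGs is not stated here, so the sketch labels the sets A and B.

```python
def overlap_summary(shared: int, only_a: int, only_b: int) -> dict:
    """Jaccard index and per-set shared fraction for two DEG lists."""
    total_a = shared + only_a
    total_b = shared + only_b
    union = shared + only_a + only_b
    return {
        "jaccard": shared / union,
        "frac_a_shared": shared / total_a,
        "frac_b_shared": shared / total_b,
    }


summary = overlap_summary(313, 146, 412)
```

      With 313 shared and 146 and 412 unique DEGs, the Jaccard index is about 0.36, and roughly 68% of set A but only 43% of set B is shared.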

      We thank the reviewer for their suggestions. We plan to resubmit the manuscript reflecting the requested changes. Specifically, we will: 

      -       Detail more thoroughly the effects of agar volume on the gene expression changes elicited by LW agar treatment.

      -       Investigate, through an analysis of the existing literature, whether the tensile stress introduced by hard agar is similar to soil compaction.

      -       Assess more rigorously the suitability of the ANCOVA model for assessing water potential changes across media types.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) The modeling process is outlined, but an explanation of why Maxent (Phillips & Dudík, 2008) was chosen for SDMs and why the specified predictor variables were used could provide additional context. This clarity would help readers understand the rationale behind the methodology.

      In L.558-571 (Predictor variables subsection), we added the explanation about predictor variables as follows:

      “Predictors encompass a range of environmental variables recognized to impact species distribution (Table 3): land use (Newbold et al., 2015), climate (bioclim variables (Booth et al., 2014)), vegetation (Abe, 2018), lithology (Ott, 2020) and elevational range (Udy et al., 2021). Additionally, categorical variables representing known biogeographic regions, reflecting geological history, were included. We applied Blakiston's Line (the Tsugaru Strait dividing the northern and main islands of Japan, i.e., Hokkaido and Honshu), which reflects a significant historical migration barrier for mammals and birds (Dobson, 1994; Saitoh et al., 2015). Because of their distinct fauna (Wepfer et al., 2016; Yamasaki, 2017), we also specified the oceanic islands (i.e., the Ogasawara and Daito islands), which have never been connected to the Asian continent. Continuous environmental variables were transformed into linear, quadratic and hinge feature classes to capture nonlinear associations between environmental variables and species occurrence (Phillips et al., 2017). The regularisation multiplier was set at 2.5, within the established optimal range of 1.5 to 4 (Elith et al., 2010; Moreno-Amat et al., 2015).”
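      The linear, quadratic and hinge feature classes can be illustrated for a single continuous predictor. This is a simplified sketch assuming min-max rescaling and evenly spaced knots, in the spirit of the maxnet formulation (Phillips et al., 2017) but not a reproduction of its code; the function name is hypothetical and regularisation is not shown.

```python
import numpy as np


def feature_classes(x: np.ndarray, n_knots: int = 5) -> dict:
    """Linear, quadratic and forward-hinge features for one continuous predictor."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    linear = (x - lo) / (hi - lo)                 # rescaled to [0, 1]
    quadratic = linear ** 2
    # Drop the top knot: a hinge starting at the maximum would be all zeros.
    knots = np.linspace(lo, hi, n_knots)[:-1]
    # Forward hinge: 0 below the knot, rising linearly to 1 at the maximum.
    hinges = np.clip((x[:, None] - knots[None, :]) / (hi - knots[None, :]), 0.0, 1.0)
    return {"linear": linear, "quadratic": quadratic, "hinge": hinges}
```

      Each hinge column lets the fitted response change slope at its knot, which is how a model built from these features can follow thresholds and plateaus that a single linear or quadratic term cannot.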

      In L.614-618 (Modelling subsection), we explain why we chose MaxEnt:

      “To model species distributions from presence-only data, several algorithms have been utilised, including generalised additive models, random forest, and neural networks (Norberg et al., 2019; Valavi et al., 2022). In our study, we opted for MaxEnt (Phillips and Dudík, 2008) due to its high estimation accuracy and relatively low computational burden (Valavi et al., 2022).”

      (2) While the study outlines a manual reidentification process by experts for wild individuals, it might be beneficial to elaborate on the criteria or expertise level of these experts. This transparency ensures the reliability of the reidentification process.

      In L.519-523, we added description about experts as follows:

      “These experts have professional backgrounds, serving as a technician at a prefectural research institute (fish), highly experienced field survey conductors (plants and insects, respectively), post-doctoral researchers (amphibians and reptiles, and mammals, respectively), and a museum curator (mollusks) specialising in the focal taxa.”

      (3) The analysis of the effects of data type (Biome+Traditional data or Traditional survey data) on BI is comprehensive. However, a brief discussion on the potential implications of these effects on the study's overall conclusions could add depth to the interpretation.

      We have strengthened our discussion of the causes and consequences of the improved modelling accuracy.

      In L.276-282, we discuss the causes:

      “Therefore, incorporating Biome data could significantly enhance modelling accuracy in urban and suburban landscapes, which are typically underrepresented in traditional survey data. As pseudo-absences are selected based on search effort, our models utilise numerous pseudo-absences from these areas. Consequently, this might lead to better estimation of species absence in such areas, not just presence, resulting in an overall increase in model accuracy across a wider range of species.”
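      A minimal sketch of effort-based pseudo-absence selection, assuming search effort has been summarised per grid cell (for example, the total number of records of any species) and that cells where the focal species was recorded are excluded. The weighting scheme and function name are assumptions for illustration, not the exact procedure described in the Methods.

```python
import numpy as np


def effort_weighted_pseudo_absences(effort, presence_cells, n, seed=0):
    """Draw pseudo-absence cell indices with probability proportional to search effort.

    effort: 1-D array of search effort per grid cell (hypothetical summary).
    presence_cells: indices of cells where the focal species was recorded.
    n: number of pseudo-absences to draw (without replacement).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(effort, dtype=float).copy()
    w[np.asarray(presence_cells, dtype=int)] = 0.0   # never sample known presences
    if np.count_nonzero(w) < n:
        raise ValueError("not enough surveyed cells to draw from")
    return rng.choice(w.size, size=n, replace=False, p=w / w.sum())
```

      Because well-surveyed cells are drawn more often, areas newly covered by community records (such as urban and suburban cells) contribute more pseudo-absences, which is the mechanism the passage above points to.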

      In L.370-387, we discuss how improved modelling accuracy may help build a nature-positive society as follows:

      “By blending data from traditional surveys and communities, we improved the accuracy of species distribution estimates. This enhanced estimation lays the groundwork for more precise subsequent analyses. For instance, estimated distributions will be useful in selecting new protected areas or areas with OECMs (Other Effective area-based Conservation Measures: allowing a wider range of land use as long as biodiversity and ecosystem services are sustained/improved). Using estimated distributions of each species, hotspots of species or evolutionarily diverse taxa can be inferred. Such sites will be good candidates for protected areas (Jones et al., 2016) or OECMs (Shiono et al., 2021). Further, estimated distributions can be used as input for spatial conservation prioritisation tools (e.g. Marxan (Ball et al., 2009)).

      In our experience, stakeholders, including corporate social responsibility managers and conservation practitioners, often seek the list of species potentially inhabiting their locations. Due to the uncertainty of SDMs and their thresholding into presence/absence, on-site surveys remain essential for assessing biodiversity status. SDMs can make such surveys cost-effective by screening important locations for on-site assessment (e.g., Locate phase in TNFD framework) and narrowing down the target species for surveying. Improved estimation through SDMs can mitigate risks associated with their use in society and enable more informed decision-making for conservation efforts.”
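      The hotspot idea can be sketched by stacking thresholded distribution estimates into a richness map and flagging the richest cells. This assumes one suitability map per species and a per-species presence threshold chosen elsewhere; the function name is hypothetical, evolutionary diversity is not considered, and it is not a substitute for prioritisation tools such as Marxan.

```python
import numpy as np


def richness_hotspots(suitability, thresholds, top_fraction=0.05):
    """Stack thresholded SDM outputs and flag the richest cells.

    suitability: species x cells array of predicted suitability.
    thresholds: one presence threshold per species.
    top_fraction: share of cells (by richness) to flag as hotspots.
    """
    s = np.asarray(suitability, dtype=float)
    present = s >= np.asarray(thresholds, dtype=float)[:, None]  # presence/absence
    richness = present.sum(axis=0)                               # species per cell
    cutoff = np.quantile(richness, 1 - top_fraction)
    return richness, richness >= cutoff
```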

      Following the editorial policy, we have reorganised our supplementary materials as follows:

      -        Formerly Supplementary File 1 - Remains unchanged.

      -        Formerly Supplementary File 2 - Transferred into the main text, in the subsection "Filtering suspicious occurrence record in Biome data" in the Methods section, and Table 2. Citations remain as Supplementary File 2.

      -        Formerly Supplementary File 3 - Remains unchanged.

      -        Formerly Supplementary File 4 - Transferred into "Figure 3—figure supplement 1".

      -        Formerly Supplementary File 5 - Transferred into Figure 4.

      -        Formerly Supplementary File 6 - Transferred into the main text, in the subsection "Predictor variables" in the Methods section and Table 3.

      -        Formerly Supplementary File 7 - Transferred into the main text, in the subsection "Pseudo-absence reflecting search effort" in the Methods section and Figure 5.

      -        Formerly Supplementary File 8 - Transferred into the main text, in the subsection "Model evaluation" in the Methods section and Figure 6.

      -        Formerly Supplementary File 9 - Renamed as Supplementary File 4.

    1. Author response:

      Reviewer #1 (Public Review):

      How does the brain respond to the input of different complexity, and does this ability to respond change with age?

      The study by Lalwani et al. tried to address this question by pulling together a number of neuroscientific methodologies (fMRI, MRS, drug challenge, perceptual psychophysics). A major strength of the paper is that it is backed up by robust sample sizes and careful choices in data analysis, translating into a more rigorous understanding of the sensory input as well as the neural metric. The authors apply a novel analysis method, developed on human resting-state MRI data, to task-based data in the visual cortex, specifically investigating the variability of the neural response to stimuli of different levels of visual complexity. A subset of participants took part in a placebo-controlled drug challenge with functional neuroimaging. This experiment showed that increases in GABA have differential effects on participants with different baseline levels of GABA in the visual cortex, possibly modulating perceptual performance in those with lower baseline GABA. A caveat is that no single cohort took part in all study elements, i.e., visual discrimination with drug challenge and neuroimaging. Hence the causal relationship is limited to the neural variability measure and does not extend to visual performance. Nevertheless, the consistent use of visual stimuli permits an exceptionally high level of comparability across modalities (the computational, behavioural, and fMRI analyses draw on the same set of images). The conclusions that can be drawn from such a coherent data set are strong.

      The community will benefit from the technical advances in the study, especially the calculation of BOLD variability, when described appropriately, encouraging further linkage between complementary measures of brain activity, neurochemistry, and signal processing.

      Thank you for your review. We agree that a future study with a single cohort would be an excellent follow-up.

      Reviewer #2 (Public Review):

      Lalwani et al. measured BOLD variability during the viewing of houses and faces in groups of young and old healthy adults and measured ventrovisual cortex GABA+ at rest using MR spectroscopy. The influence of the GABA-A agonist lorazepam on BOLD variability during task performance was also assessed, and baseline GABA+ levels were considered as a mediating variable. The relationship of local GABA to changes in variability in BOLD signal, and how both properties change with age, are important and interesting questions. The authors feature the following results: 1) younger adults exhibit greater task-dependent changes in BOLD variability and higher resting visual cortical GABA+ content than older adults, 2) greater BOLD variability scales with GABA+ levels across the combined age groups, 3) administration of a GABA-A agonist increased condition differences in BOLD variability in individuals with lower baseline GABA+ levels but decreased condition differences in BOLD variability in individuals with higher baseline GABA+ levels, and 4) resting GABA+ levels correlated with a measure of visual sensory ability derived from a set of discrimination tasks that incorporated a variety of stimulus categories.

      Strengths of the study design include the pharmacological manipulation for gauging a possible causal relationship between GABA activity and task-related adjustments in BOLD variability. The consideration of baseline GABA+ levels for interpreting this relationship is particularly valuable. The assessment of feature-richness across multiple visual stimulus categories provided support for the use of a single visual sensory factor score to examine individual differences in behavioral performance relative to age, GABA, and BOLD measurements.

      Weaknesses of the study include the absence of an interpretation of the physiological mechanisms that contribute to variability in BOLD signal, particularly for the chosen contrast that compared viewing houses with viewing faces.

      Whether any of the observed effects can be explained by patterns in mean BOLD signal, independent of variability would be useful to know.

      One of the first pre-processing steps in computing SDBOLD involves subtracting the block mean from the fMRI signal for each task condition. Therefore, patterns observed in BOLD signal variability are not driven by mean-BOLD differences. Moreover, as noted above, to further confirm this, we performed an additional mean-BOLD-based analysis (see Supplementary Materials, p. 3). The results suggest that ∆⃗ MEANBOLD is actually larger in older adults than in younger adults (∆⃗ SDBOLD exhibited the opposite pattern), but more importantly ∆⃗ MEANBOLD is not correlated with GABA or with visual performance. This is also consistent with prior research (Garrett et al. 2011, 2013, 2015, 2020) that found MEANBOLD to be relatively insensitive to behavioral performance.
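      The block-mean subtraction described above is what makes SDBOLD insensitive to mean-signal differences between blocks. A minimal sketch for one voxel, assuming each condition's blocks are given as separate arrays and using the sample SD (ddof=1); the function names and the sign convention (houses minus faces) are assumptions, and the published pipeline may include further normalisation steps.

```python
import numpy as np


def sd_bold(blocks) -> float:
    """Block-mean-centred BOLD variability for one condition in one voxel.

    blocks: list of 1-D arrays, the voxel's time series within each block.
    """
    centred = [np.asarray(b, dtype=float) - np.mean(b) for b in blocks]
    return float(np.std(np.concatenate(centred), ddof=1))


def delta_sd_bold(blocks_a, blocks_b) -> float:
    """Condition difference in variability (e.g. houses minus faces)."""
    return sd_bold(blocks_a) - sd_bold(blocks_b)
```

      Shifting any block by a constant leaves the result unchanged, so a condition that merely raises the mean signal does not raise SDBOLD.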

      The positive correlation between resting GABA+ levels and the task-condition effect on BOLD variability reaches significance at the total group level, when the young and old groups are combined, but not separately within each group. This correlation may be explained by age-related differences since younger adults had higher values than older adults for both types of measurements. This is not to suggest that the relationship is not meaningful or interesting, but that it may be conceptualized differently than presented.

      Thank you for this important point. The relationship between GABA and ∆⃗ SDBOLD shown in Figure 3 is also significant within each age-group separately (Lines 386-388). The model used both age-group and GABA as predictors of ∆⃗ SDBOLD and found that both had a significant effect, while the Age-group x GABA interaction was not significant. The effect of age on ∆⃗ SDBOLD therefore does not completely explain the observed relationship between GABA and ∆⃗ SDBOLD because this latter effect is significant in both age-groups individually and in the whole sample even when variance explained by age is accounted for. The revision clarifies this important point (Lines 488-492). Thanks for raising it.

      Two separate dosages of lorazepam were used across individuals, but the details of why and how this was done are not provided, and the possible effects of the dose are not considered.

      Good point. We used two dosages to maximize our chances of finding a dosage that had a robust effect. The specific dosage was randomly assigned across participants, and the dosage did not differ across age-groups or baseline GABA levels. We also controlled for drug dosage when examining the role of the drug-related shift in ∆⃗ SDBOLD. We have clarified these points in the revision and highlighted the analysis that found no effect of dosage on the drug-related shift in ∆⃗ SDBOLD (Lines 407-418).

      The observation of greater BOLD variability during the viewing of houses than faces may be specific to these two behavioral conditions, and lingering questions about whether these effects generalize to other types of visual stimuli, or other non-visual behaviors, in old and young adults, limit the generalizability of the immediate findings.

      We agree that examining the factors that influence BOLD variability is an important topic for future research. In particular, although it is increasingly well known that variability modulation itself can occur in a host of different tasks and research contexts across the lifespan (see Garrett et al., 2013; Waschke et al., 2021), to address the question of whether variability modulation occurs directly in response to stimulus complexity in general, it will be important for future work to examine a range of stimulus categories beyond faces and houses. Doing so is indeed an active area of research in Dr. Garrett’s group, where visual stimuli from many different categories are examined (e.g., for a recent approach, see Waschke et al., 2023, bioRxiv). Regardless, only face and house stimuli were available in the current dataset. We therefore exploited the finding that BOLD variability tends to be larger for house stimuli than for face stimuli (in line with the HMAX model output) to demonstrate that the degree to which a given individual modulates BOLD variability in response to stimulus category is related to their age, to GABA levels, and to behavioral performance.

      The observed age-related differences in patterns of BOLD activity and ventrovisual cortex GABA+ levels along with the investigation of GABA-agonist effects in the context of baseline GABA+ levels are particularly valuable to the field, and merit follow-up. Assessing background neurochemical levels is generally important for understanding individualized drug effects. Therefore, the data are particularly useful in the fields of aging, neuroimaging, and vision research.

      Thank you, we agree!

      Reviewer #3 (Public Review):

      The role of neural variability in various cognitive functions is one of the central debates in systems and computational neuroscience. In this study, the authors used a large-scale cohort dataset to investigate the relationship between neural variability measured by fMRI and several factors, including stimulus complexity, GABA levels, aging, and visual performance. Such investigations are valuable because neural variability, despite its importance, has so far been studied mostly in animal neurophysiology; there is little evidence in humans. Also, the conclusions are built on a large-scale cohort dataset that includes multi-modal data. Such a dataset per se is a big advantage. Pharmacological manipulations and MRS acquisitions are rare in this line of research. Overall, I think this study is well designed, and the manuscript reads well. I have listed my comments below and hope my suggestions can further improve the paper.

      Strength:

      1) The study design is astonishingly rich. The authors used task-based fMRI, MRS, population contrast (aging vs. control), and psychophysical testing. I appreciate the motivation and effort behind collecting such a rich dataset.

      2) The MRS part is good. I am not an expert in MRS, so I cannot comment on MRS data acquisition and analyses. But I think linking neural variability to GABA in humans is in general a good idea. There has been a long-standing interest in the causes of neural variability, and inhibition within local neural circuits has been hypothesized as one of the key factors.

      3) The pharmacological manipulation is particularly interesting, as it provides at least some evidence for a causal effect of GABA on deltaSDBOLD. I think this is quite novel.

      Weakness:

      1) I am concerned about the definition of neural variability. In electrophysiological studies, neural variability can be defined as Poisson-like spike count variability. In the fMRI world, however, there is no consensus on what neural variability is. There are at least three definitions. One is the variability (e.g., std) of the voxel response time series as used here and in the resting fMRI world. The second is to regress out the stimulus-evoked activation and only calculate the std of residuals (e.g., background variability). The third is to calculate the trial-by-trial variability of beta estimates from general linear modeling. The relations between these three types of variability and other factors currently remain unclear. It also remains unclear the links between neuronal variability and voxel variability. I don't think the computational principles discovered in neuronal variability also apply to voxel responses. I hope the authors can acknowledge their differences and discuss their differences.

      These are very important points, thank you for raising them. Although we agree that the majority of the single cell electrophysiology world indeed seems to prefer Poisson-like spiking variability as an easy and tractable estimate, it is certainly not the only variability approach in that field (e.g., entropy; see our most recent work in humans where spiking entropy outperforms simple spike counts to predict memory performance; Waschke et al., 2023, bioRxiv). In LFP, EEG/MEG and fMRI, there is indeed no singular consensus on what variability “is”, and in our opinion, that is a good thing. We have reported at length in past work about entire families of measures of signal variability, from simple variance, to power, to entropy, and beyond (see Table 1 in Waschke et al., 2021, Neuron). In principle, these measures are quite complementary, obviating the need to establish any single-measure consensus per se. Rather than viewing the three measures of neural variability that the reviewer mentioned as competing definitions, we prefer to view them as different sources of variance. For example, from each of the three sources of variance the reviewer suggests, any number of variability measures could be computed.

      The current study focuses on using the standard deviation of concatenated blocked time series separately for face and house viewing conditions (this is the same estimation approach used in our very earliest studies on signal variability; Garrett et al., 2010, JNeurosci). In those early studies, and nearly every one thereafter (see Waschke et al., 2021, Neuron), there is no ostensible link between SDBOLD (as we normally compute it) and average BOLD from either multivariate or GLM models; as such, in past work we do not find any clear difference in SDBOLD results whether or not average “evoked” responses are removed. This is perhaps also why removing ERPs from EEG time series rarely influences estimates of variability in our work (e.g., Kloosterman et al., 2020, eLife).

      The third definition the reviewer notes refers to variability of beta estimates over trials. Our most recent work has done exactly this (e.g., Skowron et al., 2023, bioRxiv), calculating the SD even over single time point-wise beta estimates so that we may better control the extraction of time points prior to variability estimation. Although direct comparisons have not yet been published by us, variability over single TR beta estimates and variability over the time series without beta estimation are very highly correlated in our work (in the .80 range; e.g., Kloosterman et al., in prep).
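      To make the three sources of variance concrete, the toy sketch below computes each one for a single simulated voxel. It illustrates the definitions under simplifying assumptions and is not the authors' analysis. Trials are non-overlapping segments that share one hypothetical response shape. Definition 2 regresses out that shape with a single ordinary-least-squares fit, and definition 3 fits one amplitude per trial.

```python
import statistics


def raw_sd(y):
    # Definition 1: SD of the voxel time series itself.
    return statistics.pstdev(y)


def residual_sd(y, x):
    # Definition 2: regress out a stimulus-evoked regressor x (OLS with an
    # intercept) and take the SD of the residual "background" signal.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return statistics.pstdev([yi - (intercept + slope * xi) for xi, yi in zip(x, y)])


def trial_betas(y, shape):
    # Definition 3 (first step): one least-squares amplitude per trial,
    # fitting the response shape to that trial's segment of y.
    n = len(shape)
    denom = sum(h * h for h in shape)
    return [
        sum(h * v for h, v in zip(shape, y[k:k + n])) / denom
        for k in range(0, len(y), n)
    ]


shape = [0.0, 1.0, 2.0, 1.0]   # hypothetical evoked response per trial
amplitudes = [1.0, 2.0, 3.0]   # trial-to-trial fluctuation in amplitude
y = [a * h for a in amplitudes for h in shape]
x = shape * len(amplitudes)    # evoked regressor with a fixed amplitude

betas = trial_betas(y, shape)
print(raw_sd(y), residual_sd(y, x), statistics.stdev(betas))
```

      The raw SD also contains the shared evoked response. The residual SD and the SD of the trial betas instead isolate the trial-to-trial amplitude fluctuation, each in its own way.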

      Re: the reviewer’s point that “It also remains unclear the links between neuronal variability and voxel variability. I don’t think the computational principles discovered in neuronal variability also apply to voxel responses. I hope the authors can acknowledge their differences and discuss their differences.” If we understand correctly, the reviewer may be asking about within-person links between single-cell neuronal variability (to allow Poisson-like spiking variability) and voxel variability in fMRI? To our knowledge, no such study has been conducted to date (such data barely exist). Or rather, perhaps the reviewer is noting a more general point regarding the “computational principles” of variability in these different domains? If that is true, then a few points are worth noting. First, there is absolutely no expectation of Poisson distributions in continuous brain imaging-based time series (LFP, E/MEG, fMRI). To our knowledge, such distributions (which have equivalent means and variances, allowing, e.g., Fano factors to be estimated) are mathematically possible in spiking because of the binary nature of spikes; when mean rates rise, so too do variances, given that activity pushes away from the floor (of no activity). In continuous time signals, there is no effective “zero”, so a mathematical floor does not exist outright. This is likely why means and variances are not well coupled in continuous time signals (see Garrett et al., 2013, NBR; Waschke et al., 2021, Neuron); anything can happen. Regardless, convergence is beginning to be revealed between the effects noted from spiking and continuous time estimates of variability. For example, we show that spiking variability can show a similar, behaviourally relevant coupling to the complexity of visual input (Waschke et al., 2023, bioRxiv) as seen in the current study and in past work (e.g., Garrett et al., 2020, NeuroImage). Whether such convergence reflects common computational principles of variability remains to be seen in future work, despite known associations between single cell recordings and BOLD overall (e.g., Logothetis and colleagues, 2001, 2002, 2004, 2008).
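      The mean-variance argument above can be checked with a short simulation. The sketch assumes Poisson spike counts, drawn with Knuth's multiplication method because Python's `random` module has no Poisson sampler. It contrasts these counts with a Gaussian continuous signal of fixed variance. The helper names are hypothetical.

```python
import math
import random
import statistics


def poisson_sample(rate, rng):
    # Knuth's multiplication method; adequate for the small rates used here.
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fano_factor(values):
    # Fano factor = variance / mean across trials.
    return statistics.pvariance(values) / statistics.fmean(values)


rng = random.Random(0)
n_trials = 20000

# Spike counts: variance tracks the mean, so the Fano factor stays near 1
# as the firing rate rises.
spike_fano = {
    rate: fano_factor([poisson_sample(rate, rng) for _ in range(n_trials)])
    for rate in (2.0, 5.0, 10.0)
}

# Continuous signal with a fixed SD of 1: shifting its (arbitrary) baseline
# changes the "Fano factor" without changing the variability at all.
signal_fano = {
    offset: fano_factor([rng.gauss(offset, 1.0) for _ in range(n_trials)])
    for offset in (10.0, 100.0)
}

print(spike_fano)
print(signal_fano)
```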

      Given the intricacies of these arguments, we don’t currently include this discussion in the revised text. However, we would be happy to include aspects of this content in the main paper if the reviewer sees fit.

      2) If I understand it correctly, the positive relationship between stimulus complexity and voxel variability has been found in the authors' previous work. Thus, the claims in the abstract (lines 14-15) and in Section 1 of the Results are exaggerated. The results simply replicate the findings in the previous work. This should be clearly stated.

      Good point. Since this finding was a replication and an extension, we reported these results mostly in the supplementary materials. The stimulus set used for the current study is different from that in Garrett et al. (2020), and therefore a replication is important. Moreover, we have extended these findings across young and older adults (previous work was based on older adults alone). We have modified the text to clarify which findings are replications and which are extensions or novel in the current study (Lines 14, 345, and 467). Thanks for the suggestion.

      3) It is difficult for me to comprehend the U-shaped account of baseline GABA and the shift in deltaSDBOLD. If deltaSDBOLD per se is good, as evidenced by the positive relationship between brain score and visual sensitivity shown in Fig. 5b and the discussion in lines 432-440, why should the brain decrease deltaSDBOLD? Or did I miss something? I understand that "average is good, outliers are bad", but a more detailed theory is needed to account for such effects.

      When GABA levels are increased beyond optimal levels, neuronal firing rates are reduced, effectively dampening neural activity and limiting dynamic range; in the present study, this resulted in reduced ∆⃗ SDBOLD. Thus, the observed drug-related decrease in ∆⃗ SDBOLD was most pronounced in participants with already high levels of GABA. We have now added an explanation for the expected inverted-U (Lines 523-546). The following figure illustrates this with a hypothetical curve and shows how different parts of Fig. 4 might map onto different points on such a curve.

      Author response image 1.

      Line 523-546 – “We found in humans that the drug-related shift in ∆⃗ SDBOLD could be either positive or negative, while being negatively related to baseline GABA. Thus, boosting GABA activity with drug during visual processing in participants with lower baseline GABA levels and low levels of ∆⃗ SDBOLD resulted in an increase in ∆⃗ SDBOLD (i.e., a positive change in ∆⃗ SDBOLD on drug compared to off drug). However, in participants with higher baseline GABA levels and higher ∆⃗ SDBOLD, when GABA was increased presumably beyond optimal levels, participants experienced no change or even a decrease in ∆⃗ SDBOLD on drug. These findings thus provide the first evidence in humans for an inverted-U account of how GABA may link to variability modulation.

      Boosting low GABA levels in older adults helps increase ∆⃗ SDBOLD, but why does increasing GABA levels lead to reduced ∆⃗ SDBOLD in others? One explanation is that higher-than-optimal levels of inhibition in a neuronal system can lead to dampening of the entire network. The reduced neuronal firing decreases the number of states the network can visit and decreases the dynamic range of the network. Indeed, some anesthetics work by increasing GABA activity (for example, propofol, a general anesthetic, modulates activity at GABA-A receptors), and GABA is known for its sedative properties. Previous research showed that propofol leads to a steeper power spectral slope (a measure of the “construction” of signal variance) in monkey ECoG recordings (Gao et al., 2017). Networks function optimally only when dynamics are stabilized by sufficient inhibition. Thus, there is an inverted-U relationship between ∆⃗ SDBOLD and GABA that is similar to that observed with other neurotransmitters.”
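      The inverted-U account can be made concrete with a deliberately simple, hypothetical curve. The sketch rests on two assumptions, neither taken from the study. First, ∆⃗ SDBOLD is a quadratic function of effective GABA g that peaks at an arbitrary optimum g_opt. Second, the drug adds the same fixed increment δ to everyone's effective GABA. With unit curvature, the drug-related shift is then f(g + δ) − f(g) = −δ(2(g − g_opt) + δ). This shift is positive below g_opt − δ/2, negative above it, and falls linearly with baseline g, which is the sign pattern described in the quoted text.

```python
def delta_sd_bold(gaba, optimum=1.0, peak=1.0):
    # Hypothetical inverted-U: variability modulation peaks at `optimum`.
    return peak - (gaba - optimum) ** 2


def drug_shift(baseline_gaba, increment=0.2):
    # Assumed drug effect: a fixed increment in effective GABA.
    return delta_sd_bold(baseline_gaba + increment) - delta_sd_bold(baseline_gaba)


for g in (0.4, 0.7, 0.9, 1.2, 1.5):
    print(f"baseline GABA {g:.1f}: shift {drug_shift(g):+.2f}")
```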

      4) Related to the 3rd question, can you show the relationship between the shift of deltaSDBOLD (i.e., the delta of deltaSDBOLD) and visual performance?

      We did not have data on visual performance from the same participants who completed the drug-based part of the study (Subset 1 vs. 3; see Figure 1); therefore, we unfortunately cannot directly investigate the relationship between the drug-related shift of ∆⃗ SDBOLD and visual performance. We have now highlighted this as a limitation of the current study (Lines 589-592), where we state: “One limitation of the current study is that participants who received the drug-manipulation did not complete the visual discrimination task, thus we could not directly assess how the drug-related change in ∆⃗ SDBOLD impacted visual performance.”

      5) Is the dataset openly available? I didn't find a data availability statement.

      An Excel sheet with all the processed data needed to reproduce the figures and results has been included in the source data submitted with the manuscript, together with a data dictionary describing the columns. The raw MRI, MRS, and fMRI data used in the current manuscript were collected as part of a larger study (MIND) and will eventually be made publicly available on completion of that study (around 2027). Until then, the raw data can be obtained for research purposes upon reasonable request. Processing code will be made available on GitHub.

    1. Author response:

      Reviewer #1 (Public Review):

      Reviewer #1, comment #1: The study is thorough and systematic, and in comparing three well-separated hypotheses about the mechanism leading from grid cells to hexasymmetry it takes a neutral stand above the fray which is to be particularly appreciated. Further, alternative models are considered for the most important additional factor, the type of trajectory taken by the agent whose neural activity is being recorded. Different sets of values, including both "ideal" and "realistic" ones, are considered for the parameters most relevant to each hypothesis. Each of the three hypotheses is found to be viable under some conditions, and less so in others. Having thus given a fair chance to each hypothesis, nevertheless, the study reaches the clear conclusion that the first one, based on conjunctive grid-by-head-direction cells, is much more plausible overall; the hypothesis based on firing rate adaptation has intermediate but rather weak plausibility; and the one based on clustering of cells with similar spatial phases in practice would not really work. I find this conclusion convincing, and the procedure to reach it, a fair comparison, to be the major strength of the study.

      Response: Thanks for your positive assessment of our manuscript.

      Reviewer #1, comment #2: What I find less convincing is the implicit a priori discarding of a fourth hypothesis, that is, that the hexasymmetry is unrelated to the presence of grid cells. Full disclosure: we have tried unsuccessfully to detect hexasymmetry in the EEG signal from vowel space and did not find any (Kaya, Soltanipour and Treves, 2020), so I may be ranting off my disappointment, here. I feel, however, that this fourth hypothesis should be at least aired, for a number of reasons. One is that a hexasymmetry signal has been reported also from several other cortical areas, beyond entorhinal cortex (Constantinescu et al., 2016); true, also grid cells in rodents have been reported in other cortical areas as well (Long and Zhang, 2021; Long et al., bioRxiv, 2021), but the exact phenomenology remains to be confirmed.

      Response: Thank you for the suggestion to add the hypothesis that the neural hexasymmetry observed in previous fMRI and intracranial EEG studies may be unrelated to grid cells. Following your suggestion, we have now mentioned at the end of the fourth paragraph of the Introduction that “the conjunctive grid by head-direction cell hypothesis does not necessarily depend on an alignment between the preferred head directions with the grid axes”. Furthermore, at the end of section “Potential mechanisms underlying hexadirectional population signals in the entorhinal cortex” (in the Discussion) we write: “However, none of the three hypotheses described here may be true and another mechanism may explain macroscopic grid-like representations. This includes the possibility that neural hexasymmetry is completely unrelated to grid-cell activity, previously summarized as the ‘independence hypothesis’ (Kunz et al., 2019). For example, a population of head-direction cells whose preferred head directions occur at offsets of 60 degrees from each other could result in neural hexasymmetry in the absence of grid cells. The conjunctive grid by head-direction cell hypothesis thus also works without grid cells, which may explain why grid-like representations have been observed (using fMRI) in regions outside the entorhinal cortex, where rodent studies have not yet identified grid cells (Doeller et al., 2010; Constantinescu et al., 2016). In that case, however, another mechanism would be needed that could explain why the preferred head directions of different head-direction cells occur at multiples of 60 degrees. Attractor-network structures may be involved in such a mechanism, but this remains speculative at the current stage.” We now also mention the results from Long and Zhang (second paragraph of the Introduction): “Surprisingly, grid cells have also been observed in the primary somatosensory cortex in foraging rats (Long and Zhang, 2021).”

      Regarding your EEG study, we have added a reference to it in the manuscript and state that it is an example for a study that did not find evidence for neural hexasymmetry (end of first paragraph of the Discussion): “We note though that some studies did not find evidence for neural hexasymmetry. For example, a surface EEG study with participants “navigating” through an abstract vowel space did not observe hexasymmetry in the EEG signal as a function of the participants’ movement direction through vowel space (Kaya et al., 2020). Another fMRI study did not find evidence for grid-like representations in the ventromedial prefrontal cortex while participants performed value-based decision making (Lee et al., 2021). This raises the question whether the detection of macroscopic grid-like representations is limited to some recording techniques (e.g., fMRI and iEEG but not surface EEG) and to what extent they are present in different tasks.”

      Reviewer #1, comment #3: Second, as the authors note, the conjunctive mechanism is based on the tight coupling of a narrow head direction selectivity to one of the grid axes. They compare "ideal" with "Doeller" parameters, but to me the "Doeller" ones appear rather narrower than commonly observed and, crucially, they are applied to all cells in the simulations, whereas in reality only a proportion of cells in mEC are reported to be grid cells, only a proportion of them to be conjunctive, and only some of these to be narrowly conjunctive. Further, Gerlei et al. (2020) find that conjunctive grid cells may have each of their fields modulated by different head directions, a truly surprising phenomenon that, if extensive, seems to me to cast doubts on the relation between mass activity hexasymmetry and single grid cells.

      Response: We have revised the manuscript in several ways to address the different aspects of this comment.

      Firstly, we agree with the reviewer that our “Doeller” parameter for the tuning width is narrower than commonly observed. We have therefore changed the concentration parameter κ_c in the ‘realistic’ case from 10 rad⁻² (corresponding to a tuning width of 18°) to 4 rad⁻² (corresponding to a tuning width of 29°). We chose this value by referring to Supplementary Figure 3 of Doeller et al. (2010). In their figure, the tuning curves usually cover between one sixth and one third of a circle. Since stronger head-direction tuning contributes the most to the resulting hexasymmetry, we chose a value of κ_c = 4 rad⁻² for the tuning parameter, which corresponds to a tuning width (= half width) of 29° (full width of roughly one sixth of a circle). Regarding the coupling of the preferred head directions to the grid axes, the specific value of the jitter σ_c = 3°, which quantifies the coupling of the head-direction preference to the grid axes, was extracted from the 95% confidence interval given in the third row of the table in Supplementary Figure 5b of Doeller et al. (2010). We now better explain the origin of these values in our new Methods section “Parameter estimation” and provide an overview of all parameter values in Table 1.
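      Both (κ_c, tuning width) pairs quoted above are consistent with reading the tuning width as κ_c^(−1/2) radians, that is, with treating κ_c (in rad⁻²) as an inverse variance. This relation is our inference from the stated numbers and is not a formula given in the text. The sketch below only checks the arithmetic.

```python
import math


def tuning_width_deg(kappa):
    # Assumed relation: half width (rad) = 1 / sqrt(kappa), kappa in rad^-2.
    return math.degrees(1.0 / math.sqrt(kappa))


for kappa in (10.0, 4.0):
    half = tuning_width_deg(kappa)
    print(f"kappa_c = {kappa:g} rad^-2 -> half width {half:.1f} deg, full width {2 * half:.1f} deg")
# kappa_c = 10 gives an 18.1 deg half width; kappa_c = 4 gives 28.6 deg (about 29),
# i.e. a full width of 57.3 deg, roughly one sixth of a circle (60 deg).
```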

      Furthermore, in response to your comment, we have revised Figure 2E to show neural hexasymmetries for a larger range of values of the jitter (σ_c from 0 to 30 degrees), going well beyond the values that Doeller et al. suggested. We have also added a new supplementary figure (Figure 2 – figure supplement 1) where we further extend the range of tuning widths (parameter κ_c) to 60 degrees. This provides the reader with a comprehensive understanding of what parameter values are needed to reach a particular hexasymmetry.

      Regarding your comments on the prevalence of conjunctive grid by head-direction cells, we have revised the manuscript to make it explicit that the actual percentage of conjunctive cells with the necessary properties may be low in the entorhinal cortex (first paragraph of section “A note on our choice of the values of model parameters” of the Discussion): “Empirical studies in rodents found a wide range of tuning widths among grid cells ranging from broad to narrow (Doeller et al., 2010; Sargolini et al., 2006). The percentage of conjunctive cells in the entorhinal cortex with a sufficiently narrow tuning may thus be low. Such distributions (with a small proportion of narrowly tuned conjunctive cells) lead to low values in the absolute hexasymmetry. The neural hexasymmetry in this case would be driven by the subset of cells with sufficiently narrow tuning widths. If this causes the neural hexasymmetry to drop below noise levels, the statistical evaluation of this hypothesis would change.” In addition, in Figure 5, we have applied the coupling between preferred head directions and grid axes to only one third of all grid cells (parameter p_c = ⅓ in Table 1), following the values reported by Boccara et al. (2010) and Sargolini et al. (2006). To strengthen the link between Figure 5 and Figure 2, we now state the hexasymmetry when using p_c = ⅓ along with a ‘realistic’ tuning width and jitter for head-direction modulated grid cells in Figure 2H. Additionally, we performed new simulations in which we observed a linear relationship (above the noise floor) between the proportion of conjunctive cells and the hexasymmetry. This should help the reader understand the effect of a reduced percentage of conjunctive cells on the absolute hexasymmetry values. We have added these results as a new supplementary figure (Figure 2 – figure supplement 2).
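      A noise-free toy population shows why a linear relationship between the conjunctive fraction and the hexasymmetry is expected. The sketch below is not the authors' model and makes four assumptions. Head-direction tuning has the form exp(κ(cos(θ − φ) − 1)). The conjunctive fraction p_c has preferred directions exactly on the grid axes, with zero jitter and grid orientation 0°. All other cells fire at a constant rate. The magnitude of the six-fold Fourier component of the population rate stands in for the hexasymmetry. Every preferred direction at a multiple of 60° contributes the same six-fold component, so that component grows in proportion to the number of conjunctive cells. Randomly oriented tuning in the remaining cells would add a noise floor on top of this.

```python
import cmath
import math


def population_rate(theta, n_cells, p_c, kappa=4.0):
    # Summed firing of all cells for movement direction theta (radians).
    n_conj = round(p_c * n_cells)
    rate = float(n_cells - n_conj)  # untuned cells fire at a constant rate of 1
    for i in range(n_conj):
        preferred = (i % 6) * math.pi / 3  # aligned to one of six grid axes
        rate += math.exp(kappa * (math.cos(theta - preferred) - 1.0))
    return rate


def sixfold_amplitude(rate_fn, n_dirs=120):
    # Magnitude of the six-fold Fourier component over movement directions.
    total = 0j
    for j in range(n_dirs):
        theta = 2 * math.pi * j / n_dirs
        total += rate_fn(theta) * cmath.exp(-6j * theta)
    return abs(total) / n_dirs


amps = {
    p_c: sixfold_amplitude(lambda th, p=p_c: population_rate(th, 600, p))
    for p_c in (0.0, 1 / 6, 1 / 3, 0.5, 1.0)
}
for p_c, amp in amps.items():
    print(f"p_c = {p_c:.2f}: six-fold amplitude {amp:.3f} (ratio to p_c = 1: {amp / amps[1.0]:.3f})")
```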

      Finally, regarding your comment on the findings by Gerlei et al. 2020, we now reference this study in our manuscript and discuss the possible implications (second paragraph of section “A note on our choice of the values of model parameters” of the Discussion): “Additionally, while we assumed that all conjunctive grid cells maintain the same preferred head direction between different firing fields, conjunctive grid cells have also been shown to exhibit different preferred head directions in different firing fields (Gerlei et al., 2020). This could lead to hexadirectional modulation if the different preferred head directions are offset by 60° from each other, but will not give rise to hexadirectional modulation if the preferred head directions are randomly distributed. To the best of our knowledge, the distribution of preferred head directions was not quantified by Gerlei et al. (2020), thus this remains an open question.”

      Reviewer #1, comment #4: Finally, a variant of the fourth hypothesis is that the hexasymmetry might be produced by a clustering of head direction preferences across head direction cells similar to that hypothesized in the first hypothesis, but without such cells having to fire in grid patterns. If head direction selectivity is so clustered, who needs the grids? This would explain why hexasymmetry is ubiquitous, and could easily be explored computationally by, in fact, a simplification of the models considered in this study.

      Response: We fully agree with you. We now explain this possibility in the Introduction where we introduce the conjunctive grid by head-direction cell hypothesis (fourth paragraph of the Introduction) and return to it in the Discussion (section “Potential mechanisms underlying hexadirectional population signals in the entorhinal cortex”). There, we now also explain that in such a case another mechanism would be needed to ensure that the preferred head directions of head-direction cells exhibit six-fold rotational symmetry.
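      As the reviewer notes, the head-direction-only variant can be explored with a very small model. The sketch below compares two populations of pure head-direction cells with no grid firing at all. In one, preferred directions cluster at multiples of 60°; in the other, they are spread evenly around the circle. We assume tuning of the form exp(κ(cos(θ − φ) − 1)) with an arbitrary κ, a 3° jitter, and the six-fold Fourier amplitude of the population rate as a hexasymmetry proxy.

```python
import cmath
import math
import random


def rate(theta, preferred_dirs, kappa=4.0):
    # Summed firing of pure head-direction cells (no grid cells involved).
    return sum(math.exp(kappa * (math.cos(theta - p) - 1.0)) for p in preferred_dirs)


def sixfold_amplitude(preferred_dirs, n_dirs=120):
    # Magnitude of the six-fold Fourier component of the population rate.
    total = sum(
        rate(2 * math.pi * j / n_dirs, preferred_dirs)
        * cmath.exp(-6j * 2 * math.pi * j / n_dirs)
        for j in range(n_dirs)
    )
    return abs(total) / n_dirs


rng = random.Random(0)
n_cells = 360
# Preferred directions clustered at multiples of 60 degrees (3-degree jitter).
clustered = [(i % 6) * math.pi / 3 + math.radians(rng.gauss(0.0, 3.0)) for i in range(n_cells)]
# Preferred directions spread evenly around the circle.
uniform = [2 * math.pi * i / n_cells for i in range(n_cells)]

print("clustered:", sixfold_amplitude(clustered))
print("uniform:  ", sixfold_amplitude(uniform))
# Population rate along a cluster axis (0 deg) versus between axes (30 deg).
print(rate(0.0, clustered), rate(math.pi / 6, clustered))
```

      Only the clustered population yields a six-fold modulation. This is why, as noted in the response, a separate mechanism would be needed to produce the 60° clustering itself.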

      Reviewer #2 (Public Review):

      Reviewer #2, comment #1: Grid cells - originally discovered in single-cell recordings from the rodent entorhinal cortex, and subsequently identified in single-cell recordings from the human brain - are believed to contribute to a range of cognitive functions including spatial navigation, long-term memory function, and inferential reasoning. Following a landmark study by Doeller et al. (Nature, 2010), a plethora of human neuroimaging studies have hypothesised that grid cell population activity might also be reflected in the six-fold (or 'hexadirectional') modulation of the BOLD signal (following the six-fold rotational symmetry exhibited by individual grid cell firing patterns), or in the amplitude of oscillatory activity recorded using MEG or intracranial EEG. The mechanism by which these network-level dynamics might arise from the firing patterns of individual grid cells remains unclear, however.

      In this study, Khalid and colleagues use a combination of computational modelling and mathematical analysis to evaluate three competing hypotheses that describe how the hexadirectional modulation of population firing rates (taken as a simple proxy for the BOLD, MEG, or iEEG signal) might arise from the firing patterns of individual grid cells. They demonstrate that all three mechanisms could account for these network-level dynamics if a specific set of conditions relating to the agent's movement trajectory and the underlying properties of grid cell firing patterns are satisfied.

      The computational modelling and mathematical analyses presented here are rigorous, clearly motivated, and intuitively described. In addition, these results are important both for the interpretation of hexadirectional modulation in existing data sets and for the design of future experiments and analyses that aim to probe grid cell population activity. As such, this study is likely to have a significant impact on the field by providing a firmer theoretical basis for the interpretation of neuroimaging data. To my mind, the only weakness is the relatively limited focus on the known properties of grid cells in rodent entorhinal cortex, and the network level activity that these firing patterns might be expected to produce under each hypothesis. Strengthening the link with existing neurobiology would further enhance the importance of these results for those hoping to assay grid cell firing patterns in recordings of ensemble-level neural activity.

      Response: Thank you very much for reviewing our manuscript and your positive assessment. Following your comments, we have revised the manuscript to more closely link our simulations to known properties of grid cells in the rodent entorhinal cortex.

      Reviewer #3 (Public Review):

      Reviewer #3, comment #1: This is an interesting and carefully carried out theoretical analysis of potential explanations for hexadirectional modulation of neural population activity that has been reported in the human entorhinal cortex and some other cortical regions. The previously reported hexadirectional modulation is of considerable interest as it has been proposed to be a proxy for the activation of grid cell networks. However, the extent to which this proposal is consistent with the known firing properties of grids hasn't received the attention it perhaps deserves. By comparing the predictions of three different models this study imposes constraints on possible mechanisms and generates predictions that can be tested through future experimentation.

      Overall, while the conclusions of the study are convincing, I think the usefulness to the field would be increased if null hypotheses were more carefully considered and if the authors' new metric for hexadirectional modulation (H) could be directly contrasted with previously used metrics. For example, if the effect sizes for hexadirectional modulation in the previous fMRI and EEG data could be more directly compared with those of the models here, then this could help in establishing the extent to which the experimental hexadirectional modulation stands out from path hexasymmetry and how close it comes to the striking modulation observed with the conjunctive models. It could also be helpful to consider scenarios in which hexadirectional modulation is independent of grid firing, for example perhaps with appropriate coordination of head direction cell firing.

      Response: Thanks for reviewing our manuscript and for the overall positive assessment. The new Methods section “Implementation of previously used metrics” starts with the following sentences: “We applied three previously used metrics to our framework: the Generalized Linear Model (GLM) method by Doeller et al. 2010; the GLM method with binning by Kunz et al. 2015; and the circular-linear correlation method by Maidenbaum et al. 2018.” We have created a new supplementary figure (Figure 5 – figure supplement 4) in which we compare the results from these other methods to the results of our new method. Overall, the results are highly similar, indicating that all these methods are equally suited to test for a hexadirectional modulation of neural activity.

      In section “Implementation of previously used metrics” we then explain: “In brief, in the GLM method (e.g. used in Doeller et al., 2010), the hexasymmetry is found in two steps: the orientation of the hexadirectional modulation is first estimated on the first half of the data by using the regressors cos(6θ_t) and sin(6θ_t) on the time-discrete fMRI activity (Equation 9), with θ_t being the movement direction of the subject in time step t. The amplitude of the signal is then estimated on the second half of the data using the single regressor cos(6(θ_t − φ)), where φ is the orientation estimated on the first half. The hexasymmetry is then evaluated from this amplitude.

      The GLM method with binning (e.g. used in Kunz et al., 2015) uses the same procedure as the GLM method for estimating the grid orientation in the first half of the data, but the amplitude is estimated differently on the second half by a regressor that has a value of 1 if θ_t is aligned with a peak of the hexadirectional modulation (alignment is determined with a modulo operation on θ_t relative to the estimated orientation) and a value of -1 if θ_t is misaligned. The hexasymmetry is then calculated from the amplitude in the same way as in the GLM method.

      The circular-linear correlation method (e.g. used in Maidenbaum et al., 2018) is similar to the GLM method in that it uses the regressors β1 cos(6θ_t) and β2 sin(6θ_t) on the time-discrete mean activity, but instead of using β1 and β2 to estimate the orientation of the hexadirectional modulation, the beta values are used directly to estimate the hexasymmetry.”
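      The three previously used metrics quoted above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the intercept column, the 15° half-width that defines "aligned" directions in the binned variant, and the combination sqrt(β1² + β2²) in the circular-linear variant are assumptions, and the final conversion from amplitude to the reported hexasymmetry is left out because it is not given here.

```python
import numpy as np

def _fit(X, y):
    # Ordinary least squares; returns the coefficient vector.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def glm_orientation(theta, activity):
    # Step 1 (both GLM methods): regress activity on cos(6θ) and sin(6θ).
    # The intercept column is an assumption of this sketch.
    X = np.column_stack([np.ones_like(theta), np.cos(6 * theta), np.sin(6 * theta)])
    _, b1, b2 = _fit(X, activity)
    return np.arctan2(b2, b1) / 6.0, b1, b2  # orientation φ (radians), β1, β2

def glm_amplitude(theta, activity, phi):
    # Step 2 (GLM method): single regressor cos(6(θ - φ)) on held-out data.
    X = np.column_stack([np.ones_like(theta), np.cos(6 * (theta - phi))])
    return _fit(X, activity)[1]

def binned_amplitude(theta, activity, phi, half_width=np.pi / 12):
    # Step 2 (GLM with binning): +1 if θ lies within half_width of a peak at
    # φ + k·60°, else -1. The 15° half-width is an assumed value.
    offset = np.mod(theta - phi + np.pi / 6, np.pi / 3) - np.pi / 6
    regressor = np.where(np.abs(offset) < half_width, 1.0, -1.0)
    X = np.column_stack([np.ones_like(theta), regressor])
    return _fit(X, activity)[1]

def beta_magnitude(theta, activity):
    # Circular-linear style: use β1 and β2 directly; the magnitude
    # sqrt(β1² + β2²) is an assumed way of combining them.
    _, b1, b2 = glm_orientation(theta, activity)
    return np.hypot(b1, b2)
```

      In the GLM variants, `glm_orientation` is fitted on the first half of the data and `glm_amplitude` or `binned_amplitude` on the second half, so the amplitude becomes negative when the two halves disagree about the orientation; `beta_magnitude` is non-negative by construction, which is one reason a surrogate-based z-score is used for that method.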

      For each of the three previously used metrics and our new method, we estimated the resulting hexasymmetry (new Figure 5 – figure supplement 4 in the manuscript). In the Methods section “Implementation of previously used metrics” we then continue with our explanations: “Regarding the statistical evaluation, each method evaluates the size of the neural hexasymmetry differently. Specifically, the new method developed in our manuscript compares the neural hexasymmetry to the path hexasymmetry to test whether neural hexasymmetry is significantly above path hexasymmetry. For the two GLM methods, we compare the hexasymmetry to zero (using the Mann-Whitney U test) to establish significance. Hexasymmetry values can be negative in these approaches, allowing the statistical comparison against 0. Negative values occur when the estimated grid orientation from the first data half does not match the grid orientation from the second data half. For the statistical evaluation of the circular-linear correlation method, we calculated a z-score by comparing each empirical observation of the hexasymmetry to hexasymmetries from a set of surrogate distributions (as in Maidenbaum et al., 2018). We then calculated a p-value by comparing the distribution of z-scores against zero using a Mann-Whitney U test. We use the z-scores instead of the hexasymmetry for the circular-linear correlation method to match the procedure used in Maidenbaum et al. (2018). We obtained the surrogate distributions by circularly shifting the vector of movement directions relative to the time-dependent vector of firing rates. For random walks, the vector is shifted by a random number of time steps drawn from a uniform distribution whose range equals the number of time points in the vector of movement directions. For the star-like walks and piecewise linear walks, the shift is a random integer multiplied by the number of time points in a linear segment. Circularly shifting the vector of movement directions scrambles the correlations between movement direction and neural activity while preserving their temporal structure.”
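      The circular-shift surrogate procedure described above can be sketched as follows. This is a hedged illustration, not the authors' code: `hex_amplitude` is a hypothetical stand-in statistic (the magnitude of the fitted cos(6θ)/sin(6θ) weights), `np.roll` performs the circular shift, and `segment_len` restricts shifts to whole segments for piecewise linear or star-like walks. Excluding a shift of zero, which would simply reproduce the observed data, is a choice of this sketch.

```python
import numpy as np

def hex_amplitude(theta, activity):
    """Magnitude of the fitted 6-fold component (hypothetical stand-in statistic)."""
    X = np.column_stack([np.ones_like(theta), np.cos(6 * theta), np.sin(6 * theta)])
    _, b1, b2 = np.linalg.lstsq(X, activity, rcond=None)[0]
    return np.hypot(b1, b2)

def surrogate_z(theta, activity, n_surrogates=200, segment_len=None, rng=None):
    """z-score of the observed statistic against circularly shifted surrogates."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(theta)
    observed = hex_amplitude(theta, activity)
    null = np.empty(n_surrogates)
    for i in range(n_surrogates):
        if segment_len is None:
            # Random walks: shift by 1 .. n-1 time steps (0 excluded here).
            shift = rng.integers(1, n)
        else:
            # Piecewise linear / star-like walks: shift by whole segments only.
            shift = segment_len * rng.integers(1, n // segment_len)
        # Directions are shifted relative to activity; each keeps its own order.
        null[i] = hex_amplitude(np.roll(theta, shift), activity)
    return (observed - null.mean()) / null.std()
```

      One z-score is obtained per observation; across observations, these z-scores are then compared against zero, as in the quoted text.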

      The results of these simulations, i.e. the comparison of our new method to previously used metrics, are summarized in Figure 5 – figure supplement 4 and show qualitatively identical findings when using the different methods. We have added this information also to the manuscript in the third paragraph of section “Quantification of hexasymmetry of neural activity and trajectories” of the Methods: “Empirical (fMRI/iEEG) studies (e.g. Doeller et al., 2010; Kunz et al., 2015; Maidenbaum et al., 2018) addressed this problem of trajectories spuriously contributing to hexasymmetry by fitting a Generalized Linear Model (GLM) to the time discrete fMRI/iEEG activity. In contrast, our new approach to hexasymmetry in Equation (12) quantifies the contribution of the path to the neural hexasymmetry explicitly, and has the advantage that it allows an analytical treatment (see next section). Comparing our new method with previous methods for evaluating hexasymmetry led to qualitatively identical statistical effects (Figure 5 – figure supplement 4).” We have also added a pointer to this new supplementary figure in the caption of Figure 5 in the manuscript: “For a comparison between our method and previously used methods for evaluating hexasymmetry, see Figure 5 – figure supplement 4.”

    1. Author response:

      Reviewer #1 (Public Review):

      Metabotropic glutamate receptors (mGluRs) play a key role in regulating neuronal activity and related behaviors. In different brain regions these receptors can be expressed presynaptically and postsynaptically in different classes of neurons. Therefore, it is difficult to predict the effects of systemically applied drugs that act on these receptors. Here, the authors harness the power of photopharmacology, applying modulators that can be activated or inactivated by light with spatial precision, to address this problem. Their stated goal is to determine the role of mGluRs in regulating pain behaviors, and the circuit mechanisms driving this regulation. Their findings suggest that mGluRs acting in medial prefrontal cortex and thalamus drive antinociception in animals with neuropathic pain, whereas these receptors drive pronociception when acting in the amygdala. Their circuit analysis suggests that, in the amygdala, mGluRs act by decreasing feedforward inhibition of the output neurons. These findings have the potential to affect the development of targeted treatment for pain and related disorders. The elegant photopharmacological approaches will likely inform future studies attempting to distinguish the action of neuroactive drugs in different brain regions.

      We thank the reviewer for the insightful evaluation of our study.

      Reducing the impact of these studies are several methodological, analytical, and interpretation issues.

      The authors report that "the effect of optical manipulations of photosensitive mGlu5 NAMs in individual brain regions in pain models has been studied before". It is, therefore, not immediately clear what is novel in the present study.

      We have clarified this in the following statement (page 3, lines 15‐17): “It remains to be determined if region‐specific actions play a role in the overall analgesic activity of mGlu5 receptor NAMs, considering that opposite actions have been reported”. The subsequent paragraph nicely explains the novelty of our approach, which is based on the combined use of a drug activated by light (JF‐NP‐26) and another drug inactivated by light (alloswitch‐1) to determine which region is sufficient and/or necessary for the analgesic effect of systemic mGlu5 receptor NAMs. In the Discussion (page 7) we state that “To the best of our knowledge, this is the first study to employ photopharmacological tools to compare and contrast distinct roles of mGlu5 receptors in different regions of the pain matrix”.

      The study relies only on reflexive measures of pain, which is a limitation especially in a study that examines the role of "affective and cognitive aspects of pain and pain modulation".

      The main endpoint of the study was not to examine the cognitive and affective aspects of pain, although some of the regions examined are involved in these aspects of pain besides the regulation of sensory aspects (pain thresholds). However, we followed the kind suggestion and measured depression‐like and risk‐taking (anxiety‐like) behaviors in mice. To optimize the number of mice while remaining consistent with the number of mice approved by the regulatory agency, we used the following groups of mice for the evaluation of risk‐taking behavior with the light‐dark box: (i) sham‐operated mice treated with vehicle; (ii) CCI mice treated with vehicle; (iii) CCI mice treated with JF‐NP‐26 without light activation; and (iv) CCI mice treated with JF‐NP‐26 and irradiated with activating light (the test cannot be repeated in the same mice before and after light activation because of habituation). Depression‐like behavior was assessed with the tail suspension test in two separate groups of mice: (i) CCI mice treated with JF‐NP‐26 with no light; and (ii) CCI mice treated with JF‐NP‐26 and light activation. All mice had been implanted with optic fibers in the basolateral amygdala.

      Data are shown in the new Supplementary Fig. S4 and reported in the Results section (page 5) as follows: “Knowing that mGlu5 receptors in the BLA shape susceptibility to stress and fear in rodents (35, 36), we also measured depression‐like and risk‐taking behavior after light‐induced activation of JF‐NP‐26 in the BLA of neuropathic mice. Light‐induced activation of JF‐NP‐26 decreased risk‐taking, and hence increased anxiety‐like behavior, in CCI mice, as shown by the decreased number of entries into, and reduced time spent in, the light compartment of the light‐dark box (Fig. S4a‐c). Depression‐like behavior assessed with the tail‐suspension test was unchanged in CCI mice after light activation of JF‐NP‐26 in the BLA (Fig. S4d).”

      The inclusion of only males is unfortunate because of known, significant sex differences in neuronal circuits driving pain conditions, in both preclinical models (including from work by the authors) and in clinical populations.

      We are aware that there are important sex differences in the pain neuraxis, but this study was not about sex differences. The goal was to evaluate any region‐specific actions of systemically administered compounds (mGlu5 NAMs) and the contribution and requirement of specific brain regions to the observed drug effects, using photopharmacology and drugs activated or inactivated/reactivated by light. This analysis would have been less straightforward in female mice given, for example, that mGlu5 receptors are known to interact with estrogen receptors. This aspect could be addressed in a future project. The present study provides the basis for comparative studies in females.

      The elegant slice experiments (especially Fig. 3) were designed to probe circuit mechanisms through which mGluRs act in different brain regions. These experiments also provide a control to assess whether the photopharmacological compounds act as advertised. Surprisingly, the effect sizes produced by these compounds on neuronal activity are rather small (and, at times, seem driven by outliers). How these small effects affect the interpretation of the behavioral findings is not clear.

      These small effect sizes should also be considered when interpreting the circuit actions studied here.

      We greatly appreciate your insightful comments and constructive feedback on our findings. The mean effect sizes observed in certain experiments are quite small, but the effects were very consistent. We now illustrate this by including lines that connect individual data points for the same neuron in the modified Figure 3 (f, g, n, o), showing the consistent changes in the EPSC and IPSC graphs. We would like to add that it is not quite clear how neuronal effects translate into behavioral consequences, or how large a change, in individual neurons or in a population of neurons, is sufficient and required. These are all interesting questions, but the results of our behavioral and electrophysiological data match quite nicely, including differential or opposing drug effects.

      Some of the sample sizes are as small as n=3. Without an a priori power analysis, it is difficult to assess the validity of the analyses.

      The authors present intriguing data on changes in InsP levels in some (but not all) animals after injury, but not in sham animals. They also report an increase in mGluR expression in some, but not all, brain regions. These findings are not discussed. It is not clear how these selective changes in mGluR expression and activity might affect the interpretation of the photopharmacological results.

      We performed new experiments to increase sample size in PI experiments in the infralimbic and prelimbic cortices where the n was low. Now the data are more solid. New statistical values are reported in the legend of Fig. 1. We also added a discussion of the signaling data (page 9) as follows:

      “We found that mGlu5 receptor‐mediated PI hydrolysis was significantly amplified in all subregions of the contralateral mPFC and in the contralateral amygdala after induction of neuropathic pain whereas mGlu5 receptor protein levels were significantly increased only in the contralateral infralimbic cortex of neuropathic mice. This suggests that, at least in the anterior cingulate cortex, prelimbic cortex, and basolateral amygdala, mGlu5 receptors become hyperactive after induction of pain. It remains to be determined if this is mediated by an enhanced coupling of mGlu5 receptors to Gq/11 proteins, increased expression of phospholipase‐C or other mechanisms. Interestingly, mGlu5 receptor signaling was down‐regulated in the thalamus of neuropathic mice, but mGlu5 blockade in the thalamus still had antinociceptive effects (see below). Downregulation of mGlu5 receptor signaling in the thalamus might represent a compensatory mechanism aimed at mitigating pain in neuropathic mice.”
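      On the reviewer's sample-size point (n = 3 without an a priori power analysis), a power calculation for a two-group comparison can be sketched with the standard normal approximation n ≈ 2(z₁₋α/₂ + z₁₋β)² / d² per group. This is an illustration using only the Python standard library, not an analysis reported by the authors; the effect size d would have to come from pilot data or the literature, and the normal approximation slightly underestimates the exact t-test requirement for small samples.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)
```

      For example, `n_per_group(1.0)` returns 16 per group at α = 0.05 and 80% power; halving the effect size to 0.5 raises this to 63.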

      The behavioral data seem to represent discrete, and not continuous variables. The statistical tests applied are likely inappropriate for these analyses.

      The behavioral values reported here represent measurements of force (g) required to elicit a reflex (i.e., reflex thresholds) and can be considered continuous variables. The statistical tests used for the behavioral experiments included either t‐tests to determine if the difference between two groups was statistically significant or one‐way ANOVA (repeated measures when appropriate) to determine if there were any statistically significant differences between the means of three or more groups. This form of analysis for the outcome measures in this study is well‐established in the literature.

      The authors assume (and state in the abstract) that they can selectively stimulate BLA afferents to the neocortex. This is technically highly unlikely.

      We appreciate the reviewer's insightful comment regarding the technical challenges associated with the selective stimulation of BLA afferents to the neocortex. We are aware that electrical stimulation does not allow the exclusive stimulation of a specific pathway, though BLA afferents form the major component of afferent fibers running in layer IV of the infralimbic cortex on their way to targets in layers II/III and V of the infra‐ and prelimbic cortices.

      Our previous work (Kiritoshi et al., 2016) compared directly electrical and optogenetic stimulation in the mPFC, and found that they match, suggesting that electrical stimulation provides a reliable means to activate BLA input in the mPFC. We acknowledge the technical limitations of selective BLA activation with electrical stimulation, though we are confident that our approach allowed the investigation of mGlu5 manipulations in the BLA‐mPFC circuitry. We have modified the abstract to read as follows: “Electrophysiological analysis showed that alloswitch‐1 increased excitatory synaptic responses in prelimbic pyramidal neurons evoked by stimulation of presumed BLA input, and decreased BLA‐driven feedforward inhibition of amygdala output neurons”.

      The results from the experiment on rostroventral medulla (RVM) neurons are less than convincing because only a "trend" towards decreased excitation is reported. As above, without consideration of effect size, it is hard to appreciate the significance of these findings. The absence of a demonstration of a classical ON Cell firing pattern is also unfortunate.

      We appreciate this observation. Based on the Reviewer’s suggestion, we now report the effect size of optical modulation in the prelimbic cortex on RVM activity as Cohen’s d calculated from t‐tests (Table 1). This information is also included in Results (page 6).
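      Cohen's d can be computed as in the sketch below. Which variant underlies Table 1 is not stated here, so two standard forms are shown: the independent-samples d (difference in means over the pooled standard deviation) and the paired d_z (mean within-neuron change over the standard deviation of the changes), which suits before/after recordings from the same neuron. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d_independent(x, y):
    """Difference in means divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

def cohens_dz_paired(before, after):
    """Mean within-unit change divided by the SD of the changes."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / stdev(diffs)
```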

      Moreover, in this study we classified ON‐ or OFF‐cells based on their firing patterns relative to nocifensive withdrawal responses (H.L. Fields and M.M. Heinricher 1985). As ON‐cells with high basal firing can be easily misclassified as NEUTRAL‐cells (N.M. Barbaro, M.M. Heinricher, H.L. Fields, 1986), potential NEUTRAL‐cells with continuous spontaneous activity were verified by giving a brief bolus of anesthetic to the point that the withdrawal reflex was abolished. Indeed, firing of spontaneously active ON‐cells slows or stops with this manipulation, which unmasks reflex‐related responses. This is now reported and explained in Methods (page 14).

    1. Author response:

      Reviewer #2 (Public Review):

      (1) The groups of patients with endometrial cancer in the manuscript are classified according to age greater than/less than 60. Please explain why 60 years old is chosen as the boundary value of age.

      Thank you for your recommendation. We have modified the discussion section of the manuscript in accordance with your suggestion.

      (2) Among the patients with endometrial cancer selected in the manuscript, AFP outliers accounted for a relatively small proportion. The authors used the clinical detection outliers of CA-125, CA19-9, AFP and CEA as the dividing line instead of re-selecting the optimal cut-off values in this population; patients should be classified by such optimal cut-offs and their prognostic value explored.

      Thank you for your recommendation. We have modified the discussion section of the manuscript in accordance with your suggestion.

      (3) In cancer research, stage is an important prognostic factor that guides the treatment of patients in clinical practice. Patients with different stages of endometrial cancer have markedly different prognoses. The authors constructed a new prognostic risk score based on serum levels of AFP, CEA and CA125; the prognostic value of this risk score should be validated in patients with endometrial cancer at different stages.

      Thank you for your recommendation. We have modified the discussion section of the manuscript in accordance with your suggestion.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors tested the hypothesis that protein consumption decreases with decreasing mass-specific growth during development. This hypothesis is firmly grounded in the logical premise that as animals progress from periods of reduced activity and rapid growth to phases of increased activity and reduced mass-specific growth during their development, they are likely to adjust their nutrient intake, reducing protein and increasing carbohydrate consumption accordingly. The authors tested their hypothesis using the South American locust Schistocerca cancellata, combining field observations with laboratory experiments. This approach allowed them to discern how variations in activity history and metabolism between field- and laboratory-raised locusts influenced their nutrient requirements.

      Their findings, indeed reveal the predicted shift from high protein: carbohydrate consumption to lower protein: carbohydrate intake from the first instar to adult locust - a decline that strongly correlated with a decrease in mass-specific growth rate. Their comparison between field- and laboratory-raised locusts, showed that protein demand was not different, however, carbohydrate consumption rate was >50% higher in the field locusts. These results add depth and significance to the study, shedding light on how environmental factors influence nutrient requirements. What truly amplifies the strength and novelty of the authors' hypothesis is their anticipation that this observed trend in Schistocerca cancellata could extend to all animals. This anticipation is rooted in the expectation that growth rates scale hypometrically across various body sizes and developmental stages, introducing a universal dimension to their findings that holds great promise for broader ecological and evolutionary understanding.

      However, while the study is commendable in its methodology and core findings, there is room for improvement in clarifying the implications of the results. The current lack of clarity is evident in the somewhat shallow questions outlined in lines 358 to 363. For instance, the practice of administering age-specific diets has been commonplace in human and livestock management for ages. Thus, its continued utility may not be the most stimulating question. Instead, a more thought-provoking inquiry might delve into whether variations in global protein availability play a pivotal role in driving niche specialization and the biogeography of animal body sizes and ontogeny, especially considering the potential impacts of climate change. Such inquiries would further elevate the significance of the authors' work and its broader implications in the field.

      Thanks for the suggestions. We have added sentences to the discussion on how size-dependent changes in protein:carbohydrate consumption may affect the physiology and ecology of animals.

      Reviewer #2 (Public Review):

      How and why nutritional requirements and intake targets change over development and differ between species are significant questions with wide-ranging implications spanning ecology to health. In this manuscript, Talal et al. set out to address these questions in laboratory and field experiments with grasshoppers and in a comparative analysis of different species.

      The authors conclude that the target intake of protein to non-protein energy (in this case carbohydrate) (P:C) falls over developmental stages and that this occurs because of a decline in mass-specific intake of protein whereas mass-specific carbohydrate intake remains more constant. The decrease in mass-specific protein consumption rate is tightly correlated with a decline in specific growth rate. Hence, protein consumption directly reflects requirements for growth, with hypometric scaling of protein intake serving as a useful relationship in nutritional ecology.

      The laboratory experiments on the locust, Schistocerca cancellata, provide an elegant dataset in which different instars have been provided with one of two nutritionally complementary food pairings differing in protein to carbohydrate (P: C) content, and their self-selected protein to carbohydrate "intake target" measured.

      These lab locust results were then compared with independently collected field data for late instar nymphs of the same locust species, and the conclusion is drawn that field insects ingested similar protein but 50-90% more carbohydrate (with only 23% increased mass-specific resting oxygen consumption rates). Numerous uncontrolled variables between the lab and field studies make meaningful conclusions difficult to draw from this observation.

      Thank you for this comment. We have revised the text to better explain that very few studies have directly compared lab and field intake target data, and that our goal was to test whether lab intake targets predicted those for field-collected animals. We have also revised the discussion to describe the many possible reasons that intake targets for field-collected animals may diverge from those of lab-reared locusts.

      A graph is then provided showing comparative data across a selection of species, making the case that protein consumption scales similarly both developmentally and across taxa. Questions need to be addressed for this to be convincing, including which criteria were used to select the examples in the graph and how comprehensively do these represent the available literature.

      We now provide further details on our literature search in the Methods.
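      The hypometric scaling discussed in these reviews can be made concrete with a log-log regression: if intake I = a·M^b, then log I = log a + b·log M, and b < 1 means that mass-specific intake I/M = a·M^(b−1) falls as body mass M increases. A minimal sketch, assuming NumPy and illustrative (not the authors') data:

```python
import numpy as np

def scaling_fit(body_mass, intake):
    """Fit intake = a * mass**b by least squares on log10-transformed data."""
    b, log_a = np.polyfit(np.log10(body_mass), np.log10(intake), 1)
    return 10 ** log_a, b

def mass_specific(intake, body_mass):
    """Intake per unit body mass."""
    return np.asarray(intake) / np.asarray(body_mass)
```

      With synthetic intakes generated as 2·M^0.75, `scaling_fit` recovers a = 2 and b = 0.75, and the mass-specific intake decreases monotonically with mass.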

      Reviewer #3 (Public Review):

      The main goal of this study was to test how and why the intake of two important macronutrients, protein and carbohydrate, often changes with ontogeny and body size. To do this, the authors examined protein and carbohydrate intake in a laboratory locust population across each instar and the adult stage. The authors then examined how the optimal balance of carbohydrate and protein intake in a wild locust population corresponded to that observed in the laboratory population. Results of these experiments showed that with ontogenetic growth, locusts decreased protein intake while increasing carbohydrate intake. The authors concluded that such a decrease in protein: carbohydrate intake may result from reductions in specific growth rates (growth within each instar). The protein: carbohydrate intake in the lab population appeared to be consistent with that observed in a wild locust population. Finally, the authors combined their data with data from the literature to examine how protein intake scales with body mass throughout development, within and across different species.

      Strengths:

      To determine how locusts balance protein: carbohydrate intake, the authors applied the Geometric Framework (GF) of nutrition, which is a powerful approach for studying effects of nutrition and understanding the rules of compromise associated with balancing dietary imbalances.

      Captivity can change behavior and physiology of most organisms, making it difficult to establish the relevance of laboratory experiments to what happens in the real world. A strength of this paper is that it compares behavior/physiology of lab vs. wild locusts. Finally, this study takes a step further by proposing a new scaling rule based on this study's results and data from the literature on various species.

      Weaknesses:

      Although the paper has strengths, there seem to be several methodological issues that obscure the interpretation/conclusions presented in the manuscript.

      It appears that authors are not actually estimating "Intake Targets", as stated throughout the manuscript. According to the geometric framework, the intake target (IT) is estimated as the point in the nutritional landscape under which performance/fitness is optimized. The geometric framework also predicts that animals can reach their intake targets by feeding selectivity when given a choice of diets that differ in nutrient amounts, which is what authors did here. However, because the relationship between fitness/performance with diet was not established, in the choice experiments authors seem to be assuming (but not testing) that locusts are reaching their intake target.

      The reviewer is correct that we have not tested whether the intake target selected by each instar maximizes growth or some other measure of fitness. This is a nontrivial task, as there are many possible indices of fitness for juvenile instars, including growth rate, developmental time, resistance to disease/stress, as well as effects on adult reproduction. We use intake target as defined by Raubenheimer and Simpson (2018): “the intake target (IT) is a geometric representation of the nutrient mixture that the regulatory systems target through foraging and feeding.” As we explain above, we followed the protocols used by most investigators to measure intake targets, including in many papers on locusts.

      You estimated a mass-specific protein intake for each instar. It is not clear why mass-specific intake and not just intake of protein was used for analysis. While mass (or size) of an individual may influence food consumption, it seems like authors calculated mass-specific consumption using each instar's final mass, which would make mass a result of protein consumption (and not the opposite). Importantly, the comparison between mass-specific protein consumption and specific growth rate may be problematic, as both variables seem to be estimated using final mass.

      Thank you for this important comment. We agree and have therefore changed Figure 2 and the related analyses to use protein consumption rate corrected for initial rather than final mass.
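      The revised normalization can be illustrated with a minimal sketch. It assumes the common definition of specific growth rate as the log-ratio of final to initial mass per unit time; the manuscript's exact definition and units are not given in this response, and the function names are hypothetical. Dividing protein intake by initial rather than final mass keeps the mass that results from feeding out of the denominator of the consumption rate.

```python
from math import log

def mass_specific_protein_rate(protein_eaten, initial_mass, duration):
    """Protein eaten per unit initial body mass per unit time."""
    return protein_eaten / (initial_mass * duration)

def specific_growth_rate(initial_mass, final_mass, duration):
    """Assumed definition: (ln final mass - ln initial mass) / duration."""
    return (log(final_mass) - log(initial_mass)) / duration
```

      Computing both quantities per instar would then allow the correlation between protein consumption and growth to be assessed without final mass entering the consumption term.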

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors design an automated 24-well Barnes maze with 2 orienting cues inside the maze, then model what strategies the mice use to reach the goal location across multiple days of learning. They consider a set of models and conclude that the animals begin with a large proportion of random choices (choices irrespective of the goal location), which over days of experience becomes a combination of spatial choices (choices targeted around the goal location) and serial choices (successive stepwise choices in a given direction). Moreover, the authors show that after the animal has many days of experience in the maze, they still often began each trial with a random choice, followed by spatial or serial choices.

      This study is written concisely and the results are presented concisely. The best fit model provides valuable insight into how the animals solve this task, and therefore offers a quantitative foundation upon which tests of neural mechanisms of the components of the behavioral strategy can be performed. These tests will also benefit from the automated nature of the task.

      Reviewer #2 (Public Review):

      This paper uses a novel maze design to explore mouse navigation behaviour in an automated analogue of the Barnes maze. A major strength is the novel and clever experimental design which rotates the floor and intramaze cues before the start of each new trial, allowing the previous goal location to become the next starting position. The modelling sampling a Markov chain of navigation strategies is elegant, appropriate and solid, appearing to capture the behavioural data well. This work provides a valuable contribution and I'm excited to see further developments, such as neural correlates of the different strategies and switches between them.
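      Sampling a Markov chain of navigation strategies, as described in these reviews, can be sketched as follows. The three states follow Reviewer #1's summary (random, spatial, serial); the transition probabilities are purely hypothetical, and the full model in the manuscript must also turn each strategy into a specific hole choice, which this sketch omits.

```python
import random

STRATEGIES = ("random", "spatial", "serial")

# Hypothetical transition probabilities P(next | current); each row sums to 1.
EXAMPLE_TRANSITIONS = {
    "random":  {"random": 0.5, "spatial": 0.3, "serial": 0.2},
    "spatial": {"random": 0.1, "spatial": 0.7, "serial": 0.2},
    "serial":  {"random": 0.1, "spatial": 0.3, "serial": 0.6},
}

def sample_strategies(transitions, n_steps, start="random", rng=None):
    """Return n_steps strategy labels drawn from a first-order Markov chain."""
    rng = rng if rng is not None else random.Random()
    sequence = [start]
    for _ in range(n_steps - 1):
        row = transitions[sequence[-1]]
        weights = [row[s] for s in STRATEGIES]
        sequence.append(rng.choices(STRATEGIES, weights=weights, k=1)[0])
    return sequence
```

      Starting each trial in the "random" state mirrors the observation that, even after many days, trials often began with a random choice.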

      Reviewer #3 (Public Review):

      Strength:

      The development of an automated Barnes maze allows for more naturalistic and uninterrupted behavior, facilitating the study of spatial learning and memory, as well as the analysis of the brain's neural networks during behavior when combined with neurophysiological techniques. The system's design has been thoughtfully considered and encompasses numerous intricate details. These include flexible options for selecting start, goal, and proximal landmark positions, a rotating platform to prevent the accumulation of olfactory cues, and careful attention to automation, taking into account specific considerations such as rotating the maze without causing wire shortage or breakage. When combined with neurophysiological manipulations or recordings, the system provides a powerful tool for studying the spatial navigation system.

      The behavioral experiment protocols, along with the analysis of animal behavior, are conducted with care, and the development of behavioral modeling to capture the animal's search strategy is thoughtfully executed. It is intriguing to observe how the integration of these innovative stochastic models can elucidate the evolution of mice's search strategy within a variant of the Barnes maze.

      Comments on revised version:

      The authors have addressed all the points I outlined in the previous round of review, resulting in significant improvements to the manuscript. However, I have one remaining comment. Given the updated inter-animal analysis (Supplementary Figure 8), it appears that male and female mice develop strategies differently across days. Male mice seem to progressively increase their employment of spatial strategy across days, at the expense of the random strategy. Conversely, female mice exhibit both spatial and serial strategies at their highest levels on day 2, with minimal changes observed on the subsequent days.

      These findings could alter the interpretation of Figure 5 and the corresponding text in the section "Evolution of search strategy across days".

      For instance, this statement on page 6 doesn't hold for female mice: "The spatial strategy was increased across days, ... largely at the expense of the random strategy."

      We agree with the reviewer. While the text on page 6 remains valid for the pooled male and female data, we have clarified in the next section, which describes male-female differences, that this trend is not observed in female mice. Furthermore, we adjusted the relevant part of the discussion as follows:

      “A shift in the proportion of random, spatial and serial strategies was observed across days. Several factors might contribute to this shift, including learning of the environment and goal location, changes in motivation for exploration versus goal-directed navigation, and the evaluation of each strategy’s benefit via reinforcement learning. The spatial strategy progressively increased, mostly at the expense of the random strategy. This trend suggests a diminishing interest in exploration and an increasing benefit from employing the spatial strategy as the mice became more familiar with the environment and goal location. Consistent with this hypothesis, the development of the spatial strategy approximately matched the development of spatial maps in the hippocampus37 and the growth pattern of hippocampal feedforward inhibitory connectivity62, both showing progressive increases that reached plateaus after a week. In contrast, the serial strategy showed a sudden increase from day 1 to day 2, indicating that this goal-directed strategy is associated with rapid learning and could already be reinforced on day 2. However, the strategy shift was not uniform across the mouse population, as male and female mice showed distinct trends. Female mice showed no progressive increase in spatial strategy and initially relied more on the spatial strategy while using the random strategy less compared to male mice. This difference might be explained by faster learning of goal location and/or a stronger inclination towards goal-directed navigation over exploration in female mice.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) The following sentence in the abstract is not grammatical: "The processes randomly selected vestibules based on either uniform (random) or biased (serial and spatial) probability distributions; closely matched experimental data across a range of statistical distributions characterizing the length, distribution, step size, direction, and stereotypy of vestibule sequences; and revealed a shift from random to spatial and serial strategies over time, with a strategy switch occurring approximately every 6 vestibule visits."

      One possible revision is: "The processes randomly selected vestibules based on either uniform (random) or biased (serial and spatial) probability distributions; [they] closely matched experimental data across a range of statistical distributions characterizing the length, distribution, step size, direction, and stereotypy of vestibule sequences, [revealing] a shift from random to spatial and serial strategies over time, with a strategy switch occurring approximately every 6 vestibule visits."

      We followed the reviewer’s suggestion.

      (2) There is a missing word in the following sentence in the last paragraph of the discussion: "Our tools might be combined in the future with optogenetic and/or pharmacogenetic [missing word here] to investigate the neural mechanisms underlying strategy selection"

      We added the word ‘manipulations’: ‘… optogenetic, pharmacogenetic manipulations …’

      Reviewer #2 (Recommendations For The Authors):

      I have two minor suggestions:

      (1) Results - Automated Maze section: It would be beneficial to clarify here that the floor and cues rotate, allowing automation by chaining start/end positions together. This information is key to the reader understanding the task, and currently they would only know this by studying Fig. 1 or delving into the methods.

      As suggested by the reviewer, we have added the following text in the Results - Automated Maze section:

      “The maze consists of an enclosed arena with an array of 24 doors evenly spaced along the periphery, and two home boxes moving around the arena perimeter. Start positions are changed by rotating the arena and the home boxes (Fig. 1b). Furthermore, the arena has a tinted cover that prevents mice from seeing room cues while still allowing for infrared tracking of mouse trajectories.”

      (2) I still find the authors' decision to exclude days from some of the line plots, e.g. days 3, 4 and 5 from Fig. 2, a little odd, as this makes the reader wary. I appreciate their argument about clarity, but this can still be achieved while partitioning all of the data rather than excluding certain days. NB I do not find the heat map distributions in the far panel a particularly good substitute for this, as pixel intensities are far less interpretable.

      We appreciate the reviewer’s comment. We want to point out that line plots for all individual days are actually displayed in Supplementary Figure 7a.

      Reviewer #3 (Recommendations For The Authors):

      Although the difference between females and males is clear in Figure S8b, please note that the statistics in panels C and D might not be appropriate, as many of them may become insignificant if adjusted for multiple comparisons.

      If we understand correctly, a Bonferroni correction would need to account for the three day intervals in Figure S8c and the two day groups in Figure S8d. This gives significance thresholds of 0.05/3 = 0.016667 for Figure S8c and 0.05/2 = 0.025 for Figure S8d. As it stands, all comparisons not labelled ‘ns’ in Figure S8c-d remain significant even after applying the Bonferroni correction.
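      The thresholds above follow the standard Bonferroni rule, α/m for m comparisons at a family-wise level α. A minimal sketch of that arithmetic (function names are illustrative, not from the manuscript):

```python
def bonferroni_threshold(alpha, m):
    """Per-comparison significance threshold for m comparisons at family-wise level alpha."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return alpha / m


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that stays significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


print(round(bonferroni_threshold(0.05, 3), 6))  # 0.016667 (three day intervals)
print(bonferroni_threshold(0.05, 2))            # 0.025 (two day groups)
```

      Note that m is the number of comparisons within each family (here, each panel), so the choice of family determines the threshold.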

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) The authors should show i) whether the variants exhibit the same surface expression as wildtype and ii) whether changes of surface expression (e.g. wt transporter expressed low and high) alters growth rates under conditions where growth depends on amino acid uptake. The authors say that the uptake of radioactive substrate and the overall fitness coincide (Figures 5 and 6), but it would be good to quantify the correlation, perhaps by using a scatterplot and linear regression.

      We thank the reviewer for the questions and proposals. A comparison of surface expression between the transporter-expressing variants was added to the manuscript (Figure 3- Figure supplement 1 and 2). For the AGP1 variants, surface expression was similar between the evolved mutants and the wild type, indicating that differences in transporter expression level do not account for differences in growth rate. The same analysis for the PUT4 variants showed a significant difference, with the PUT4-S variant apparently expressed more highly than the wild type. However, this does not appear to affect our interpretation of the mutation's effect on uptake of the original substrates Ala, Gly and GABA, because transporter activity of the evolved variant toward these substrates is substantially decreased despite its higher expression (Figure 5). Thus, the variation in surface expression between the mutant and the wild type, which could be attributed to the small sample size and the inherent limitations of the analysis (imaging of a culture with cells in different planes), is not expected to interfere with the reported results.

      Additionally, as advised, a scatterplot with a linear regression fit relating overall fitness to uptake of 2 mM radioactive substrate was added to the manuscript (Figure 5- Figure supplement 2). For both 2 mM Phe and 2 mM Glu, with fitness as the predictor, the regression model explains 60-70% of the variation in amino acid uptake rate across the different variants.
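      "Explains 60-70% of the variation" corresponds to a coefficient of determination R² of about 0.6-0.7. The sketch below shows how an ordinary least-squares fit and R² are computed; the numbers in the usage line are toy values, not data from the study.

```python
def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot


# Toy numbers (e.g. fitness as x, uptake rate as y), not data from the study:
a, b, r2 = linear_fit_r2([1, 2, 3, 4], [1, 3, 2, 4])
print(round(b, 3), round(r2, 3))  # 0.8 0.64
```

      For a simple linear regression, R² equals the squared Pearson correlation and is symmetric in x and y, so it quantifies the strength of the association rather than the direction of dependence.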

      (2) The authors should further investigate to what extent the (over)expression of wildtype versus variant transporters impacts growth rates. I would recommend such experiments being done under conditions where nitrogen uptake does not depend on amino acid uptake. I could imagine that some of the fitness data are confounded by the general effects of mutations on growth rates. More concretely, I could imagine that overexpression of e.g. the AGP1-G variant is less of a burden for the yeast cells and would allow to grow them better in general. This could explain why its overall fitness is close to wt, whereas other variants exhibit diminished fitness (Fig. 4A).

      The growth curves of all transporter variant cultures in the absence of selection for amino acid uptake have been presented in Figure 4 - Supplement figure 1. As proposed, the growth rates of the variants in medium with ammonium as nitrogen source were calculated and presented in Figure 3- Supplement figure 1 and 2. For both cases of AGP1 and PUT4 expressing variants, statistical analysis showed no significant difference between the mutants and the wild-type.
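      The response does not state how the growth rates were derived from the curves. One common approach, assumed here for illustration only, is to take the slope of ln(OD600) against time over points in exponential phase; the synthetic values below are not from the study.

```python
import math


def specific_growth_rate(times_h, od600):
    """Least-squares slope of ln(OD600) versus time, in 1/h.

    Assumes every point lies in exponential phase, where OD(t) = OD0 * exp(mu * t).
    """
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need matching time and OD series with at least 2 points")
    logs = [math.log(v) for v in od600]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    den = sum((t - mean_t) ** 2 for t in times_h)
    return num / den


def doubling_time(mu):
    """Doubling time (h) corresponding to a specific growth rate mu (1/h)."""
    return math.log(2) / mu


# Synthetic culture that doubles every 2 h (illustrative values only).
times = [0, 2, 4, 6]
ods = [0.05 * 2 ** (t / 2) for t in times]
mu = specific_growth_rate(times, ods)
print(round(doubling_time(mu), 3))  # 2.0
```

      In practice, the choice of the exponential-phase window strongly affects the estimate and should be applied identically to all variants.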

      (3) It is quite remarkable that the PUT4-S variant has such a dramatically enlarged substrate spectrum. In addition, the fitness losses for alanine and GABA are rather small. This striking finding raises the question of why yeast has not evolved this much better, more efficient variant in the first place.

      We thank the reviewer for this very good question. We now included an explanation in the Discussion, but to give a short answer here: One should keep in mind that we used a 10-gene deletion strain to select for given mutants. Wild-type cells have a wide spectrum of substrates through the use of many amino acid transporters, and their regulation is intricately tuned to achieve optimum transport under any environmental circumstance. Broadening the spectrum of a single transporter thus would not lead to increased fitness. On the contrary, it would probably throw off this fine balance.

      (4) It would be generally interesting to know which types of selections (transporter/amino acid combinations) were tried (maybe as part of the methods section). I could imagine that the examples shown in the paper are the "tip of the iceberg", and that many other trials may have failed, either because the cultures died or because the identified clones grew faster due to mutations outside of the plasmid. It would be helpful for researchers planning such experiments in the future to be made aware of potential pitfalls.

      The issues raised here are spot-on, as we did test the evolution of PUT4 towards transport of amino acids other than the two reported. Besides the successful Asp and Glu selections, we ran parallel cultures selecting for transport of Gln, Thr, Trp, Tyr, and Cit. None of these evolution regimes led to increased growth phenotypes linked to the evolved gene, and we did not investigate these cultures further. At this point, we cannot fully explain this result, which is why we decided to omit it from the report. The L207S variant of PUT4 was later shown to support growth on Gln, Thr, and Cit. We therefore speculate that this mutant was not recovered in the respective evolution cultures because its fitness gain on these amino acids was too small for it to be sufficiently enriched during the evolution trial. Given that the Δ10AA strain still harbors nine amino acid transporter genes in its genome, it is conceivable that upregulation of some of these genes enables growth on some amino acids, preventing the selection of mutations in PUT4 (e.g., through mutations outside the plasmid, as the reviewer aptly suggested). We deemed these (negative) results not appropriate for the manuscript, as our main focus was characterizing the fitness effects of single mutations, not the laboratory evolution process used to obtain the mutants.

      (5) The authors took a genetic gain-of-function approach based on random mutagenesis of the transporter. In such approaches, it is difficult to know which mutation space is finally covered/tested, and information that can be gained from loss-of-function analyses is missed. Accordingly, the outcome is somewhat anecdotal. To provide an idea of the mutational landscape accessible, the authors could perform NGS of cultures without any selective pressure, and report the distribution of missense variants in the population.

      We very much appreciate the interest in the details of the mutagenesis. Based on the information given in the original OrthoRep publications (e.g., Ravikumar et al., DOI: 10.1016/j.cell.2018.10.021; mutation rate approx. 10⁻⁵ per nucleotide per generation), we calculated the expected number of mutations per passage in our experiments. For AGP1, this amounts to about 5000 mutational events per passage (10 mL culture volume and 1:200 dilution), and for PUT4 to about 1000 mutational events per passage (2 mL culture volume and 1:100 dilution). With a gene length of about 2000 bp, we expect to cover most single mutations already in the first or second passage (in the absence of selection). This is reflected in the finding that the strongly beneficial mutation L207S in PUT4 was recovered in every selection on Asp or Glu we tested. We included this information in the Methods section.
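      How quickly random mutagenesis samples a gene can be roughly checked with a Poisson approximation: if E events fall independently and uniformly on T targets, the expected fraction of targets hit at least once is 1 − exp(−E/T). The sketch below applies this to the per-passage estimates quoted above. The uniform-hit assumption, and the choice between counting positions (T = L) or specific substitutions (T = 3L), are simplifications introduced here, not part of the authors' calculation.

```python
import math


def expected_coverage(events_per_passage, n_targets, passages=1):
    """Expected fraction of targets hit at least once.

    Assumes mutational events fall independently and uniformly on n_targets
    (Poisson approximation) and ignores selection, drift and mutational bias.
    """
    lam = events_per_passage * passages / n_targets
    return 1.0 - math.exp(-lam)


GENE_LENGTH = 2000  # approximate gene length quoted above (bp)

for name, events in (("AGP1", 5000), ("PUT4", 1000)):
    for passages in (1, 2):
        positions = expected_coverage(events, GENE_LENGTH, passages)
        substitutions = expected_coverage(events, 3 * GENE_LENGTH, passages)
        print(f"{name}, {passages} passage(s): "
              f"{positions:.2f} of positions, {substitutions:.2f} of substitutions")
```

      Under these assumptions, the AGP1 estimate covers roughly 92% of positions after one passage and about 99% after two, whereas the PUT4 estimate gives roughly 39% and 63%; counting each of the three possible substitutions per position separately lowers these fractions.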

      That said, the present study was consciously designed to study gain-of-function mutations, as we wanted to know if and how membrane transporters can evolve new substrate specificities without losing their original functions. Our approach was chosen to reflect as closely as possible a natural scenario in which a microorganism encounters a new ecological niche (a new nutrient to be transported). At the same time, we included selective pressure to retain the capacity to thrive in the original niche (to assimilate an ancestral nutrient). This approach is designed to specifically select against any loss-of-function mutations, which is in line with most modern theories about the evolution of protein function (excellently reviewed in Soskine and Tawfik, DOI: 10.1038/nrg2808). We find that this approach gives a good idea of how transporters could evolve new functions in a natural setting. By engineering single mutations into the wild-type background of the transporters, we show the fitness effects of different single mutations; this finding therefore does not depend on the mutational landscape covered in the experiment.

      (6) The authors do not discuss the impact of these mutations on transport rates/kinetics, which are known to play a role in substrate selection in solute carriers (https://www.nature.com/articles/s41467-023-39711-y). Do the authors think ligand binding/recognition is more important than kinetic selection in the evolution of function?

      Indeed, the observed phenotypes can stem from both changes in transport rate and changes in substrate binding. In our opinion, both are plausible explanations for the behavior of the evolved transporter variants. We do not discuss this in the manuscript because the weak transport of the novel substrates by the wild-type transporters did not allow us to unambiguously distinguish between the two. Yet we can offer minor circumstantial evidence that substrate affinity is the more important factor in evolving a new activity in transporters: the overall transport rate for the original substrates declined in most evolved transporters. It is therefore somewhat less likely that an improved transport rate allowed the novel substrates to be used as nutrients. This does not exclude that both processes occur, possibly side by side.

      (7) Ultimately, what are the selective pressures that drive transporter function? The authors pose this question but don't fully develop the idea. Would promiscuous variants still be selected for if the limiting nitrogen source was taken up by the cell via a different pathway (i.e. ammonium or perhaps arginine)?

      Evolution and regulation of transporters is a very complex system, and we simplify this system in our single-transporter/single-amino acid approach. In nature, the selective forces are assumed to be much smaller than in our system, and multiple selective pressures might occur at the same time (maybe even in opposite directions). Therefore, such predictions are beyond the scope of the present study. To put it shortly, yeasts (and other organisms) have evolved the capacity to transport all natural amino acids. Yet, to actually allow fine-tuned regulation of transport of each individual amino acid, narrow- and broad-range transporters have evolved, including a lot of redundancy. This means that the question posed cannot be answered by yes or no, but by “it depends”.

      (8) Amino acids are a special class of metabolites, in that they all have the same basic structure. Thus, transport systems really only need to recognize the amino and carboxyl groups with high fidelity, and can modulate the side chain binding site to increase specificity. This was demonstrated in a bacterial APC transporter (https://www.nature.com/articles/s41467-018-03066-6#Sec2). Is this why the APC fold is largely responsible for AA uptake in biology?

      Indeed, typically, APC-type amino acid transporters bind the amino and carboxyl groups in the same position by backbone interactions. Therefore, this might be an ancestral feature of the APC superfamily and explain why this group represents the main group of amino acid transporters.

      (9) There isn't much discussion on the location of the mutations with respect to binding site vs. gating helices. Are there hotspots of mutations within the APC, and areas where variation is poorly tolerated? It would be helpful to briefly review what is known about mutations that change amino acid specificity in the APC family. My impression is that other studies applying rational mutagenesis have also shown that single-site mutations in the binding pocket alter substrate specificity - are these analogous to the L207 in PUT4? PUT4: I64T comes up in 3 of 5 selections. Did the authors consider a closer analysis of this mutation, and if not, why?

      We agree that it would be helpful to determine hotspots of mutations in APC transporters that lead to changes in selectivity. However, we feel that the current literature does not provide enough data to support an extended analysis of such hotspots. Moreover, the natural sequences of APC transporters are not similar enough to determine which residues are responsible for a given selectivity profile. There are, however, some site-directed mutagenesis studies, as mentioned by the reviewer, and a short summary of these is included in the revised paper. Interpreted in light of our results, these previous studies suggest that the sites identified through laboratory evolution in our work play a significant role in substrate selectivity and transporter function within the APC superfamily.

      As to the question why we did not include the I64T mutation in our experiments: this mutation lies within the poorly defined N-terminus of the protein, which is not part of the transmembrane core. We therefore deemed this residue as probably not connected to the specificity of the protein; it might be related to the protein’s stability in the cell, as the termini of transporters are known to be important for post-translational regulation, especially vacuolar degradation.

      (10) What do we learn about the APC fold that informs our understanding of where substrate specificity arises in this fold? Do the authors think all SLC folds are equally capable of adaption, or are some more evolutionary-ready than others? An evolutionary analysis of these transporters to gain insights into whether the identified substitutions also occurred during natural evolution under real-life conditions would further strengthen the manuscript. Could the authors provide a sense of how similar the 18 yeast amino acid transporters are, such as sequence alignments or a matrix of pairwise sequence identity/similarity? Are they very diverged, or is the complement of amino acid substrates covered by a rather conserved suite of transporters?

      We do not want to make bold statements about adaptive evolution in other SLC folds, but we consider it not unlikely that a similar approach will lead to similar conclusions in other transporters.

      As advised, a pairwise identity matrix was added to the manuscript (Figure 1–figure supplement 2).

      As to the proposed analysis focusing on natural occurrence of the mutations we found: we have indeed looked into this, but have not found evidence of such mutations. This is actually expected, as our selection regime puts “unnatural” selective pressures on a single transporter in isolation, which in reality co-evolved with a whole suite of other transporters that already have the capacity to transport all amino acids. Therefore, it is unlikely that the same mutations would happen in a natural setting. Our study is designed to capture evolution where a completely novel substrate is encountered, for which no transport mechanism has evolved yet.

      (11) Throughout: some of the bar graphs show individual data points, but others do not (Figure 3, Figure 5). These should be shown for all experiments.

      We thank the reviewer for the comment. In the revised version of the manuscript, we included individual data points in all bar graphs.

      (12) For bar graphs in which no indication of significance is shown, does this mean that p>0.05? Comparisons that are not significant (p>0.05) should be indicated as such.

      We thank the reviewer for the comment. In the revised version of the manuscript, we indicated in the legends that in cases of no significant difference (p > 0.05) between the wild-type and the evolved variants, no asterisks are shown.

      (13) Figure 5, Figure 6: Are the three confocal images just three different fields of view? It might be useful to include a zoom-in on a single representative cell, as it is hard for the reader to see to evaluate the membrane localization.

      In the revised version of the manuscript, we clarified that the three confocal images represent three different cultures, as each variant was tested in triplicates. We also included a zoom-in of a representative cell, as suggested.

      (14) In the main text, page 9, the conditions used for each experimental evolution are not clear ("nitrogen limiting mixture of amino acids (1 mM final concentration)". I think this is an important detail, since the mixtures are quite different for the more promiscuous vs. the more selective transporter, and it would be helpful if this was described more clearly in the main text.

      We thank the reviewer for the comment. We have included further clarification in the revised manuscript.

      (15) Figure 1-Supplement 1 and Figure 4 Supplement 4 - can't read the figure labels. Try labeling columns and rows rather than individual plots.

      We have taken the proposal into account and revised these figures accordingly.

      (16) Page 9: "The transporter gene was sequenced and re-introduced into Delta-10AA cells." Was the plasmid isolated, sequenced, and re-introduced, or was the gene cut-and-pasted into a new vector backbone?

      In the revised manuscript we have clarified that the gene was sequenced and then cloned into the expression vector and re-introduced into naïve Δ10AA cells.

    1. Author response:

      We thank the reviewers for appreciating our study and for providing valuable comments and recommendations.

      We are convinced that by carefully addressing the reviewers' comments and questions, we will be able to improve the manuscript’s quality.  

      Specifically, we aim to provide further analysis to validate the subdivision of G32 RGCs into sub-clusters.

      In that context, we will improve the alignment of the RGC sub-types between the calcium imaging and MEA datasets.  

      To give the reader all information about our analysis, we will improve the methods section and explain the normalization of the calcium traces and the clustering in more detail.

      Furthermore, we will also address the concerns regarding the design of the calcium imaging experiments, potential false-negative effects, and why we did not include a wash-out condition in our experimental protocol.  

      Finally, we will revise the discussion about potential NO mechanisms and expand it on how the effects we observed may relate to known or potentially novel mechanisms.

      In particular, we will also deepen our discussion and interpretation of the strychnine dataset.  

      Again, we would like to thank the reviewers for their valuable comments.

    1. Author response:

      Reviewer #1 (Public Review):

      This is an important and very well conducted study providing novel evidence on the role of zinc homeostasis for the control of infection with the intracellular bacterium S. typhimurium also disentangling the underlying mechanisms and providing clear evidence on the importance of spatio-temporal distribution of (free) zinc within the cell.

      We thank the reviewer for the positive comments.

      1) It would be important to provide more information on the genotype of mice.

      As suggested by the reviewer, we have added the detailed genotype of Slc30a1flagEGFP/+ and Slc30a1fl/flLysMCre mice to the revised supplementary Figure supplement 10.

      2) It is rather unlikely that C57Bl6 mice survive up to two weeks after i.p. injection of 1x10E5 bacteria.

      Following the reviewer's comment, we tested the survival rate in a group of our experimental animals and in C57BL/6 wild-type mice.

      The Salmonella strain was a gift from our friend, Professor Ge Bao-xue. We sent this strain for genetic characterisation, which showed 100% identity to Salmonella enterica Typhimurium, matching many strains originating from poultry. One of them is Salmonella enterica subsp. enterica serovar Typhimurium strain MeganVac1 (Accession: CP112994.1), a live attenuated strain. We hope this helps explain why the mice survive despite the high infectious dose.

      Author response image 1.

      (A) Survival rate of Slc30a1fl/fl and Slc30a1fl/flLysMCre mice (n = 14-15/group) and (B) survival rate of C57BL/6 wild-type mice (n = 8) during two weeks after Salmonella infection. (C) Full-length sequence (1,478 bases) of the 16S rDNA gene of the Salmonella strain and (D) the sequencing electropherogram.

      3) To be sure that macrophages Slc30A1 fl/fl LysMcre mice really have an impaired clearance of bacteria it would be important to rule out an effect of Slc30A1 deletion of bacterial phagocytosis and containment (f.e. evaluation of bacterial numbers after 30 min of infection).

      As the reviewer advised, we repeated the experiment and measured bacterial numbers after 30 min of infection (dashed line in A). There was no statistically significant difference in bacterial numbers at 30 min between Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs, indicating that bacterial uptake and initial containment are not affected by Slc30a1 deletion. Therefore, the difference in bacterial numbers at 24 hours reflects impaired intracellular pathogen-killing capacity, as the reviewer pointed out.

      Author response image 2.

      (A) Time course of the intracellular pathogen-killing capacity of Salmonella-infected Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs, measured in colony-forming units per ml (n = 5). (B) Fold change in Salmonella survival (CFU/mL) at different time points from A. (C) Representative images of Salmonella colonies on solid agar medium at 24 hours. Data are represented as mean ± SEM. P values were determined using a two-tailed unpaired Student’s t-test. *P<0.05, **P<0.01; ns, not significant.

      4) Does the addition of zinc to macrophages negatively affect iNOS transcription as previously observed for the divalent metal iron and is a similar mechanism also employed (CEBPß/NF-IL6 modulation) (Dlaska M et al. J Immunol 1999)?

      The reviewer raises an important point, since free zinc also plays a role at multiple levels of cellular signaling (Kambe et al., 2015). Dlaska and Weiss reported that NF-IL6, a transcription factor responsible for iNOS transcription, is negatively regulated by iron perturbation under IFN-γ/LPS stimulation in macrophages (Dlaska and Weiss, 1999). In line with the reviewer's suggestion, our results show that zinc supplementation decreases iNOS (Nos2) expression in macrophages after Salmonella infection, suggesting that free zinc might play a role in iNOS regulation.

      However, in Slc30a1fl/flLysMCre macrophages, loss of Slc30a1 not only increases intracellular free zinc but also induces Mt1, a zinc reservoir that, according to previous studies, might negatively affect NO production (Schwarz et al., 1995) or inhibit iNOS through the NF-κB pathway (Cong et al., 2016). Therefore, we cannot rule out that the defect in Salmonella clearance due to iNOS/NO inhibition is caused by a complex combination of excess free zinc and overexpression of this zinc reservoir. Testing this hypothesis will require further studies using targeted models, for example Mtfl/fliNOSfl/flLysMCre mice, to establish the precise mechanism.

      Author response image 3.

      RT-qPCR analysis of mRNA encoding Nos2 in BMDMs 4 h after infection with Salmonella or Salmonella plus ZnSO4 (20 μM).

      Reference:

      Dlaska M, Weiss G. 1999. Central role of transcription factor NF-IL6 for cytokine and iron-mediated regulation of murine inducible nitric oxide synthase expression. The Journal of Immunology. 162:6171-6177, PMID: 10229861

      Kambe T, Tsuji T, Hashimoto A, Itsumura N. 2015. The physiological, biochemical, and molecular roles of zinc transporters in zinc homeostasis and metabolism. Physiological Reviews. 95:749-784. https://doi: 10.1152/physrev.00035.2014, PMID: 26084690

      Schwarz MA, Lazo JS, Yalowich JC, Allen WP, Whitmore M, Bergonia HA, Tzeng E, Billiar TR, Robbins PD, Lancaster JR Jr, et al. 1995. Metallothionein protects against the cytotoxic and DNA-damaging effects of nitric oxide. Proceedings of the National Academy of Sciences of the United States of America. 92: 4452-4456. https://doi: 10.1073/pnas.92.10.4452, PMID: 7538671

      Cong W, Niu C, Lv L, Ni M, Ruan D, Chi L, Wang Y, Yu Q, Zhan K, Xuan Y, Wang Y, Tan Y, Wei T, Cai L, Jin L. 2016. Metallothionein prevents age-associated cardiomyopathy via inhibiting NF-κB pathway activation and associated nitrative damage to 2-OGD. Antioxidants & Redox Signaling. 25: 936-952. https://doi: 10.1089/ars.2016.6648, PMID: 27477335

      5) How does zinc or TPEN supplementation to bacteria in LB medium affect the log-phase growth of Salmonella?

      We found that zinc supplementation at both low (20 µM) and high (640 µM) concentrations negatively affects Salmonella growth in broth culture, especially during log and stationary phases, whereas TPEN (20 µM) supplementation does not. This indicates that high-zinc conditions, such as those occurring within phagosomes (Botella et al., 2011), can limit bacterial growth.

      Author response image 4.

      Growth curve (optical density, OD 600 nm) of Salmonella in LB medium at different concentrations of ZnSO4 and/or TPEN. Bar graph indicating Salmonella growth at specific time points. Values are expressed as the mean of triplicates for each condition; P values were determined using 2-tailed unpaired Student’s t-test. *P < 0.05, **P < 0.01, ***P < 0.001; ns, not significant.

      Reference:

      Botella H, Peyron P, Levillain F, Poincloux R, Poquet Y, Brandli I, Wang C, Tailleux L, Tilleul S, Charrière GM, Waddell SJ, Foti M, Lugo-Villarino G, Gao Q, Maridonneau-Parini I, Butcher PD, Castagnoli PR, Gicquel B, de Chastellier C, Neyrolles O. 2011. Mycobacterial P1-type ATPases mediate resistance to zinc poisoning in human macrophages. Cell Host & Microbe. 10:248-259. doi: 10.1016/j.chom.2011.08.006. PMID: 21925112

      Reviewer #2 (Public Review):

      This paper explores the importance of zinc metabolism in host defense against the intracellular pathogen Salmonella Typhimurium. Using conditional mice with a deletion of the Slc30a1 zinc exporter, the authors show a critical role for zinc homeostasis in the pathogenesis of Salmonella. Specifically, mice lacking the Slc30a1 gene in LysM+ myeloid cells are hypersusceptible to Salmonella infection, and their macrophages show altered phenotypes in response to Salmonella. The study adds important new information on the role metal homeostasis plays in microbe-host interactions. Despite these strengths, the manuscript has some weaknesses. The authors conclude that lack of Slc30a1 in macrophages impairs Nos2-dependent anti-Salmonella activity. However, this idea is not tested experimentally. In addition, the research presented on Mt1 is preliminary. The text related to Figure 7 could be deleted without affecting the overall impact of the findings.

      We thank the reviewer for his/her positive comments and constructive suggestions.

      Reviewer #3 (Public Review):

      Na-Phatthalung et al. observed that transcripts of the zinc transporter Slc30a1 were upregulated in Salmonella-infected murine macrophages and in human primary macrophages, and therefore sought to determine whether, and how, Slc30a1 contributes to the control of bacterial pathogens. Using a reporter mouse, the authors show that Slc30a1 expression increases in a subset of peritoneal and splenic macrophages of Salmonella-infected animals. Specific deletion of Slc30a1 in LysM+ cells resulted in a significantly higher susceptibility of mice to Salmonella infection which, counter to the authors' conclusions, is not explained by the small differences in bacterial burden observed in vivo and in vitro. Although loss of Slc30a1 resulted in reduced iNOS levels in activated macrophages, the study lacks experiments that mechanistically link loss of NO-mediated bactericidal activity to Salmonella survival in Slc30a1-deficient cells. The additional deletion of Mt1, another zinc-binding protein, resulted in even lower nitrite levels in activated macrophages but only modest effects on Salmonella survival. By combining genetic approaches with molecular techniques that measure variables in macrophage activation and the labile zinc pool, Na-Phatthalung et al. successfully demonstrate that Slc30a1 and metallothionein 1 regulate zinc homeostasis in order to modulate effective immune responses to Salmonella infection. The authors have done a lot of work, and the finding that Slc30a1 expression in macrophages contributes to control of Salmonella infection in mice is new and will be of interest to the field. Whether the mechanism by which SLC30A1 controls bacterial replication and/or lethality of infection involves nitric oxide production by macrophages remains to be shown.

      We very much appreciate the reviewer’s detailed evaluation and suggestions. The manuscript has been revised thoroughly according to the reviewer’s advice.

    1. Author response:

      Reviewer #2 (Public Review):

      The manuscript by Chan et al. reports results of a systematic mutagenesis approach to study the surface expression and APP+ transport mechanism of the serotonin transporter. They complement this experimental evidence with large-scale molecular simulations of the transporter in the presence of APP+. The use of deep mutagenesis and large-scale adaptive sampling simulations is impressive and could be a very exciting contribution to the field.

      On the whole, the results appear to provide a fascinating insight into the effects of mutations on transport mechanisms, and how those interrelate with the structural fold and biophysical properties of a dynamic protein and its substrate pathways. A weakness of the conclusions based on the molecular simulations is that they rely on comparison with previously published work involving non-identical simulation systems (i.e. different protonation states).

      As we explain further below, this is because a preprint of previous MD simulations used a different protonation state for Glu508. However, the final published article (Chan, et al., Biophysical Journal. 121, 715–730, 2022) and new simulations we present here are consistent in having Glu508 protonated.

      Conclusions in this work about the origins of the 1:1 sodium:serotonin stoichiometry should also be considered in light of the fact that two sodium ions are bound in the structures of SERT, and more work is needed to explain why the second ion is not also released/co-transported.

      We do not have any direct evidence as to why Na+ in the Na1 site is not also symported, except to say that in our simulations it remains bound while 5-HT/APP+ is imported. Only Na+ in the Na2 site is displaced into the cytosol, consistent with the known stoichiometry for transport and with the work of others. For example, the Na2 site is conserved as a functionally relevant site in distantly related secondary transporters (Cheng & Bahar, Structure. 2015; 23: 2171-2181; Stolzenberg et al., J. Biol. Chem. 2017; 292: 7372-7384; Koldsø et al., PLoS Comput. Biol. 2011; 7: e1002246; Khafizov et al., Proc. Natl. Acad. Sci. U S A. 2012; 109: E3035-E3044); please see further elaboration in the manuscript on lines 450-462. Nonetheless, it could be inferred from our data that Na+ in the Na2 site is the symported ion because it, rather than Na+ in the Na1 site, shares the exit pathway with the substrate (interactions with the displaced Na+ ion are replaced by the amine of the substrate as it moves into the exit pathway).

    1. Author response:

      Reviewer #1 (Public Review):

      The authors report a high-quality genome assembly for a member of Xenacoelomorpha, a taxon that is at the center of the last remaining great controversies in animal evolution. The taxon and the species in question have "jumped around" the animal tree of life over the past 25 years, and seemed to have found their place as a sister-group to all remaining bilaterians. This hypothesis posits that the earliest split within Bilateria includes Xenacoelomorpha on the one hand and a clade known as Nephrozoa (Protostomia + Deuterostomia) on the other, and is thus referred to as the Nephrozoa hypothesis. Nephrozoa is supported by phylogenomic evidence, by a number of synapomorphic morphological characters in the Nephrozoa (namely, the presence of nephridia) and lack of some key bilaterian characters in Xenacoelomorpha, and by the presence of unique miRNAs in Nephrozoa.

      The Nephrozoa hypothesis has been challenged several times by the authors' groups who alternatively suggest placing Xenacoelomorpha within Deuterostomia as a sister group to a clade known as Ambulacraria. This hypothesis (the Xenambulacraria hypothesis) is supported by alternative phylogenomic datasets and by the shared presence of a number of unique molecular signatures. In this contribution, the authors aim to strengthen their case by providing full genome data for Xenoturbella bocki.

      The actual sequencing and analysis are technically and methodologically excellent. Some of the analyses were done several years ago using approaches that may now seem obsolete, but there is no reason not to include them. As a detailed report of a newly sequenced genome, the manuscript meets the highest standards.

      The authors emphasize a number of key findings. One is the fact that the genome is not as simple as one might expect from a "basal" taxon, and is on par with other bilaterian genomes and even more complex than the genomes of secondarily simplified bilaterians. There is an implicit expectation here that the sister group to all Bilateria would represent the primitive state. This is of course not true, and the authors are aware of this, but it sometimes feels as though they are using this implicit assumption as a straw man argument to say that since the genome is not as simple as expected, X. bocki must be nested within Bilateria. The authors get around this by acknowledging that their finding is consistent with a "weak version of the Nephrozoa hypothesis", which is essentially the Nephrozoa phylogenetic hypothesis without implicit assumptions of simplicity.

      We were NOT suggesting that Xenacoels are ‘basal’, though others have certainly done so. We were testing, instead, whether their supposed simplicity is reflected in the composition of the genome.

      Another finding is a refutation of the miRNA data supporting Nephrozoa. This is an important finding although it is somewhat flogging a dead horse, since there is already a fair amount of skepticism about the validity of the miRNA data (now over 20 years old) for higher-level phylogenetics.

      The missing bilaterian microRNAs were one of the early pieces of evidence excluding the Xenacoelomorpha from Nephrozoa. Our new data are an important refutation of this source of evidence and add to the picture that this phylum is not lacking characters of Bilateria, as had been suggested (missing microRNAs and Hox genes were explicitly interpreted in this way).

      The finding that the authors feel is most important is gene presence-absence data that recovers a topology in which X. bocki is sister to Ambulacraria. The problem is that the same tree does not support the monophyly of Xenacoelomorpha. This may be an artifact of fast-evolving acoel genomes, as the authors suggest, but it still raises questions about the robustness of the data.

      In sum, the authors' results and analyses leave an open window for the Xenambulacraria hypothesis, but do not refute the Nephrozoa hypothesis. The manuscript is a valuable contribution to the debate but does not go a significant way towards its resolution.

      The manuscript has gone through several rounds of review and revision on a preprint server and is thus fairly clear of typos, inconsistencies and lack of clarity. The authors are honest and open in their interpretation of the results and their strengths.

      We thank the reviewer for their assessment of our manuscript. We have responded to some of the points they make above. As there were no specific points to edit or change raised by reviewer 1, we are replying in detail only to reviewer 2. We would like to note that we have modified the text, and thus the focus of our manuscript, in accordance with what we think reviewer 1 is suggesting in the last two paragraphs of their review.

      Reviewer #2 (Public Review):

      The manuscript describes the genome assembly and analysis of Xenoturbella bocki, a worm that bears many morphological features ascribed to basal bilaterians. The authors aim to analyse this genome in an attempt to determine the phylogenetic position of X. bocki as a representative of Xenacoelomorpha and its associated acoelomorphs. In doing so, they want to inform the debate as to whether Xenacoelomorpha belong among the bilaterians or are in fact the sister group of all other bilaterians.

      This paper presents a high-quality assembly of the X. bocki genome. By virtue of the phylogenetic position of this species, this genome is of considerable scientific interest. The assembly appears to be highly complete, which is a strength of the paper. The further characterisation of the genome is well executed and presented. Solid results from this paper include a comprehensive description of the Hox genes, miRNA and neuropeptide repertoire, as well as a description of the linkage groups and how they relate to the ancestral linkage groups.

      The paper is weaker on its central claims and questions, i.e., the phylogenetic position of Xenacoelomorpha and whether X. bocki is a slowly evolving but otherwise representative member of this clade, which remain insufficiently resolved.

      The authors have achieved the goal of describing the X. bocki genome very well. By contrast, it is unclear, based on the presented evidence, whether Xenacoelomorpha is truly a monophyletic group. The balance of the evidence seems to suggest that X. bocki belongs within Bilateria. However, it is unclear what is driving the position of the other acoels. If X. bocki and the other two species in that group are monophyletic, the evidence favours the authors' conclusion (but without clearly rejecting the alternatives).

      This paper will likely further animate the debate regarding this basal species, and also questions related to the ancestral characters of Bilateria as a whole. In particular, the results from the Hox and ParaHox clusters may provide an interesting counterpoint to the previous results based on the acoels.

      We thank the Reviewer for their extended comments on our manuscript. We would firstly like to point out that our work was not aiming to resolve the phylogenetic position of X. bocki. We discussed this question at length, as it was and is a major and important question in evolutionary biology; however, we believe we phrased our conclusions on this point very cautiously, as we are well aware of the limitations of our data for resolving the conundrum.

      In this revision we have further modified our text, specifically in the Introduction and Abstract, to make it clear that we are contributing to the understanding of the evolution and biology of a fascinating organism that cannot easily be cultured in the laboratory.

      In addition, we have supplied more explanation of why Xenacoelomorpha are generally seen as a monophyletic group and which lines of evidence point to this. Again, it should be noted here that colleagues who regard the Nephrozoa hypothesis as true do not doubt the monophyly of Xenacoelomorpha.

    1. Author response:

      Reviewer #1 (Public Review):

      This manuscript presents an exciting new method for separating insulin secretory granules using insulator-based dielectrophoresis (iDEP) of immunolabeled vesicles. The method has the advantage of being able to separate vesicles by subtle biophysical differences that do not need to be known by the experimenter, and hence could in principle be used to separate any type of organelle in an unbiased way. Any individual organelle ("particle") will have a characteristic ratio of electrokinetic to dielectrophoretic mobilities (EKMr) that will determine where it migrates in the presence of an electric field. Particles with different EKMr will migrate differently and thus can be separated. The present manuscript is primarily a methods paper to show the feasibility of the iDEP technique applied to insulin vesicles. Experiments are performed on cultured cells in low or high glucose, with the conclusion that there are several distinct subpopulations of insulin vesicles in both conditions, but that the distributions in the two conditions are different. As it is already known that glucose induces release of mature insulin vesicles and stimulates new vesicle biosynthesis and maturation, this finding is not necessarily new, but is intended as a proof-of-principle experiment to show that the technique works. This is a promising new technology based on solid theory that has the possibility to transform the study of insulin vesicle subpopulations, itself an emerging field. The technique development is a major strength of the paper. Also, cellular fractionation and iDEP experiments are performed well, and it is clear that the distribution of vesicle populations is different in the low and high glucose conditions. However, more work is needed to characterize the vesicle populations being separated, leaving open the possibility that the separated populations are not only insulin vesicles, but might consist of other compartments as well. It is also unclear whether the populations might represent immature and mature vesicles, distinct pools of mature vesicles such as the readily releasable pool and the reserve pool, or vesicles of different age. Without a better characterization of these populations, it is not possible to assess how well the iDEP technique is doing what is claimed.

      Major comments:

      1) There is no attempt to relate the separated populations of vesicles to known subpopulations of insulin vesicles such as immature and mature vesicles, or the more recently characterized Syt9 and Syt7 vesicle subpopulations that differ in protein and lipid composition (Kreutzberger et al. 2020). Given that it is unclear exactly what populations of vesicles will be immunolabeled (see point #2 below), it is also possible that some of the "subpopulations" are other compartments being separated in addition to insulin vesicles. It will be important to examine other markers on these separated populations or to perform EM to show that they look like insulin vesicles.

      We thank the reviewer for this comment and have added the following to the discussion:

      “The intensity peaks we observed at specific EKMr values likely correspond to some of the previously described insulin vesicle subpopulations34,54-57. Larger particles are expected to have a smaller EKMr value compared to smaller particles50. Subpopulations containing larger insulin vesicles, such as a mature pool34,54, synaptotagmin IX-positive vesicles57, or docked vesicles near the plasma membrane34, may have lower EKMr values than smaller immature vesicles. Additionally, phosphatidylcholine lipids increase the zeta potential of tristearoylglycerol crystals58. This effect may extend to insulin vesicle subpopulations containing more phosphatidylcholine, such as young insulin vesicles55, which could lead to higher EKMr values. Taken together, these two properties may be used to predict the EKMr values of known insulin vesicle subpopulations. For example, insulin vesicles with EKMr values of 1-2 × 10⁹ V/m² (Fig. 4C) may represent a synaptotagmin IX-positive subpopulation due to their larger radii and depletion under glucose stimulation. Additionally, young insulin vesicles may have EKMr values between 5 and 7.5 × 10⁹ V/m² (Fig. 4C) due to higher amounts of phosphatidylcholine present in this subpopulation55. In this EKMr range, we observed a higher intensity for glucose-treated cells, which may suggest biosynthesis of new vesicles. Immature insulin vesicles are likely to have higher EKMr values due to their smaller size34, such as an EKMr value between 1.5 and 1.6 × 10¹⁰ V/m² (Fig. 4C). Here we demonstrated the capabilities of DC-iDEP to separate insulin vesicle subpopulations in an unbiased manner. Future experiments using chemical probes to label subpopulations will be useful to accurately define the EKMr values associated with specific subpopulations.” pages 7-8, lines 176-191
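      As a rough illustration of why the size and zeta-potential arguments in the quoted passage point in the directions stated, the textbook expressions below may help. This is a back-of-the-envelope sketch, not the authors' model: it assumes a rigid spherical particle of radius r, Smoluchowski electrophoresis and electroosmosis, and the point-dipole DEP force balanced by Stokes drag. The symbols (medium permittivity ε_m, viscosity η, particle and wall zeta potentials ζ_p and ζ_w, Clausius-Mossotti factor f_CM) are introduced here for illustration only, and signs are taken in magnitude because sign conventions differ between groups.

```latex
% Electrokinetic mobility: electrophoresis plus electroosmosis (Smoluchowski)
\mu_{EK} = \mu_{EP} + \mu_{EO} = \frac{\varepsilon_m\,(\zeta_p - \zeta_w)}{\eta}

% DEP mobility: F_{DEP} = 2\pi\varepsilon_m r^3 \mathrm{Re}[f_{CM}]\nabla|E|^2 balanced by 6\pi\eta r v
\mu_{DEP} = \frac{\varepsilon_m\, r^2\, \mathrm{Re}[f_{CM}]}{3\eta}

% Ratio of the two mobilities, in V m^{-2}
\mathrm{EKMr} = \frac{\mu_{EK}}{\mu_{DEP}} = \frac{3\,(\zeta_p - \zeta_w)}{r^2\,\mathrm{Re}[f_{CM}]}
```

      Under these assumptions EKMr scales as 1/r², so larger vesicles are expected at lower EKMr, while a larger particle zeta potential raises μ_EK and therefore EKMr. Real vesicles are neither rigid spheres nor uniform dielectrics, so this should be read as a qualitative guide only.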

      Furthermore, we have conducted additional experiments using a modified INS-1 cell line with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61) in order to visualize a more complete population of insulin vesicles. By using this cell line, we have performed confocal microscopy, transmission electron microscopy, and cryo-electron microscopy experiments, demonstrating that the isolated vesicles resemble insulin vesicles and contain GFP-tagged C-peptide (Fig. 1-S3). While we acknowledge that further investigation using a more detailed labeling strategy of known insulin vesicle populations with DC-iDEP would be informative, we believe it is beyond the scope of our initial proof-of-concept experiments.

      The following text was added to the results section to describe our additional microscopy analysis:

      “To verify that the insulin vesicles were intact prior to DC-iDEP, we imaged a modified INS-1E cell line that contains a human insulin and green fluorescent protein-tagged C peptide (hPro-CpepSfGFP).49 This GFP tag allowed for quick visual verification of intact vesicles using fluorescence confocal microscopy. We observed distinct puncta rather than a diffuse GFP signal which indicated that the vesicles were intact and not ruptured. Further analysis of isolated vesicles was done using EM. We observed intact vesicles with the expected size and shape using both transmission electron microscopy (TEM) and cryo-electron microscopy (cryo-EM) (Fig. 1—figure supplement 3).” Page 5, lines 104 – 109.

      2) An antibody to synaptotagmin V is used to immunolabel vesicles, but there has been confusion between synaptotagmins V and IX in the literature and it isn't clear what exactly is being recognized by this antibody (this reviewer actually thinks it is Syt 9). If it is indeed recognizing Syt 9, it might already be labeling a restricted population of insulin vesicles (Kreutzberger et al. 2020). The specificity of this antibody should be clarified. Furthermore, Figure 2 is not convincing at showing that this synaptotagmin antibody specifically labels insulin vesicles nor is there convincing colocalization of this synaptotagmin antibody with insulin vesicles. In the image shown, several cells show very weak or no staining of both insulin and the synaptotagmin. The highlighted cell appears to show insulin mainly in a perinuclear structure (probably the Golgi) rather than in mature vesicles (which should be punctate), and insulin is not particularly well-colocalized with the synaptotagmin. Other cells in the image appear to have even less colocalization of insulin and synaptotagmin, and there is no quantification of colocalization. It seems possible that this antibody is recognizing other compartments in the cell, which would change the interpretation of the populations measured in the iDEP experiments. It would also be good to perform synaptotagmin staining under glucose-stimulating conditions, in case this alters the localization.

      We thank the reviewer for bringing this issue to our attention. The antibody originally used in Figure 2 recognizes the 386 aa isoform of synaptotagmin, which is called Syt 9 in the paper mentioned above (Kreutzberger et al. 2020). We have edited our manuscript to label this antibody as “Synaptotagmin IX” to match the existing literature. This antibody, therefore, likely labels only a subset of insulin vesicles. We believe that populations measured in the iDEP experiments consist solely of insulin vesicles, as supported by Western blot and dynamic light scattering results (Fig. 1—figure supplement 2B-C), as well as EM images (Fig. 1—figure supplement 3). Even with a subset of insulin vesicles, these results show the potential of this method, as iDEP analysis reveals heterogeneity within the population of Syt 9-positive insulin vesicles. We have replaced the original immunofluorescence images in Figure 2 with images that are more representative of INS-1E cells. We recognize that immuno-labeling did not yield perfect co-localization, which was expected. However, these experiments do provide valuable insights into the promise of using DC-iDEP for more in-depth separation analysis. Future work will use a modified INS-1 cell line or mouse model with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61) in order to visualize a less restricted set of insulin vesicles, avoiding the limitations associated with antibodies confined to a specific insulin vesicle subpopulation.

      3) The EKMr values of the vesicle populations between the low and high glucose conditions don't seem to precisely match. It is unclear if this is just a technical limitation in comparing between experiments or instead suggests that glucose stimulation does not just change the proportion of vesicles in the subpopulations (i.e. the relative fluorescent intensities measured), but rather the nature of the subpopulations (i.e. they have distinct biophysical characteristics). This again gets to the issue of what these vesicle subpopulations represent. If glucose stimulation is simply converting immature to mature vesicles, one might expect it to change the proportion of vesicles, but not the biophysical properties of each subpopulation.

      We thank the reviewer for this question. We agree that glucose likely shifts the proportion of vesicles within a specific EKMr value rather than impacting the overall biophysical characteristics of all vesicles. We have performed new statistical analysis as suggested and rewritten this section to better explain the differences between conditions.

      “Visual inspection of the collected data revealed generally similar patterns of vesicles collected at specific EKMr values (Fig. 4). However, at 1200 V we achieved adequate separation of vesicle populations to discern unique populations of vesicles from cells treated with glucose compared to no treatment. Using a two-way ANOVA, we found a statistically significant interaction between the effect of treatment on vesicles collected at each EKMr value for data collected only at 1200 V [F (8, 45) = 3.61, p = 0.003]. A Bonferroni post hoc test revealed a significant difference in the intensity or quantity of vesicles collected between treated and untreated samples at 1.10 × 10⁹ V/m² (p = 0.0249), 5.35 × 10⁹ V/m² (p = 0.0469), and 7.45 × 10⁹ V/m² (p = 0.0369). These differences reflect a shift in the populations of insulin vesicles upon glucose stimulation.” Page 7, lines 158-165

      We have also now directly addressed the potential identities of the different populations in the discussion section. This was addressed in major comment #1 and on page 7, lines 176-191 of the manuscript.

      4) The title of the paper promises "isolation" of insulin vesicles, but the manuscript only presents separation and no isolation of the separated populations. Isolation of the separated populations is important to be able to better define what these populations are (see point #1 above). Isolation is also critical if this is to be a valuable technique in the future. Yet the paper is unclear on whether it is actually technically feasible to isolate the populations separated by iDEP. In line 367, it states "this method provides a mechanism for the isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis (imaging, proteomics, lipidomics, etc.)." However, in line 361 it says "developing the capability to port the collected individual boluses will enable downstream analyses such as mass spectrometry or electron microscopy," suggesting that true isolation of these populations is not yet feasible. This should be clarified.

      We thank the reviewer for pointing this out. We have modified the text and title to focus on our ability to separate vesicles rather than isolate them. We agree that the isolation and further biophysical characterization of these subpopulations will be critical to understanding them. However, this capability is still in development. We have made the following changes to clarify that a method for isolating these subpopulations after iDEP-assisted separation is currently being developed.

      Title: “Insulator-based dielectrophoresis-assisted separation of insulin secretory vesicles”

      “this method serves as a stepping stone towards isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis…” page 9, line 230-232.

      Reviewer #2 (Public Review):

      This manuscript uses DC-iDEP, a technology previously applied to other organelle preparations, to isolate insulin secretory granules from INS1 cells based on differences in the dielectrophoretic and electrokinetic properties of synaptotagmin V-positive insulin granules.

      The major motivation presented for this work is to provide a methodology to allow for more sensitive isolation of subpopulations of granules allowing better understanding of the biochemical composition of these populations. This manuscript clearly demonstrates the ability of this technology to separate these subpopulations which will allow for future biochemical characterizations of insulin granules in future studies.

      After proving these subpopulations can be observed, this method was then utilized to show there are shifts in these subpopulations when granules are isolated from glucose stimulated cells. Overall the method of isolation is novel and could provide a tool for further characterization of purified secretory granules.

      The observation of glucose stimulation causing shifts in subpopulations is unsurprising. Glucose stimulation could cause a depletion of insulin and other secretory content from a subset of granules. It would be expected that this loss of content would cause a shift in electrochemical properties of the granules, but this is a nice confirmation that the isolation method has the sensitivity to delineate these changes.

      Major comments:

      1) It is unclear which synaptotagmin isoform is being looked at. Synaptotagmin V and IX have been repeatedly interchanged in the literature. See the note in the syt IX section of "Moghadam and Jackson 2013 Front. Endocrinology" or read "Fukuda and Sagi-Eisenberg Calcium Bind Proteins 2008".

      The 386 aa isoform that is abundant in PC12 cells has been robustly observed in INS1 cells in multiple studies and has been frequently referred to as syt IX. The sequence the antibody was raised against should be obtained from the company where it was purchased, mapped by sequence to the corresponding synaptotagmin isoform, and clarified in the text.

      We thank the reviewer for this comment. The supplier (Thermo Fisher Scientific) calls this antibody “Synaptotagmin V.” As it recognizes the 386 aa synaptotagmin isoform, we have changed references to this antibody to call it “Synaptotagmin IX” to match the existing literature.

      2) The immunofluorescence of insulin and syt V is confusing. The example images do not appear to show the robust punctate structures that are characteristic of secretory granules (in either the insulin or the syt V stain).

      We appreciate the reviewer bringing this point to our attention. We agree that the immunofluorescence images in Figure 2 are not representative of typical INS-1E cells and have replaced the original image for Figure 2 with new images that show punctate structures that are more characteristic of secretory granules. These images also have better colocalization of insulin and synaptotagmin V (now labeled synaptotagmin IX) than the original image, with Pearson’s R values of 0.66 and 0.64.
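
      For reference, Pearson's R for colocalization is the pixel-wise correlation between the intensities of the two channels. A minimal sketch, assuming two background-subtracted images of the same field flattened into equal-length lists; the helper name and toy values are hypothetical, and this is a generic computation rather than the authors' image-analysis pipeline:

```python
import math


def pearson_r(channel_a, channel_b):
    """Pixel-wise Pearson correlation between two intensity channels.

    Hypothetical helper: each argument is a flat list of pixel intensities
    from the same field of view (e.g. insulin and synaptotagmin IX images).
    """
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have equal length and at least 2 pixels")
    n = len(channel_a)
    mean_a = sum(channel_a) / n
    mean_b = sum(channel_b) / n
    # Covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("R is undefined for a channel with constant intensity")
    return cov / math.sqrt(var_a * var_b)


# Toy 2x2 images flattened row by row; values are illustrative only.
insulin = [10.0, 200.0, 15.0, 180.0]
syt9 = [12.0, 190.0, 20.0, 170.0]
print(round(pearson_r(insulin, syt9), 3))
```

      R = 1 means the two channels vary together perfectly pixel by pixel, 0 means no linear relationship, and negative values mean anticorrelation; values such as those reported above therefore indicate substantial but incomplete overlap.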

      3) In the discussion it says, "Finally, this method provides a mechanism for the isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis (imaging, proteomics, lipidomics, etc.) that otherwise would not be possible given the low-abundance components of these subpopulations."

      It would help to elaborate on the yield and concentration of isolated granules. This would give a better sense of what level of biochemical characterization could be performed on granule subpopulations.

      We thank the reviewer for this comment. This line has been changed to clarify the current capabilities of iDEP, as subpopulations cannot presently be removed from the channel.

      “this method serves as a stepping stone towards isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis…” page 9 line 230-232.

      Once it is possible to isolate subpopulations from the channel, we expect to obtain sufficient sample for further characterization. We anticipate that biophysical characterization such as imaging will be highly feasible, and small-scale proteomics could also be possible. At present, however, we have not measured the concentration of isolated vesicles because of complications in the isolation steps. If the quantity of isolated subpopulations proves inadequate for proteomic analysis, we plan to scale up our cell culture to generate enough insulin vesicles for further biochemical characterization. These experiments are beyond the scope of our current work, so we removed details on this idea from the Introduction and Discussion.

      Reviewer #3 (Public Review):

      The manuscript from Barekatain et al. investigates heterogeneity within the population of insulin vesicles from an insulinoma cell line (INS-1E) in response to glucose stimulation. Prevailing dogma in the beta-cell field suggests that there are distinct pools of mature insulin granules, such as a readily releasable pool and a reserve pool, which contribute to distinct phases of insulin release in response to glucose stimulation. Whether these pools (and others) are distinct in protein/lipid composition or other aspects is not known, but has been suggested. In this manuscript, the authors use density gradient sedimentation to enrich for insulin vesicles, noting the existence of a number of co-purifying contaminants (ER and mitochondrial markers). Following immunolabeling with a synaptotagmin V antibody and fluorescently conjugated secondary antibodies, insulin vesicles were applied to a microfluidic device and separated by dielectrophoretic and electrokinetic forces under an applied voltage. The equilibrium between these opposing forces was used to physically separate insulin granules. Some differences were observed in the insulin (Syt V-positive) granule populations isolated from cells that were either unstimulated or stimulated with glucose, as has been suggested previously by other studies noted by the authors; however, including a number of additional control experiments in the current manuscript would provide better context for what the data reveal about these changes.

      The major strength of the paper is its use of a novel, highly sophisticated methodology to examine physical attributes of insulin granules and thus begin to provide some insight into the existence of distinct insulin granule populations within a beta cell; these include insulin granules that are maturing, membrane-docked (i.e., readily releasable), in reserve, newly synthesized, aged, etc. Whether physical differences exist between these various granule pools is not known. In this capacity, the technical capabilities of the current manuscript may begin to offer some insight into whether these perceived distinctions are physical.

      The major weakness of the manuscript is that the study falls short of linking the sophisticated changes observed to the underlying biology, and it primarily focuses on differences in response to glucose. Without knowing what the various populations of granules are, it is challenging to understand what the changes in response to glucose mean.

      Specific concerns are as follows:

      1) There is confusion about what the DC-iDEP separation between unstimulated and stimulated cells reveals. Do these changes reflect the maturation state of granules, or nascent vs. old granules? Readily releasable vs. reserve pool? The comments in the text seem to offer all possibilities.

      We thank the reviewer for this comment. Additional experiments will be useful to concretely define the physical nature of these subpopulations. Our primary goal in this study is to assess the utility of DC-iDEP in reproducibly separating these subpopulations. Our current results reflect variations in the amounts of subpopulations described in the literature and/or in currently uncharacterized subpopulations. As addressed in Reviewer #1 question #1, we have added to the discussion to review these possibilities (Page 7-8, lines 176-191).

      2) It is unclear what we can infer regarding the physical changes of granules between the stimulated states of the cells. Without an understanding of the magnitude of the effect, it is unclear how biologically significant these changes are. For example, what degree of lipid or protein remodeling would be necessary to give a similar change?

      We thank the reviewer for this question. Separation by iDEP is sensitive enough to distinguish particles with minimal differences between them. For example, we could successfully separate wild-type GFP from a point-mutation variant of GFP. We therefore anticipate that this method can distinguish vesicles with greater physical differences between them, which would yield more distinct EKMr values. However, significant future experiments are likely necessary to determine the extent of lipid and protein remodeling between subpopulations and to define the biological significance of each subpopulation.

      3) The reliance on a single vesicle marker, Syt V, is concerning given that granule remodeling is the focus.

      We appreciate the reviewer’s concern. The current manuscript focuses on synaptotagmin V (IX)-positive insulin vesicles. The results of these experiments demonstrate the capabilities of iDEP to reveal heterogeneity in a seemingly similar set of particles. In future experiments we plan to use the modified INS-1 cell line with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61). All insulin vesicles from this cell line contain GFP-tagged C-peptide, and therefore would allow for the detection of a more complete set of insulin vesicles. The results from the current manuscript provide the proof-of-concept validation that this method is promising for understanding vesicle remodeling in more detail in the future.

      4) Additional confirmation that the isolated vesicles are in fact insulin granules would be helpful. As noted, granules were gradient enriched, but did carry contaminants. Note that the microscopy image provided does not provide any real validation for this marker.

      Further confirmation that the immuno-isolated vesicles are in fact insulin granules should be included. EM with immunogold labeling after Syt V enrichment would be a potential methodology to confirm this.

      We thank the reviewer for this comment. We have performed new immunofluorescence imaging to demonstrate the overlap of insulin and synaptotagmin (Fig 2). Additionally, we have performed microscopy experiments with a modified INS-1 cell line with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61) in order to provide evidence of these granules’ identity. Fluorescence microscopy revealed that the isolated granules contain GFP-tagged C-peptide (Fig. 1—figure supplement 3A), while transmission electron microscopy and cryo-electron microscopy confirmed that these vesicles have radii within the correct range to be considered insulin vesicles (Fig 1—figure supplement 3B-C). We added the following text in the results section to describe the new results included:

      “To verify that the insulin vesicles were intact prior to DC-iDEP, we imaged a modified INS-1E cell line that contains a human insulin and green fluorescent protein-tagged C peptide (hPro-CpepSfGFP).49 This GFP tag allowed for quick visual verification of intact vesicles using fluorescence confocal microscopy. We observed distinct puncta rather than a diffuse GFP signal, which indicated that the vesicles were intact and not ruptured. Further analysis of isolated vesicles was done using EM. We observed intact vesicles with the expected size and shape using both transmission electron microscopy (TEM) and cryo-electron microscopy (cryo-EM) (Fig. 1—figure supplement 3).” Page 5, lines 104-109.

      5) It would be useful to understand if the observed effects are specific to the INS-1E cell line or are a more universal effect of glucose on beta-cells.

      We agree with the reviewer that it would be interesting to study these effects in primary beta cells. While we expect to see similar results in these cells, there may be differences in the population variations or EKMr values. However, working with beta cells is currently beyond the scope of this study, as our primary focus is on validating this approach.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors propose a mechanism in which actin polymerization in the dendritic shaft plays a key role in trapping AMPAR vesicles around the stimulated site, promoting the preferential insertion of AMPARs into the potentiated synapse. This dendritic mechanism is novel and may be important for phenomena. The authors also developed a sophisticated method to observe the endogenous behavior of AMPARs using the HITI system.

      However, several major issues need to be addressed to support the authors' claims. Overall, the manuscript is also hard to follow and could be better written.

      We thank the reviewer for carefully reading our text and for the helpful recommendations. We have performed additional experiments and analysis to address the raised issues (detailed below). In addition, we have streamlined and shortened the text to improve its clarity and focus on the biological story.

      Reviewer #2 (Public Review):

      In this study, Wong and colleagues investigate mechanisms leading to the input specificity of LTP. They focus on the trafficking of AMPA receptors, as the surface accumulation of AMPARs is one of the key features of potentiated synapses. They employ an elegant strategy to label endogenous GluA1 with a HaloTag using CRISPR-based technology and succeed in finding a targeting site that does not interfere with the receptor's trafficking or function. This allowed them to visualize and track single receptors in endosomes as well as at the plasma membrane of primary rat hippocampal neurons. They develop and extend particle tracking and molecule counting algorithms to analyze active transport and diffusion of AMPARs and, as expected, find that neuronal activation leads to increased surface expression of labelled AMPARs. Interestingly, they also observe a strong decrease in long-range motion of AMPAR-containing vesicles upon induction of chemical LTP. From this point, the manuscript focuses on explaining this observation. The authors switch from a global activation protocol to glutamate uncaging to induce LTP at individual synapses. In this setting, too, they measure a reduction in the mobile vesicle fraction within an approximately 30 µm long dendritic segment containing the activated spine. In search of an explanation, they investigate activity-dependent actin polymerization as a possible confinement factor that could change the motility of organelles in dendrites. Their hypothesis is based on pre-existing literature demonstrating the role of F-actin in trapping and stalling dendritic endolysosomes, as well as a similar role of F-actin in non-neuronal cells. Indeed, the authors convincingly show that pharmacological depolymerization or stabilization of F-actin bidirectionally impacts the trafficking behavior of AMPAR-containing vesicles in the dendritic shaft.

      To directly visualize the effects of structural LTP at individual synapses on the dendritic actin cytoskeleton, they employ the F-actin-binding probe Tractin. Here they find that cLTP results in the formation of dendritic F-actin fibers and bundles arranged in a network. The spatial extent of this network correlates with the area where AMPAR vesicles exhibit decreased motility. Although this makes sense, I have some concerns about these experiments.

      Tractin has been previously published as an F-actin marker, but like several other binding probes (e.g., Lifeact), it affects F-actin structure and dynamics. The large number of F-actin bundles is not typical for dendrites of hippocampal neurons and might be an artifact of Tractin overexpression. It is difficult to judge whether this is the case because there is no comparison with the endogenous situation, where F-actin is labelled directly. The final series of experiments focuses on the role of processive myosins in stalling and exocytosis of AMPAR vesicles. To address this point, the authors employ a mixture of three different myosin inhibitors and show that although myosins are not responsible for increased vesicle confinement, they facilitate exocytosis of AMPARs. What I find missing are data and examples of AMPAR trafficking into dendritic spines. Here, too, stronger experimental support would benefit the conclusions.

      Overall, the authors achieved the aims of their study. They demonstrated that synapse-specific potentiation results in signaling that triggers actin polymerization in the dendritic shaft beneath the activated input. This leads to trapping and accumulation of AMPAR-containing endosomes, which then have a higher probability of being delivered and secreted at activated dendritic spines. In addition to the conceptual advance of this work, several state-of-the-art labeling and analysis techniques were developed in this project, and they will likely be used by other groups.

      We thank the reviewer for raising these important issues with regards to the use of tractin as a marker for actin polymerization. We have performed additional experiments (detailed below) using phalloidin and also dominant negative inhibitors of myosin Va, Vb, and VI in order to strengthen our conclusions. We find that inducing synaptic activity with cLTP increases phalloidin labeling and the appearance of F-actin fibers. Moreover, inhibition of myosin Va and Vb (but not VI) using their dominant negative c-terminal domains recapitulates the effects of pharmacological inhibition on both the motion states and directional bias of GluA1-HT vesicles in response to cLTP.

      With regard to AMPAR trafficking into spines, we and others have found that GluA1-containing vesicles rarely enter dendritic spines (see response to Reviewer #2, comment 3). Furthermore, exocytic events occur largely at extrasynaptic sites, such as on the dendritic shaft (Figure 5-video 1-3; Lin et al., 2007; Makino et al., 2009; Patterson et al., 2010). Consequently, we believe vesicles are concentrated proximal to synaptic activity in the dendritic shaft rather than in the dendritic spine itself, creating a larger reservoir of intracellular AMPARs that can exocytose during synaptic activity. Others have demonstrated that surface-bound AMPARs diffuse across the cell membrane into stimulated synapses, where they are captured (Choquet and Opazo, 2022).

      We also thank the reviewers for acknowledging the conceptual and technical advances in this work.

      Reviewer #3 (Public Review):

      Wong et al. developed a new, versatile approach with a robust signal to track protein dynamics by inserting a tag into endogenous loci and conjugating fluorescent dyes with different properties. Using this approach, the authors monitor the trafficking of dye-labeled, Halo-tagged GluA1 with time-lapse imaging and find that neuronal stimulation induces GluA1 accumulation surrounding stimulated synapses on dendritic shafts, as well as actin polymerization at synapses and in dendrites. Furthermore, by combining this with pharmacological manipulations of actin polymerization or myosin activity, the authors found that actin polymerization facilitates exocytosis of GluA1 near activated synapses. With appropriate control experiments, the new approach may have broad impact, and its practical application to GluA1 trafficking upon neuronal activation is significant. However, there are several weaknesses, including confirmation of the activity of the tagged receptors and of whether their behavior mimics the endogenous LTP machinery. If the receptor tagged by this robust new approach reflects the endogenous machinery, the approach will provide a big opportunity for the community as a versatile method to visualize proteins that could not be visualized previously.

      Although we use methods previously demonstrated to induce LTP, we do not ourselves demonstrate LTP using electrophysiological methods; consequently, we have changed the text to focus on synaptic plasticity (specifically structural plasticity). Furthermore, we confirm the activity of HaloTag knock-in receptors by expressing GluA1-HT and GluA1-HT-SEP in HEK293T cells and performing whole-cell patch clamp experiments. We find that GluA1-HT and GluA1-HT-SEP respond to glutamate in a similar manner to untagged GluA1.

      We also thank the reviewer for acknowledging the novelty of our strategy.

    1. Author response:

      We thank both reviewers for their constructive feedback. We were grateful to see that both reviewers found our work valuable to the field and agreed that new metrics (including the MECR metric we introduced) are important for dataset evaluation. We briefly respond to two main points raised by the reviewers.

      (1) Key findings from our manuscript. While we do evaluate publicly available datasets in our manuscript, the goal of our work is not to provide a definitive ranking of in situ technologies. As the reviewers point out, our comparative evaluation covers only a single biological context, and we further note that many of these in situ platforms are rapidly evolving with new chemistries and gene panels.

      Instead, the conclusion and purpose of our manuscript is to emphasize the importance of, and need for, new metrics when evaluating spatial datasets. We propose one such metric and demonstrate how cell segmentation can affect not only technical metrics but also downstream biological analysis of in situ datasets.

      (2) Comparing technologies with different gene panels. The reviewers correctly point out that comparing technologies that use different gene panels is not a perfect benchmark. We agree that differences in molecular counts could arise due to biological differences in the abundance of targeted genes.

      We addressed this in Supplementary Figure 4, where we perform pairwise comparisons of each technology, computed using only the overlapping genes measured by both technologies. Our results are consistent with the analysis of the full gene sets.

      While we believe that regenerating in-situ datasets with identical gene panels is beyond the scope of this work (and is likely technically infeasible), we hope that our findings are still valuable and informative to the growing spatial community.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study assesses homeostatic plasticity mechanisms driven by inhibitory GABAergic synapses in cultured cortical neurons. The authors report that up- or down-regulation of GABAergic synaptic strength, rather than excitatory glutamatergic synaptic strength, is critical for homeostatic regulation of neuronal firing rates. The reviewers noted that the findings are potentially important, but they also raised questions. In particular, the evidence supporting the findings is currently incomplete and demonstration of independent regulation of mEPSCs and mIPSCs is a necessary experiment to support the major claims of the study. 

      We appreciate the detailed, thoughtful assessment of our paper by the reviewers and editors and now submit a revised version that addresses the reviewers’ comments as detailed below in response to each concern. We include a more open discussion of alternative possibilities and have added experiments demonstrating that AMPAergic scaling in our mouse cortical cultures is triggered differently from GABAergic scaling. We treated the cultured neurons exactly as described for triggering GABAergic scaling (20 µM CNQX for 24 hours); however, this did not trigger AMPAergic upscaling (new Figure 7), even though it did reduce spiking/bursting activity. Below we explain the result further, but ultimately this demonstrates independent regulation of mEPSCs and mIPSCs, as requested by the editor/reviewer (spike reductions induced by CNQX reduced mIPSC amplitude but had no effect on mEPSC amplitude).

      Reviewer #1 (Public Review):

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction).
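
      The multiple-comparisons concern can be made concrete with the Holm-Bonferroni step-down procedure, one common family-wise correction; the reviewer does not name a specific method, so this choice is an assumption. A minimal sketch; the function names are hypothetical and the raw p-values in the usage lines are placeholders, not values from Fig. 5B:

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank k is multiplied by (m - k), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Step-down monotonicity: never smaller than an earlier adjusted value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


def significant_after_holm(p_values, alpha=0.05):
    """Flags for comparisons that remain significant at family-wise level alpha."""
    return [p <= alpha for p in holm_adjust(p_values)]


# Hypothetical raw p-values for four time points (placeholders only).
raw = [0.03, 0.20, 0.35, 0.60]
print(holm_adjust(raw))
print(significant_after_holm(raw))
```

      With four comparisons, the smallest raw p-value is multiplied by 4, so a single marginal result near 0.03 would no longer fall below 0.05 after this correction.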

      We certainly understand the criticism here (similar to reviewer 2’s third point). We now discuss these complications in more detail in the manuscript (CTZ section of the results and end of the discussion). First, we are presenting our entire dataset to be as transparent as possible. Unlike most synaptic scaling studies (including our own) that apply drugs to alter activity and assess mPSC amplitude at the final time point, here we actually show CTZ’s effect on spiking activity within the culture over time. This is critical because it informed us of the drug’s true effect on spiking, the variability associated with these perturbations, and the ability and timing of the cultured network to homeostatically recover its initial levels. This was important because it revealed that the drugs do not always influence activity in the way we assume, which provides greater context for our results. Second, we are showing all of our data and presenting it using estimation statistics, which go beyond the dichotomy of a simple yes-or-no p value (Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A. 2019. Moving beyond P values: data analysis with estimation graphics. Nat Methods 16: 565-66). Estimation statistics has become an increasingly standard approach over the last 15 years and is the preferred method for the Society for Neuroscience’s journal eNeuro. This method shows the effect size and its confidence interval. For the 3 hr time point in Fig. 5B, the CTZ/ethanol and ethanol data points exhibit very little overlap, the effect size demonstrates a near doubling of spike frequency, and the confidence interval is clearly separated from 0. This was a pairwise comparison, as we compared values at each time point after the addition of ethanol or ethanol/CTZ. Third, the plots illustrate an upward trend in spike frequency at 1 and 6 hrs, but also clear variability.

      It is important to note that these are multiunit recordings and do not purely reflect the excitatory principal neurons that we target for mPSC recordings. This complication, along with the variability inherent in these cultures, could make simple comparisons difficult to interpret, and we now discuss this (end of discussion). Regardless, we do see some increase in spiking with CTZ, and we clearly see increases in mIPSC amplitude, thus providing some support for the idea that spiking could be a critical player in GABAergic scaling, particularly when put in the context of all of our findings. Future work will be necessary to determine how alterations in spiking lead to changes in mIPSC amplitude, and we now discuss this (2nd to last paragraph in discussion).
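
      The estimation-statistics comparison described above centers on a bootstrap confidence interval for the effect size. A minimal sketch, assuming an unpaired mean difference at a single time point and a simple percentile interval (estimation-statistics packages may use bias-corrected intervals instead); the function name, defaults, and example values are hypothetical, and this is not the authors' analysis code:

```python
import random
import statistics


def bootstrap_mean_diff(control, treated, n_boot=5000, ci=0.95, seed=0):
    """Unpaired mean difference (treated - control) with a percentile bootstrap CI.

    Hypothetical helper for one time point; each group is resampled
    independently with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    observed = statistics.fmean(treated) - statistics.fmean(control)
    diffs = []
    for _ in range(n_boot):
        resampled_c = rng.choices(control, k=len(control))
        resampled_t = rng.choices(treated, k=len(treated))
        diffs.append(statistics.fmean(resampled_t) - statistics.fmean(resampled_c))
    diffs.sort()
    alpha = (1.0 - ci) / 2.0
    # Nearest-rank percentiles of the bootstrap distribution.
    low = diffs[int(alpha * (n_boot - 1))]
    high = diffs[int((1.0 - alpha) * (n_boot - 1))]
    return observed, (low, high)


# Illustrative spike frequencies (Hz); not data from the manuscript.
ethanol = [1.1, 0.9, 1.3, 1.0, 1.2, 0.8]
ctz = [2.0, 1.7, 2.4, 1.9, 2.2, 1.6]
effect, (low, high) = bootstrap_mean_diff(ethanol, ctz)
print(f"mean difference {effect:.2f} Hz, 95% CI [{low:.2f}, {high:.2f}]")
```

      The interval describes how precisely the difference is estimated rather than giving a yes-or-no answer; an interval that excludes 0 corresponds to the clear separation described for the 3 hr time point, while repeating the calculation at several time points raises the multiplicity question the reviewer mentions.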

      Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPAR activation; a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. Altogether, these data do not provide substantial evidence for the conclusion drawn by the authors.

      We believe that the most parsimonious explanation for our results is that spiking activity, not AMPAR activation, triggers GABAergic downscaling. GABAergic scaling is no different when comparing 24hr TTX treatment vs TTX+CTZ, and optogenetic restoration of spiking activity while continuing to block AMPAR activation was able to restore GABAergic mPSC amplitudes to control levels. It is important to emphasize that our results with TTX vs. TTX+CTZ are different for GABAergic scaling (no difference in this study) and AMPAergic scaling (CTZ diminished upward scaling in previous study – Fong et al., 2015 - PMID: 25751516) suggesting different triggers for the two forms of scaling. While we strongly believe we have demonstrated that GABAergic downscaling is dependent on spiking (not AMPAergic transmission), we now acknowledge that we cannot rule out the possibility that upward GABAergic scaling may be influenced by AMPAR activation (2nd paragraph discussion), although we have no evidence in support of this.

      Specific points:

      - The logic of the basis for the argument is somewhat flawed: A homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate). 

      We largely agree with the reviewer and should not have implied that this was a necessary requirement for a spike rate homeostat. What we should have said was that historically this definition has been applied to AMPAergic scaling, which is thought to be a spike rate homeostat. We have now corrected this (introduction and discussion).

      - Line 63 parenthetically references an important, but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition. 

      Agreed, and we have now done this.

      - The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.

      The purpose of citing this study was to argue that the spike rate homeostat hypothesis doesn’t make sense for AMPAergic scaling based on a study that hyperpolarized an individual cell while leaving the rest of the network unaltered and therefore leaving network activity and neurotransmission largely normal. In this previous study scaling was not triggered, suggesting reduced spike rate within an individual cell was insufficient to trigger scaling in that cell. The more recent study mentioned by the reviewer achieved scaling by hyperpolarizing a majority of cells in the network. Importantly, this approach alters neurotransmission throughout the network, making it challenging to isolate the specific contributions of spiking vs. receptor activation. Unlike the previous study, which focused on the impact within individual cells, this newer study involves global alterations in network activity, complicating the interpretation of the role of spiking versus receptor activation in triggering scaling.

      - Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection?

      We agree that the scaling ratio plot is largely linear. To be clear, the linearity of the ratio plot was not our point here; rather, the point was that there was a positive slope, meaning the ratios (CNQX mEPSC amplitudes/control mEPSC amplitudes) got bigger for the larger CNQX-treated mEPSCs. Alternatively, a multiplicative relationship in which all mEPSCs are increased by a single factor (e.g., 2X) would produce a flat line with 0 slope at the multiplicative value (e.g., 2). On the left side of the plot, we do see values that rise abruptly from 1; this was partially obstructed by the Y axis in this figure, and we have adjusted this. This left part of the plot is likely due to CNQX-induced increases in the amplitudes of minis that were below our detection threshold of 5 pA, as suggested by the reviewer. Therefore, minis that were 4 pA could now be 5 pA after CNQX treatment, and these are then divided by the smallest control mEPSCs, which are 5 pA (ratio of 1). We have tried to describe this more clearly in the resubmission (1st paragraph of results).
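
      The rank-ordered ratio analysis discussed here can be sketched in a few lines. Assumptions not taken from the manuscript: equal numbers of events per condition, amplitudes in pA, and an ordinary least-squares slope of ratio against control amplitude as a simple summary of how flat the plot is; all names and values are hypothetical:

```python
def rank_ordered_ratios(control, treated):
    """Rank-match two amplitude samples; return (control_amp, treated/control) pairs.

    Hypothetical helper; assumes the same number of events per condition.
    """
    if len(control) != len(treated) or not control:
        raise ValueError("rank matching needs equal, non-zero numbers of events")
    pairs = []
    for c, t in zip(sorted(control), sorted(treated)):
        pairs.append((c, t / c))
    return pairs


def ratio_slope(pairs):
    """Ordinary least-squares slope of ratio versus control amplitude."""
    xs = [c for c, _ in pairs]
    ys = [r for _, r in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Illustrative control amplitudes in pA (all above a 5 pA detection threshold).
control = [5.0, 6.0, 8.0, 10.0, 15.0, 20.0]
multiplicative = [2.0 * a for a in reversed(control)]  # input order does not matter
divergent = [a * (1.0 + 0.05 * a) for a in control]    # ratio grows with amplitude
print(ratio_slope(rank_ordered_ratios(control, multiplicative)))
print(ratio_slope(rank_ordered_ratios(control, divergent)))
```

      Purely multiplicative scaling gives a flat line at the scaling factor (slope 0), whereas ratios that grow with amplitude give a positive slope. Because events below the detection threshold are missing from both samples, the leftmost rank-matched ratios can sit closer to 1 than to the scaling factor, which is the left-edge behavior described above.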

      - Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted? 

      The left side of the ratio plot shows evidence consistent with the idea that mIPSCs are dropping into the noise after CNQX treatment (the smallest GABA mIPSCs that don’t fall into the noise are 5 pA, and these are divided by the smallest control GABA mPSCs of 5 pA; therefore, the ratio is 1). The rest of the distribution then approaches the scaling factor (50% in this case). On the right side of the ratio plot, the values appear to increase slightly. We are not sure why this is happening, but it may be that a small percentage of mIPSCs are not purely multiplicative at 0.5; however, the biggest mPSCs can vary to a great degree from one cell to the next, and in other cases we do not see this (Figure 4B, Figure 5E). We have tried to describe this more clearly in the resubmission (results describing Figure 2).

      - The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, text that appears skewed as if the figures were quickly thrown together and stretched to fit. 

      We have adjusted the figures to be more consistent throughout the manuscript.

      - I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are log-normally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network level. Given that a network-level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with network-level resolution.

      As described above, we cannot know what the actual firing rate of a particular neuron was before and after perturbing the system, and certainly not for the specific cells we recorded from to obtain mPSC amplitudes. We therefore cannot say that we have perfectly restored the original firing rates of neurons. However, there is reason to believe that this is achieved to some extent. Our optogenetic stimulation lasts only 50-100 ms and activates a subset of neurons. This is sufficient to provide a synaptic barrage that then triggers a full-blown network burst in which the majority of spikes occur, but this happens after the light is off. In other words, the optogenetic light pulse only initiates what becomes a relatively normal network burst, which allows the individual cells to express their relatively normal (pre-drug) activity pattern. In our previous study using optogenetic activity restoration (Fong et al., 2015), we showed that this was the case for individual units: the spiking of an individual unit during a burst is similar before and after CNQX/optogenetic stimulation (see Figure 4b and Suppl. Fig 4 in Fong et al. 2015). We are not claiming that we have restored spiking to exactly the pre-drug state, only that we bring it back toward those levels, and we see that this is associated with a return of the mIPSC amplitude to near control levels. We now include a brief description of this in the manuscript (results describing Figure 3).

      - Line 198-99: multiplicativity is not a requirement of a homeostatic mechanism.

      - Line 264-265 - again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.

      As mentioned above, the multiplicative nature of scaling has been a historical proposal for AMPAergic scaling and we have now found such a relationship for GABAergic scaling. This is important for understanding how this plasticity works, but we agree that it is not necessary for a homeostat and we have adjusted the manuscript accordingly.

      - 277: do you mean AMPAR? 

      We were not clear enough here. We actually do mean GABAR. The idea was that CTZ increases network activity and thus increases both AMPAergic and GABAergic transmission. We have rewritten this part of the discussion to avoid any confusion (2nd paragraph discussion).

      - Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of burst is shown in B and C.

      - Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes. 

      These figures were generated by the estimation statistics website, and text may have been resized inappropriately. We have adjusted this and standardized the axis text to the best of our ability.

      - The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture. 

      We have added this consideration at the end of the discussion. Further, this is why we cited studies that argue GABAergic neurons have a particularly important role in homeostatic regulation of firing following sensory deprivations in vivo.

      - The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development. 

      We now discuss caveats of cortical cultures at the end of the discussion.

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play. 

      We have broadened our discussion of alternative interpretations throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      While I am hesitant to judge a paper based on its tone, I would personally recommend revision of some of the subjective words and statements, as the manuscript undermines its own effectiveness by making unnecessarily strong statements. The text repeatedly paints an "either A or B" picture, and if there's any general lesson in biology, it's that it's always A and B. Global, multiplicative glutamatergic scaling could quite conceivably occur alongside GABAergic scaling, as well as synapse-specific homeostatic modifications. It seems that it would be wise to acknowledge that, while the data presented here point in one direction, in vivo results in an adult brain (for example) might present an entirely different set of patterns. This will not only enhance the readability of the paper but also ensure that the scientific community can engage with the work in a constructive and collaborative manner. Again, I present this as only a constructive and supportive suggestion. I am a big fan of work from this laboratory, and I would love to see this paper in an improved form - it's an important set of ideas and I do believe that these data are rigorously collected. 

      We have attempted to provide a more comprehensive interpretation of our results. We agree that a homeostat can come in many flavors, but we do believe that GABAergic scaling is a strong candidate, whereas AMPAergic scaling does not currently fit such a role. We now discuss caveats of our work and are open to other interpretations that need to be fleshed out in future work.

      Reviewer #2 (Public Review):

      Major points:

      (1) The reason why CNQX does not completely eliminate spiking is unclear (Fig. 1). What is the circuit mechanism by which spiking continues, although at lower frequency, in the absence of AMPA-mediated transmission, and what is the mechanism by which spiking frequency grows back after 24h (still in the absence of AMPA transmission)?

      Is it possible that NMDA-mediated transmission takes over and triggers a different type of network plasticity?

      The bursting during AMPAR blockade is due to the remaining NMDA receptor-mediated transmission. We showed this in our previous study in Suppl. Figures 2 and 6 of Fong et al., 2015 (PMID: 25751516). Our ability to optically induce normal-looking bursts of spikes was also dependent on NMDAR activation (Fong et al., 2015 and Figure 6 Newman et al., 2015 - PMID: 26140329). Further, Dr Fong’s PhD dissertation showed that the bursting activity was abolished when AMPA and NMDA receptors were both blocked. Many factors likely contribute to the recovery of activity, and one of them is likely to be the weakening of inhibitory GABAergic currents, as we had mentioned. We have now added the point about NMDARs mediating the remaining bursts to the manuscript (results associated with Figure 1). We are not clear on what the reviewer has in mind in terms of “NMDA-mediated transmission takes over and triggers a different kind of network plasticity”, but we do discuss the possibility that spiking triggers GABAergic scaling through its effect on NMDAergic transmission. We cannot rule this out, but we also have no evidence in support of it (3rd and 5th paragraph of discussion). We plan to address this in future work.

      (2) A possible activation of NMDARs should be considered. One would think that experiments involving chronic glutamatergic blockade could have been conducted in the presence of NMDAR blockers. Why this was not the case?

      Unfortunately, it was not possible to optogenetically restore normal bursting in the presence of NMDAR blockade (even when AMPAergic transmission was intact), as NMDARs appeared to be critical for the optical restoration of the normal duration and form of the burst in rat cortical cultures (see Suppl. Figure 6 Fong et al., 2015 Nat Comm and Figure 6 Newman et al., 2015). A higher concentration of CNQX (40 µM) also prevented us from restoring spiking in mouse cultures in the current study, which is why we moved to 20 µM CNQX for this study. The reviewer raises an excellent point about a possible NMDAR contribution to altered synaptic strength, however. NMDAR signaling is likely reduced in the presence of CNQX, since burst frequency was dramatically reduced along with AMPAR-mediated depolarizations. We cannot rule out the possibility that NMDAR signaling contributes to the alterations in GABAergic mIPSCs, and we discuss this in the resubmission (3rd and 5th paragraph of the discussion). We had not considered this previously because prior work suggested that 24/48 hour NMDAR block (APV) did not trigger AMPAergic scaling in cortical or hippocampal cultures (see Figure 1 Turrigiano et al., 1998 Nature and Suppl. Figure 4 Sutton et al., 2006 Cell). Moreover, our previous study showed that restoring NMDAergic transmission optogenetically, at least to some extent, had no influence on AMPAergic scaling (Fong et al., 2015).

      Also, experiments with global ChR2 stimulation with coincident pre and postsynaptic firing might also activate NMDARs and result in additional effects that should be taken into consideration for the global scaling mechanism.

      To be clear, our optical stimulation was of short duration (duration 50-100 ms) and was turned off before the vast majority of spiking that occurred in the bursts. So the light flash was a trigger that allowed a relatively normal looking burst to occur after the light was off (see lower panel of Figure 3B optogenetic stimulation – short duration only at onset of burst – we now make this clearer in resubmission). Therefore, we were unlikely to trigger significant synchronous activation that does not normally occur in network bursts.

      (3) Cultures exposed to CTZ to enhance AMPA receptors generated variable results (Fig. 5), somewhat increasing spiking activity in a non-significant manner but, at the same time, strengthening mIPSC amplitude. This result seems to suggest that spiking might be involved in GABAergic scaling, but it does not seem to prove it. Then, addition of TTX that blocked spiking reduced mIPSC amplitude. It was concluded here that the ability of CTZ to enhance GABAergic currents was primarily due to spiking, rather than the increase in AMPA-mediated currents. However, in addition to blocking action potentials, TTX would also prevent activation of AMPARs in the presence of CTZ due to the lack of glutamatergic release. Therefore, under these conditions, an effect of glutamatergic activation on GABAergic scaling cannot be ruled out.

      These concerns are very similar to Reviewer 1’s first comments (see above). To be clear, we go a step beyond most scaling studies by assessing MEA-wide firing rate, but this still provides an incomplete picture of how the particular cells that we target for patch recordings fire before and after a drug. Further, we see considerable variability in the effect on firing rate from culture to culture, which we now discuss in the resubmission (final paragraph discussion). The fact that mIPSCs do not differ after TTX treatment vs CTZ+TTX treatment suggests that AMPAergic transmission has little influence on GABAergic downscaling. The CTZ results are not conclusive by themselves. However, taken together with the optogenetic results, where restoration of spiking during AMPAR blockade reverses scaling, they are most consistent with the idea that GABAergic scaling is triggered by spiking rather than AMPAR activation, and they place GABAergic scaling as a strong candidate for a spike rate homeostat. Although we feel that we have demonstrated that downward GABAergic scaling depends on spiking, we cannot rule out the possibility that upward GABAergic scaling could be influenced by AMPAR activation to some extent. We now acknowledge this possibility (2nd paragraph discussion).

      (4) The sample size is not mentioned in any figure. How many cells/culture dishes were used in each condition?

      The individual dots represent either individual cells for mIPSC amplitude or individual cultures in MEA experiments. Number of cultures and cells are now stated in the figure legends.

      (5) Cortical cultures may typically contain about 5-10% GABAergic interneurons and 90-95 % pyramidal cells. One would think that scaling mechanisms occurring in pyramidal cells and interneurons could be distinct, with different impact on the network. Although for whole-cell recordings the authors selected pyramidal looking cells, which might bias recordings towards excitatory neurons, naked eye selection of recording cells is quite difficult in primary cultures. Some of the variability in mIPSC amplitude values (Fig. 2A for example) might be attributed to the cell type? One could use cultures where interneurons are fluorescently labeled to obtain an accurate representation. The issue of the possible differential effects of scaling in pyramidal cells vs. interneurons and the consequences in the network should be discussed.

      We now include this discussion in the resubmission (final paragraph discussion). Briefly, we chose large cells, which will be predominantly glutamatergic neurons as suggested by the reviewer. Ultimately, even among glutamatergic principal cells there may be variability in the response to drug application. All of these issues could contribute to variability and we have expanded our description of the variability in our results, including that based on cellular heterogeneity. 

      Reviewer #2 (Recommendations For The Authors):

      Minor comments –

      Fig S3: Please quantify changes in frequency

      We have done this (Supplemental Figure 5).

      Fig 2: please choose colors with higher contrast for CNQX/TTX

      We have done this.

      Fig. 3C: Why doesn't CNQX+PhotoStim reach control levels of bursting at 2h?

      The program was designed to track and maintain total spike frequency, and so it does a better job at this than at maintaining burst frequency.

      Fig. 5A: please include a comparison between control and Ethanol

      We now do this in Figure 5C. Both are around 26 pA.

      Fig. 5C: where is the Etoh condition?

      We have made this figure more clear in terms of controls (Figure 5C & D).

      Reviewer #3 (Public Review):

      This paper concerns whether scaling (or homeostatic synaptic plasticity; HSP) occurs similarly at GABA and Glu synapses and comes to the surprising conclusion that these are regulated separately. This is surprising because these were thought to be co-regulated during HSP and in fact, the major mechanisms thought to underlie downscaling (TTX or CNQX driven), retinoic acid and TNF, have been shown to regulate both GABARs and AMPARs directly. (As a side note, it is unclear that the manipulations used in Joseph and Turrigiano represent HSP, and so might not be relevant). Thus the main result, that GABA HSP is dissociable from Glu HSP, is novel and exciting. This suggests either different mechanisms underlie the two processes, or that under certain conditions, another mechanism is engaged that scales one type of synapse and not the other.

      However, strong claims require strong evidence, and the results presented here only address GABA HSP, relying on previous work from this lab on Glu HSP (Fong, et al., 2015). But the previous experiments were done in rat cultures, while these experiments are done in mice and at somewhat different ages (DIV). Even identical culture systems can drift over time (possibly due to changes in the components of B27 or other media and supplements). Therefore it is necessary to demonstrate in the same system the dissociation. To be convincing, they need to show the mEPSCs for Fig 4, clearly showing the dissociation. Doing the same for Fig 5 would be great, but I think Fig 4 is the key.

      We understand the reviewer's concern, as we do see significant variability within our cultures, and the two sets of cultures were plated in different places, by different people, in different species (rat vs mouse). Therefore, we repeated the AMPAergic scaling experiments in these mouse cortical cultures. Surprisingly, we found that 20 µM CNQX did not trigger AMPAergic upscaling (new Figure 7), even though it did reduce spiking activity and was able to produce GABAergic downscaling. We did not carry out the optogenetic restoration of activity, because upscaling was not triggered. The result does, however, show that the reductions in spiking/bursting that trigger GABAergic downscaling do not trigger AMPAergic upscaling, which dissociates the two forms of scaling in these mouse cultures. We do not know why 20 µM CNQX did not trigger scaling in these cultures, since it does reduce spiking and AMPAR activation. In the Fong study we used 40 µM CNQX because intracellular recordings from rat cortical neurons suggested this was required to completely block AMPAergic currents. Our initial studies in the current manuscript examining GABAergic scaling in mouse cortical cultures used 40 µM CNQX. However, this concentration prevented us from restoring spiking through optogenetic activation, so we reduced the concentration to 20 µM CNQX, which did trigger GABAergic downscaling and allowed the restoration of spiking. We now show and discuss this result (Figure 7 and 3rd paragraph discussion).

      The paper also suggests that only receptor function or spiking could control HSP, and therefore if it is not receptor function then it must be spiking. This seems like a false dichotomy; there are of course other options. Details in the data may suggest that spiking is not the (or the only) homeostat, as TTX and CNQX cause identical changes in mIPSC amplitude but have different effects on spiking. Further, in Fig 5, CTZ had a minimal effect on spiking but a large effect on mIPSCs. Similar issues appear in Fig 6, where the induction of increased spiking is highly variable, with many cells showing control levels or lower spiking rates. Yet the synaptic changes are robust, across all cells. Overall, this is not persuasive that spiking is necessarily the homeostat for GABA synapses.

      Together, our results argue against AMPAR or GABAR activation as a trigger for GABAergic scaling, and show that this differs from our results for AMPAergic scaling. These points alone are important to recognize. While changes in spiking do not perfectly follow the changes in GABAergic scaling, they always trend in the right direction. As mentioned above, total spiking activity is only one measure of spiking. It is possible that these drugs alter the pattern of spiking in a way that translates into altered calcium transients, which may be important for triggering the plasticity. Further, we acknowledge that we cannot rule out a contribution of NMDARs to GABAergic scaling (3rd and 5th paragraph of discussion). Given the variability that we observe and the nature of our MEA recordings, we cannot precisely determine how the total activity or pattern of activity changes with drug application in the specific cells that we target for whole-cell recordings, and this is now discussed (final paragraph of discussion). Again, it is important to note that we go a step beyond most homeostatic plasticity studies, which add a drug and simply assume it is having an effect on spiking (e.g. CNQX was initially thought to completely abolish spiking, but clearly does not). However, we believe that the most parsimonious explanation of our results supports our proposal that GABAergic scaling is a strong candidate for a spike rate homeostat. Regardless, in the resubmission we have included a broader discussion of these possibilities, and we recognize that we cannot rule out the possibility that AMPAergic transmission could contribute to upward GABAergic scaling (2nd paragraph discussion).

      The paper also suggests that the timing of the GABA changes coincides with the spiking changes, but while they have the time course of the spiking changes and recovery, they only have the 24h time point for synaptic changes. It is impossible to conclude how the time courses align without more data.

      We can only say that by the 24 hour CNQX time point, when overall spiking is recovered in some but not all cultures and bursts have not recovered, that GABAergic scaling has already occurred. We now state this more clearly in the resubmission (near the end of the 2nd paragraph of the discussion).

      Reviewer #3 (Recommendations For The Authors):

      The statistics are inadequately described. The full information including actual p values should be given, particularly for the non-significant trends reported.

      We have done this in Figure legends.

      The abstract and introduction give the impression that GABA and Glu HSP are independent, though most work links them as occurring simultaneously and in a coordinated fashion to achieve homeostasis.

      While it is true that many studies have triggered both forms of scaling with activity or transmission blockade, these studies have not addressed whether these forms of scaling are actually triggered in the same way mechanistically, except potentially for the one study that we mentioned (Joseph et al.). Our results suggest they are independent. We now mention the idea that these two forms of scaling have been assumed to be commonly triggered (3rd paragraph introduction).

      The data in Fig 6 is presented as if BIC treatment is a novel result, although BIC/Gabazine/PTX have been used to induce down-scaling in many previous papers. While it's good to have the results, they should be put in proper context. As suggested in the paper, testing if decreased GABAR function would lead to upscaling does not make sense given all the previous data. 

      Figure 6 shows GABAergic upscaling in response to GABAR block (bicuculline), but we are aware of only two other studies that looked at GABAergic scaling after treating with a GABAR blocker and they found upscaling but this was in hippocampal cultures, not cortical cultures (Peng et al., 2010 - PMID: 21123568, Pribiag et al., 2014 - PMID: 24753587). We now mention this in the results section describing Figure 6. While many studies have blocked GABARs and find AMPAergic downscaling, we are addressing the triggers for GABAergic scaling in Figure 6.

      Is Fig S4B mislabeled? The title says spike rate but the graph axis says burst frequency.

      The reviewer is correct and we have now adjusted this.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Weaknesses:

      There are, however, substantial concerns about the interpretation of the findings and limitations to the current analysis. In particular, analysis of single unit activity is absent, making the population clusters and decoding less interpretable. These concerns should be addressed to make sure that the results can be interpreted clearly in an active field that already contains a number of confusing and possibly contradictory findings.

      We addressed this important point (which was also made by reviewer #1) in our previous revision. Specifically, we included additional analyses that operate at the level of single units rather than the population level, as requested by the reviewer. For example, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. It is therefore no longer correct to say that “analysis of single unit activity is absent”, and we would be grateful if this statement could be changed.  
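      As an illustration of the balancing step described above, the Python sketch below subsamples hit and miss trials to equal numbers within each sound level and then compares response magnitudes for a single unit. The trial format, the simulated responses and the two-sided permutation test are hypothetical choices made for this example. The response above does not state which statistical test was used, so this is a sketch of the general approach rather than the analysis performed in the study.

```python
import random
import statistics


def balanced_hit_miss(trials, rng):
    """Equalize hit and miss trial counts within each sound level.

    `trials` is a list of (sound_level_db, outcome, response) tuples, where
    outcome is "hit" or "miss". Returns pooled (hit_responses, miss_responses).
    """
    by_level = {}
    for level, outcome, response in trials:
        by_level.setdefault(level, {"hit": [], "miss": []})[outcome].append(response)
    hits, misses = [], []
    for groups in by_level.values():
        n = min(len(groups["hit"]), len(groups["miss"]))
        hits += rng.sample(groups["hit"], n)
        misses += rng.sample(groups["miss"], n)
    return hits, misses


def permutation_p(a, b, rng, n_perm=2000):
    """Two-sided permutation p-value for a difference in mean response."""
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b  # new list, so shuffling does not modify a or b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical unit: detection improves with level, and hits evoke larger responses.
rng = random.Random(0)
trials = []
for level in (41, 47, 53, 59, 65, 71):
    for _ in range(40):
        p_hit = min(0.95, 0.1 + (level - 41) / 35)
        outcome = "hit" if rng.random() < p_hit else "miss"
        mean_response = 1.5 if outcome == "hit" else 1.0
        trials.append((level, outcome, rng.gauss(mean_response, 0.5)))

hits, misses = balanced_hit_miss(trials, rng)
p = permutation_p(hits, misses, rng)
print(f"{len(hits)} hit and {len(misses)} miss trials after balancing; p = {p:.4f}")
```

      Balancing within each level matters because hit rates rise with sound level. Without it, a hit-versus-miss difference could simply reflect louder sounds driving larger responses.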

      Reviewer #2 (Recommendations For The Authors): 

      The authors have done a good job addressing the main concerns from the previous review. There are a few additional points that hopefully do not require substantial additional edits. 

      Figure 5/supplements. While the authors provide compelling evidence that clusters and overall activity patterns are similar for lesioned and control animals, there do appear to be some differences. For instance, the hit/miss difference for cluster 3 (the "auditory" cluster) appears to be absent for lesioned mice (Fig 5S3 D). Can the hit-miss difference be quantified? 

      We agree that there are some differences between the activity profiles of lesioned and non-lesioned mice: Inspection of panels A and C of Figure 5 – figure supplement 3, for instance, indicates that there is a relatively high proportion of neurons in cluster 3 of the non-lesioned mice that exhibit prolonged elevated activity in hit trials and a relatively lower proportion of those neurons in cluster 3 of lesioned mice. This likely explains the difference in the average response profiles of cluster 3 between the two groups pointed out by the reviewer. Furthermore, there is a slightly larger pre-stimulus dip in hit trial activity for lesioned than non-lesioned mice in cluster 1, a more pronounced short latency peak in hit trial activity for lesioned mice in cluster 2 as well as differences in other clusters. However, these differences are not inconsistent with our interpretation of these data in that we describe the activity profiles as being “similar” and exhibiting a “close correspondence” (rather than as being identical). Having considered this carefully, we do not believe that attempting to quantify these small differences would add much value here or help the reader with the interpretation of these data, especially given that the activity profiles of all neurons that make up each cluster are plotted in panels A and C.  

      Could the mice have been using somatosensory information to perform the task? A wideband click presented from a free-field speaker could have energy in a low frequency range that triggers a whisker response. Given the moderate but not insignificant somatosensory input into the IC shell, this doesn't seem like a trivial concern, and it could substantially impact interpretation of the results. Without wanting to complicate things too much, the authors might consider one or more of these questions: What's the frequency content of the click? Can a deaf mouse perform the task? Can an AC-lesioned mouse learn/perform the task with close-field acoustic stimulation? Or for a high-frequency tone target rather than a click?

      This is an interesting suggestion. We have, in the context of another study, trained mice in our lab to detect somatosensory stimulation (a brush stroke to their whiskers) and consistently found that it takes them much longer (often two weeks or more) to learn to respond to a stimulation of their whiskers than to the presentation of a sound. The brush strokes applied to the whiskers in those experiments were 50-150 ms in duration and were thus orders of magnitude greater in both their duration and amplitude and considerably more salient than any somatosensory stimulus that could potentially arise from the clicks presented here. Therefore, we consider it highly unlikely that mice learned to use somatosensory information potentially picked up by their whiskers to perform the click detection task.  

      L. 63. The authors might want to cite some recent work from the Apostolides lab on the properties of AC-IC projections as well as non-auditory signals in the IC.

      There are two recent papers from the Apostolides lab that are relevant to our study. We already cite Quass et al., 2023. We have now added Ford et al., 2024 as well.

      Changes to manuscript:

      Line 81: “This raises the possibility that these context-dependent effects may be inherited from the auditory cortex (Ford et al., 2024)”.

      L. 220. "sound-responsive neurons" It is possible to report the representation of sound-responsive neurons in the different clusters? This might help tease apart what processes contribute to their respective activity. Not a big problem if the samples can't be registered easily.

      Sound-driven neurons were identified on the basis of a subset (those trials in which sounds were presented at levels from 53 dB SPL to 65 dB SPL) of the trials used for the clustering analysis so the analyses are not directly comparable.

      L. 603. "quieter stimuli" What sound level was actually used in the 2p experiments? Was it fixed at a single level per animal?

      Sound level was not fixed at a single level. A total of nine different sound levels were used per mouse. We apologize that this was not made clear previously.  

      Changes to manuscript:

      Line 603: “Once the mice had achieved a stable level of performance (typically two days with d’ > 1.5), quieter stimuli (41-71 dB SPL) were introduced. For each mouse a total of 9 different sound levels were used and the range of sound levels was adjusted to each animal’s behavioral performance to avoid floor and ceiling effects and could, therefore, differ from mouse to mouse.”
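      The d′ > 1.5 criterion above follows the standard signal-detection definition, d′ = z(hit rate) − z(false-alarm rate). A minimal Python sketch follows. The trial counts are hypothetical, and the log-linear correction for hit or false-alarm rates of 0 or 1 is an assumption of this example, not a detail stated in the manuscript.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (0.5 added to each count, 1 to each total) keeps
    rates of exactly 0 or 1 from producing infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical session: 80/100 sound trials detected, 20/100 no-sound trials with a response.
score = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
print(f"d' = {score:.2f}; exceeds 1.5 criterion: {score > 1.5}")
```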

      L. 747. Something is not right with this formula. It appears that it will always reduce to a value of 1/2.

      Thanks for spotting this. There are two typos in this formula. This has been fixed and now reads (line 749):  

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Through an unbiased genome-wide KO screen, the authors identified loss of DBT to suppress MG132-mediated death of cultured RPE cells. Further analyses suggested that DBT reduces ubiquitinated proteins by promoting autophagy. Mechanistic studies indicated that DBT loss promotes autophagy via AMPK and its downstream ULK and mTOR signaling. Furthermore, loss of DBT suppresses polyglutamine- or TDP-43-mediated cytotoxicity and/or neurodegeneration in fly models. Finally, the authors showed that DBT proteins are increased in ALS patient tissues, compared to non-neurological controls. 

      Strengths: 

      The idea is novel, the evidence is convincing, and the data are clean. The findings have implications for human diseases. 

      Weaknesses: 

      None. 

      Reply: We thank the reviewer for the supportive comments.

      Reviewer #2 (Public Review): 

      Summary: 

      Hwang, Ran-Der et al. used a CRISPR-Cas9 knockout screen in human retinal pigment epithelium (RPE1) cells to search for suppressors of toxicity by the proteasome inhibitor MG132 and found that knockout of dihydrolipoamide branched chain transacylase E2 (DBT) suppressed cell death. They show that DBT knockout in RPE1 cells does not alter proteasome or autophagy function at baseline. However, with MG132 treatment, they show a reduction in ubiquitinated proteins but no change in proteasome function. Instead, they show that DBT knockout cells treated with MG132 have improved autophagic flux compared to wild-type cells treated with MG132. They show that MG132 treatment decreases ATP/ADP ratios to a greater extent in DBT knockout cells and, accordingly, causes activation of AMPK. They then show altered downstream autophagy signaling in DBT knockout cells treated with MG132 compared to wild-type cells treated with MG132. Next, they express the ALS mutant TDP43 M337V or expanded polyglutamine repeats to model Huntington's disease and show that knockdown of DBT improves cell survival in RPE1 cells with improved autophagic flux. They also use Drosophila models and show that either RNAi or CRISPR-Cas9 knockout of DBT improves eye pigment in TDP43 M337V and polyglutamine repeat-expressing transgenic flies. Finally, they show evidence for increased DBT in postmortem spinal cord tissue from patients with ALS via both immunoblotting and immunofluorescence. 

      Strengths: 

      This is a mechanistic and well-designed paper that identifies DBT as a novel regulator of proteotoxicity via activating autophagy in the setting of proteasome inhibition. Major strengths include careful delineation of a mechanistic pathway to define how DBT is protective. These conclusions are well-justified. 

      Weaknesses: 

      None 

      Reply: We thank the reviewer for the supportive comments.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      The authors have addressed my concerns. I have two more suggestions: 

      (1) Since the authors found that MG132 inhibits autophagy, which is inconsistent with previous findings that it promotes autophagy (e.g., PMID: 26648402, 30647455, 28674081), they should discuss this discrepancy in the Discussion.

      Reply: We thank the reviewer for raising this point. We agree with the reviewer that it is well known in the literature that MG132 can lead to activation of autophagy. Indeed, we have observed in this study that MG132 itself can lead to time-dependent increases in LC3II levels in the first 8 hours of the MG132 treatment (Fig. S5B). These observations reflect the adaptive response of the cell to activate autophagy following proteasomal inhibition. However, as the MG132-mediated proteasomal inhibition persists, it is expected that the accumulation of misfolded protein substrates may overwhelm protein degradation systems, including the autophagy-lysosome pathway. Indeed, we have observed a reduction of the autophagic flux after 48 hours of the MG132 treatment (Fig. 3). Importantly, the DBT KO cells were able to maintain significantly higher levels of autophagic activity than the WT cells at this time point, consistent with their resistance to MG132-induced cell death. As suggested, we have added more discussion on the dynamic changes in autophagic activity following proteasomal inhibition.

      (2) A grammar issue: consider removing some instances of the article "the," e.g.: 

      page 6: "the increase in cleaved PARP1 "-->"an increase in cleaved PARP1";  "the loss of DBT "-->"loss of DBT" 

      page 7: "the loss of DBT "-->"loss of DBT"; "The ubiquitin modification"-->"Ubiquitin modification" 

      Reply: We thank the reviewer for the supportive comments, and we have corrected some of these grammatical issues in the article.

      Reviewer #2 (Recommendations For The Authors): 

      The authors have addressed my concerns. 

      Reply: We thank the reviewer for the supportive comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Protein conformational changes are often critical to protein function, but obtaining structural information about conformational ensembles is a challenge. Over a number of years, the authors of the current manuscript have developed and improved an algorithm, qFit protein, that models multiple conformations into high resolution electron density maps in an automated way. The current manuscript describes the latest improvements to the program, and analyzes the performance of qFit protein in a number of test cases, including classical statistical metrics of data fit like Rfree and the gap between Rwork and Rfree, model geometry, and global and case-by-case assessment of qFit performance at different data resolution cutoffs. The authors have also updated qFit to handle cryo-EM datasets, although the analysis of its performance is more limited due to a limited number of high-resolution test cases and less standardization of deposited/processed data.

      Strengths:

      The strengths of the manuscript are the careful and extensive analysis of qFit's performance over a variety of metrics and a diversity of test cases, as well as the careful discussion of the limitations of qFit. This manuscript also serves as a very useful guide for users in evaluating if and when qFit should be applied during structural refinement.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Wankowicz et al. describes updates to qFit, an algorithm for the characterization of conformational heterogeneity of protein molecules based on X-ray diffraction or cryo-EM data. The work provides a clear description of the algorithm used by qFit. The authors then proceed to validate the performance of qFit by comparing it to deposited X-ray entries in the PDB in the 1.2-1.5 Å resolution range as quantified by Rfree, Rwork-Rfree, detailed examination of the conformations introduced by qFit, and performance on stereochemical measures (MolProbity scores). To examine the effect of experimental resolution of X-ray diffraction data, they start from an ultra-high-resolution structure (SARS-CoV-2 Nsp3 macrodomain) to determine how the loss of resolution (introduced artificially) degrades the ability of qFit to correctly infer the nature and presence of alternate conformations. The authors observe a gradual loss of ability to correctly infer alternate conformations as resolution degrades past 2 Å. The authors repeat this analysis for a larger set of entries in a more automated fashion and again observe that qFit works well for structures with resolutions better than 2 Å, with a rapid loss of accuracy at lower resolution. Finally, the authors examine the performance of qFit on cryo-EM data. Despite a few prominent examples, the authors find only a handful (8) of datasets for which they can confirm a resolution better than 2.0 Å. The performance of qFit on these maps is encouraging and will be of much interest because cryo-EM maps will, presumably, continue to improve and because of the rapid increase in the availability of such data for many supramolecular biological assemblies. As the authors note, practices in cryo-EM analysis are far from uniform, hampering the development and assessment of tools like qFit.

      Strengths

      qFit improves the quality of refined structures at resolutions better than 2.0 Å, in terms of reflecting true conformational heterogeneity and geometry. The algorithm is well designed and does not introduce spurious or unnecessary conformational heterogeneity. I was able to install and run the program without a problem within a computing cluster environment. The paper is well written and the validation thorough.

      I found the section on cryo-EM particularly enlightening, both because it demonstrates the potential for discovery of conformational heterogeneity from such data by qFit, and because it clearly explains the hurdles towards this becoming common practice, including lack of uniformity in reporting resolution, and differences in map and solvent treatment.

      Weaknesses

      The authors begin the results section by claiming that they made "substantial improvement" relative to the previous iteration of qFit, "both algorithmically (e.g., scoring is improved by BIC, sampling of B factors is now included) and computationally (improving the efficiency and reliability of the code)" (bottom of page 3). However, the paper does not provide a comparison to previous iterations of the software or quantitation of the effects of these specific improvements, such as whether scoring is improved by the BIC, how the application of BIC has changed since the previous paper, whether sampling of B factors helps, and whether the code is faster. It would help the reader to understand what, if any, the significance of each of these improvements was.

      Indeed, it is difficult (embarrassingly) to benchmark against our past work due to the dependencies on different Python packages and the lack of software engineering. With the infrastructure we’ve laid down with this paper, made possible by an EOSS grant from CZI, that will not be a problem going forward. Not only is the code more reliable and standardized, but we have developed several scientific test sets that can be used as a basis for broad comparisons to judge whether improvements are substantial. We’ve also changed “substantial improvement” to “several modifications” to indicate the lack of comparison to past versions.

      The exclusion of structures containing ligands and multichain protein models in the validation of qFit was puzzling since both are very common in the PDB. This may convey the impression that qFit cannot handle such use cases. (Although it seems that qFit has an algorithm dedicated to modeling ligand heterogeneity and seems to be able to handle multiple chains). The paper would be more effective if it explained how a user of the software would handle scenarios with ligands and multiple chains, and why these would be excluded from analysis here.

      qFit can indeed handle both. We left out multiple chains for simplicity in constructing a dataset enriched for small proteins while still covering diversity to speed the ability to rapidly iterate and test our approaches. Improvements to qFit ligand handling will be discussed in a forthcoming work as we face similar technical debt to what we saw in proteins and are undergoing a process of introducing “several modifications” that we hope will lead to “substantial improvement” - but at the very least will accelerate further development.

      It would be helpful to add some guidance on how/whether qFit models can be further refined afterwards in Coot, Phenix, ..., or whether these models are strictly intended as the terminal step in refinement.

      We added to the abstract:

      “Importantly, unlike ensemble models, the multiconformer models produced by qFit can be manually modified in most major model building software (e.g. Coot) and fit can be further improved by refinement using standard pipelines (e.g. Phenix, Refmac, Buster).”

      and introduction:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot [12], unlike ensemble methods that generate multiple complete protein copies (Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      and results:

      “This model can then be examined and edited in Coot [12] or other visualization software, and further refined using software such as phenix.refine, refmac, or buster as the modeler sees fit.”

      and discussion:

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”
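      The compatibility point above rests on the alternate-location (altLoc) field of the PDB coordinate format. As a rough illustration, not code taken from qFit, the sketch below parses fixed-column ATOM/HETATM records using the column positions defined in the wwPDB format specification (altLoc in column 17, residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26) and lists the alternate conformers present in each residue. The helper names `atom_line` and `count_altlocs` and the toy records are hypothetical; mmCIF input, insertion codes, and other edge cases are ignored.

```python
from collections import defaultdict


def atom_line(serial, name, altloc, res_name, chain, res_seq, x, y, z, occ, b):
    """Build a fixed-column PDB ATOM record (toy helper; element/charge columns omitted)."""
    return (f"ATOM  {serial:5d} {name:<4}{altloc}{res_name:>3} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def count_altlocs(pdb_text):
    """Map (chain, resSeq, resName) to the sorted non-blank altLoc IDs seen in that residue."""
    altlocs = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        altloc = line[16]               # column 17: alternate location indicator
        res_name = line[17:20].strip()  # columns 18-20: residue name
        chain = line[21]                # column 22: chain identifier
        res_seq = int(line[22:26])      # columns 23-26: residue sequence number
        ids = altlocs[(chain, res_seq, res_name)]  # registers the residue even without altLoc
        if altloc != " ":
            ids.add(altloc)
    return {key: sorted(ids) for key, ids in altlocs.items()}


# Toy records: a single-conformer GLY and a SER with alternate conformers A and B.
records = "\n".join([
    atom_line(1, " CA", " ", "GLY", "A", 10, 1.0, 2.0, 3.0, 1.00, 10.0),
    atom_line(2, " CA", "A", "SER", "A", 11, 4.0, 5.0, 6.0, 0.60, 12.0),
    atom_line(3, " CA", "B", "SER", "A", 11, 4.2, 5.1, 6.0, 0.40, 14.0),
])
print(count_altlocs(records))
# {('A', 10, 'GLY'): [], ('A', 11, 'SER'): ['A', 'B']}
```

      Reading the fixed columns directly keeps the sketch dependency-free; in practice a maintained structure library would be the safer choice for parsing deposited files.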

      Appraisal & Discussion

      Overall, the authors convincingly demonstrate that qFit provides a reliable means to detect and model conformational heterogeneity within high-resolution X-ray diffraction datasets and (based on a smaller sample) in cryo-EM density maps. This represents the state of the art in the field and will be of interest to any structural biologist or biochemist seeking to attain an understanding of the structural basis of the function of their system of interest, including potential allosteric mechanisms, an area where there are still few good solutions. In short, I expect qFit to find widespread use.

      Reviewer #3 (Public Review):

      Summary:

      The authors address a very important issue of going beyond a single-copy model obtained by the two principal experimental methods of structural biology, macromolecular crystallography and cryo-electron microscopy (cryo-EM). Such a multiconformer model is based on the fact that experimental data from both these methods represent a space- and time-average over a huge number of molecules in a sample, or even in several samples, and that the respective distributions can be multimodal. Unlike structure prediction methods, this approach is strongly based on high-resolution experimental information and requires validated single-copy high-quality models as input. Overall, the results support the authors' conclusions.

      In fact, the method addresses two problems which could be considered separately:

      - An automation of construction of multiple conformations when they can be identified visually;

      - A determination of multiple conformations when their visual identification is difficult or impossible.

      We often think about this problem similarly to the reviewer. However, in building qFit, we do not want to separate these problems; rather, we use the first category (obvious visual identification) to build an approach that can accomplish part of the second category (difficult to visualize) without building “impossible”/nonexistent conformations, using a consistent approach/bias.

      The first is a known problem: missing alternative conformations may cost a few percent in R-factors. While these conformations are relatively easy to detect and build manually, the current procedure, which the test results show to be quite efficient, may save significant time.

      We agree with the reviewer's assessment here. The “floor” in terms of impact is automating a tedious part of high-resolution model building and improving model quality.

      The second problem is important from the physical point of view and was first addressed by Burling & Brunger (1994; https://doi.org/10.1002/ijch.199400022). The new procedure deals with a second-order variation in the R-factors, of about 1% or less, comparable to the effect of placing riding hydrogen atoms, modeling density deformation, or varying the bulk solvent. In such situations, it is hard to justify model improvement. Unchanged or marginally decreasing Rfree values can be considered a sign that the model does not overfit the data, but hardly a strong argument in favor of the model.

      We agree with the overall sentiment of this comment. What counts as a significant variation in Rfree is an important question that we have looked at previously (http://dx.doi.org/10.1101/448795), and others have suggested an R-sleep for further cross-validation (https://pubmed.ncbi.nlm.nih.gov/17704561/). For these reasons it is important to get at the significance of the changes to model types from large and diverse test sets, as we have here and in other works, and from careful examination of the biological significance of alternative conformations with experiments designed to test their importance in mechanism.
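      For readers less familiar with the metrics debated in this exchange, the sketch below computes the conventional crystallographic R-factor, R = Σ| |F_obs| - k|F_calc| | / Σ|F_obs|, separately over a working set and a held-out test set (Rwork and Rfree), along with the Rfree - Rwork gap. The amplitudes and free-set flags are made-up toy values, and the single overall scale factor k, fitted by least squares on the working reflections only, is a simplification; refinement programs use more elaborate scaling and bulk-solvent models.

```python
def scale_factor(f_obs, f_calc):
    """Least-squares scale k minimizing sum((F_obs - k * F_calc)^2)."""
    return sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)


def r_factor(f_obs, f_calc, k):
    """R = sum(| |F_obs| - k|F_calc| |) / sum(|F_obs|)."""
    num = sum(abs(abs(o) - k * abs(c)) for o, c in zip(f_obs, f_calc))
    return num / sum(abs(o) for o in f_obs)


def r_work_free(f_obs, f_calc, is_free):
    """Return (Rwork, Rfree); the scale is fitted on working reflections only."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]
    w_obs, w_calc = zip(*work)
    t_obs, t_calc = zip(*test)
    k = scale_factor(w_obs, w_calc)
    return r_factor(w_obs, w_calc, k), r_factor(t_obs, t_calc, k)


# Toy amplitudes; the last two reflections are flagged as the free (test) set.
f_obs = [100.0, 80.0, 60.0, 50.0, 40.0]
f_calc = [100.0, 80.0, 60.0, 45.0, 44.0]
is_free = [False, False, False, True, True]

r_work, r_free = r_work_free(f_obs, f_calc, is_free)
print(f"Rwork={r_work:.3f}  Rfree={r_free:.3f}  gap={r_free - r_work:.3f}")
# Rwork=0.000  Rfree=0.100  gap=0.100
```

      Because the test reflections never influence the fitted parameters (here, only k), a widening gap between Rfree and Rwork is the usual warning sign of overfitting when parameters are added.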

      In general, overall targets are less appropriate for this kind of problem, and local characteristics may be better indicators. Improvement of the model geometry is a good choice. Indeed, Cruickshank (1956; https://doi.org/10.1107/S0365110X56002059) showed that averaged density images may lead to a shortening of covalent bonds when such maps are interpreted with a single model. However, a total absence of geometric outliers is not necessarily required for structures solved at high resolution, where diffraction data should have more freedom to place the atoms where the experiments "see" them.

      Again, we agree—geometric outliers should not be completely absent, but it is comforting when they and model/experiment agreement both improve.

      The key local characteristic for multiconformer models is the closeness of the model map to the experimental one. Actually, the procedure uses a kind of such measure, the Bayesian information criterion (BIC). Unfortunately, there is no information about how sharply it identifies the best model or how much it changes between the initial and final models; overall, there is no feeling for its values. The Q-score (page 17) can be a tool for the first problem, where the multiple conformations are clearly separated, but not for the second problem, where the contributions from neighboring conformations are merged. In addition to BIC, or to even more conventional target functions such as LS or local map correlation, the extreme and mean values of the local difference maps may help to validate the models.

      We agree with the reviewer that the problem of “best” model determination is poorly posed here. We have been thinking a lot about this in the context of Bayesian methods (see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278553/); however, a major stumbling block is how variable representations of alternative conformations (and compositions) are handled. The answers are more (but by no means simply) straightforward for ensemble representations, where the entire system is constantly represented but with multiple copies.
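      To give a feel for how a Bayesian information criterion trades fit against model size, here is a generic sketch, not qFit's actual scoring code. For a least-squares fit with n density points, residual sum of squares RSS, and k free parameters, assuming independent Gaussian errors, BIC = n·ln(RSS/n) + k·ln(n) up to an additive constant, and the model with the lower BIC is preferred. The density values and parameter counts below are invented; how qFit counts parameters and chooses its sampling points is not specified here.

```python
import math


def rss(observed, calculated):
    """Residual sum of squares between observed and calculated density values."""
    return sum((o - c) ** 2 for o, c in zip(observed, calculated))


def bic(rss_value, n_points, n_params):
    """BIC for Gaussian errors with an additive constant dropped; lower is better."""
    return n_points * math.log(rss_value / n_points) + n_params * math.log(n_points)


# Invented density samples around one side chain.
observed = [0.9, 1.1, 0.5, 0.6, 0.4, 0.5, 0.2, 0.3]
single = [1.0, 1.0, 0.8, 0.8, 0.1, 0.1, 0.2, 0.2]  # one conformer
double = [0.9, 1.0, 0.5, 0.5, 0.4, 0.4, 0.2, 0.3]  # two conformers

# Hypothetical parameter counts: the two-conformer model pays for its extra parameters.
bic_single = bic(rss(observed, single), len(observed), n_params=2)
bic_double = bic(rss(observed, double), len(observed), n_params=4)
preferred = "two conformers" if bic_double < bic_single else "one conformer"
print(f"BIC single={bic_single:.1f}, double={bic_double:.1f}; preferred: {preferred}")
```

      The k·ln(n) term is what keeps an extra conformer from being accepted merely because it lowers the residual slightly; it is accepted only when the drop in RSS outweighs that penalty.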

      This method and its results are a strong argument for the need for experimental data and the information they contain, as opposed to pure structure prediction. At the same time, the absence of strong density-based proofs may limit its impact.

      We agree - indeed we think it will be difficult to further improve structure prediction methods without much more interaction with the experimental data.

      Strengths:

      Addressing an important problem and automatization of model construction for alternative conformations using high-resolution experimental data.

      Weaknesses:

      An insufficient validation of the models when no discrete alternative conformations are visible and essentially missing local real-space validation indicators.

      Although neither is a perfect real-space indicator, local real-space validation is implicit in the MIQP selection step and explicit when we employ Q-score metrics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A point of clarification: I don't understand why waters seem to be handled differently for cryo-EM and crystallography datasets. I am interested in the statement on page 19 that the MolProbity Clashscore gets worse for cryo-EM datasets, primarily due to clashes with waters. But the qFit algorithm includes a round of refinement to optimize placement of ordered waters, and the clashscore improves for the qFit refinement in crystallography test cases. Why/how is this different for cryo-EM?

      We agree that this was not an appropriate point. We believe that the high clash score is coming from side chains being incorrectly modeled. We have updated this in the manuscript and it will be a focus of future improvements.

      Reviewer #2 (Recommendations For The Authors):

      - It would be instructive to the reader to explain how qFit handles the chromophore in the PYP (1OTA) example. To this end, it would be helpful to include deposition of the multiconformer model of PYP. This might also be a suitable occasion for discussion of potential hurdles in the deposition of multiconformer models in the PDB (if any!). Such concerns may be real concerns causing hesitation among potential users.

      Thank you for this comment. qFit does not alter the position or connectivity of any HETATM records (like the chromophore in this structure). Handling covalent modifications like this is an area of future development.

      Regarding deposition, we have noted above that the discussion now includes:

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”

      Finally, we have placed all PDBs in a Zenodo deposition (XXX) and have included that language in the manuscript. It is currently under a separate data availability section (page XXX). We will defer to the editor as to the best header for it.

      - It may be advisable to take the description of true/false pos/negatives out of the caption of Figure 4, and include it in a box or so, since these terms are important in the main text too, and the caption becomes very cluttered.

      We think adding the description of true/false pos/negatives to the Figure panel would make it very cluttered and wordy. We would like to retain this description within the caption. We have also briefly described each in the main text.

      - page 21, line 4: some issue with citation formatting.

      We have updated these citations.

      - page 25, second paragraph: cardinality is the number of members of a set. Perhaps "minimal occupancy" is more appropriate.

      Thank you for pointing this out. This was a mistake and should have been called the occupancy threshold.

      - page 26: it's - its

      Thank you, we have made this change. 

      - Font sizes in Supplementary Figures 5-7 are too small to be readable.

      We agree and will make this change. 

      Reviewer #3 (Recommendations For The Authors):

      General remarks

      (1) As I understand, the procedure starts by shifting residues one by one (page 4; A.1). Then, geometry reconstruction (e.g., B1) may be difficult in some cases when joining the shifted residues back together. It seems that such backbone perturbation can be done more efficiently by shifting groups of residues ("potential coupled motions"), as mentioned at the bottom of page 9. Did I miss its description?

      We would describe the algorithm as sampling (which includes minimal shifts) in the backbone residues to ensure we can link neighboring residues. We agree that future iterations of qFit should include more effective backbone sampling by exploring motion along the Cβ-Cα, C-N, and (Cβ-Cα × C-N) bonds and exploring correlated backbone movements.

      (2) While the paper is well split into clear parts, some of them seem not to be in their right/optimal place and would be better moved to "Methods" (the detailed "Overview of the qFit protein algorithm" as a whole) or to a "Data" section, which is currently missing (the first two paragraphs of "qFit improves overall fit...", page 8; "Generating the qFit test set", page 22; and "Generating synthetic data ...", page 26; i.e., the description of the test data set). To my personal taste, the description of tests with simulated data (page 15) would be better placed before that of tests with real data.

      Thank you for this comment, but we stand by our original decision to keep the general flow of the paper as it was submitted.

      (3) I wonder if the term "quadratic programming" (e.g., A3, page 5) is appropriate. It implies optimization of a quadratic function of the independent parameters and not of "some" parameters. This is like crystallographic LS, which is not a quadratic function of atomic coordinates, and I think this is a similar case here. Whatever the answer to this remark is, an example of the function and its parameters is certainly missing.

      We think that the term quadratic programming is appropriate. We fit the coefficients by minimizing a quadratic loss on the difference between observed and calculated density, subject to constraints on those coefficients. We agree that the quadratic function is missing from the paper, and we have now included it in the Methods section.
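      The kind of quadratic program discussed in this exchange can be illustrated with a small sketch. Assuming, as a simplification, that the calculated density is a weighted sum of per-conformer densities ρ_i, the coefficients w are chosen to minimize ||ρ_obs - Σ_i w_i ρ_i||² subject to w_i ≥ 0 and Σ_i w_i ≤ 1. The solver below is plain projected gradient descent, written for clarity rather than taken from qFit, and the final step that zeroes coefficients below a minimum occupancy (the name `threshold` and its value are hypothetical) only loosely mimics the integer constraints of a mixed-integer formulation, which would re-optimize the remaining coefficients jointly.

```python
def project_capped_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= 1.0:
        return clipped
    # The sum constraint is active: project onto the simplex sum(w) == 1 (sort-based method).
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_coefficients(rho_obs, rho_confs, steps=5000):
    """Projected gradient descent on ||rho_obs - sum_i w_i * rho_i||^2."""
    m = len(rho_confs)
    # Gram matrix G = A^T A and c = A^T rho_obs, where the columns of A are the rho_i.
    G = [[sum(p * q for p, q in zip(rho_confs[i], rho_confs[j])) for j in range(m)]
         for i in range(m)]
    c = [sum(p * o for p, o in zip(rho_confs[i], rho_obs)) for i in range(m)]
    # The gradient 2(Gw - c) is Lipschitz with constant 2*lambda_max(G) <= 2*trace(G).
    lipschitz = 2.0 * sum(G[i][i] for i in range(m))
    w = [1.0 / m] * m  # feasible starting point
    for _ in range(steps):
        grad = [2.0 * (sum(G[i][j] * w[j] for j in range(m)) - c[i]) for i in range(m)]
        w = project_capped_simplex([w[i] - grad[i] / lipschitz for i in range(m)])
    return w


def apply_threshold(w, threshold):
    """Zero out coefficients below a (hypothetical) minimum occupancy."""
    return [x if x >= threshold else 0.0 for x in w]


# Toy densities on four grid points; the observed map is 0.7*rho1 + 0.3*rho2.
rho1 = [1.0, 1.0, 0.0, 0.0]
rho2 = [0.0, 0.0, 1.0, 1.0]
rho3 = [1.0, 0.0, 1.0, 0.0]  # a decoy conformer that the data do not support
rho_obs = [0.7, 0.7, 0.3, 0.3]

w = fit_coefficients(rho_obs, [rho1, rho2, rho3])
w_final = apply_threshold(w, threshold=0.1)
print([round(x, 3) for x in w_final])
```

      The objective is quadratic in the coefficients w even though density is not a quadratic function of atomic coordinates, which is one way to reconcile the reviewer's point with the use of the term.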

      Technical remarks to be answered by the authors :

      (1) Page 1, Abstract, line 3. The ensemble modeling is not the only existing frontier, and saying "one of the frontiers" may be better. Also, this phrase gives a confusing impression that the authors aim to predict the ensemble models while they do it with experimental data.

      We agree with this statement and have re-worded the abstract to reflect this.

      (2) Page 2. Burling & Brunger (1994) should be cited as predecessors. On the contrary, an excellent paper by Pearce & Gros (2021) is not relevant here.

      We agree that we should mention the Burling & Brunger paper. However, Pearce & Gros (2021) should not be removed, as it does not discuss the method of ensemble refinement.

      (3) Page 2, bottom. "Further, when compared to ..." The preference for such an approach sounds too affirmative.

      We have amended this sentence to state:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot(Emsley et al. 2010) unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      The point we were trying to make in this sentence was that ensemble-based models are much harder to manually manipulate in Coot or other similar software compared to multiconformer models. We think that the new version of this sentence states this point more clearly.

      (4) Page 2, last paragraph. I do not see an obvious relation of references 15-17 to the phrase they are associated with.

      We disagree with this statement, and think that these references are appropriate.

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot [12], unlike ensemble methods that generate multiple complete protein copies (Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      (5) Page 3, paragraph 2. Cryo-EM maps should also be "high-resolution"; the phrase does not read this way.

      We agree that high-resolution should be added, and the sentence now states:

      “However, many factors make manually creating multiconformer models difficult and time-consuming. Interpreting weak density is complicated by noise arising from many sources, including crystal imperfections, radiation damage, and poor modeling in X-ray crystallography, and errors in particle alignment and classification, poor modeling of beam-induced motion, and imperfect Detector Quantum Efficiency (DQE) in high-resolution cryo-EM.”

      (6) Page 3, last paragraph before "Results". The words "... in both individual cases and large structural bioinformatic projects" do not add much meaning, except for introducing a self-reference. Also, repeating "better than 2 Å" seems unnecessary.

      We agree that this was unnecessary and have simplified the last sentence to state:

      “With the improvements in model quality outlined here, qFit can now be increasingly used for finalizing high-resolution models to derive ensemble-function insights.”

      (7) Page 3. "Results". Could "experimental" be replaced by a synonym, like "trial", to avoid confusing with the meaning "using experimental data"?

      We have replaced “experimental” with “exploratory” to describe the use of qFit on cryo-EM data. The statement now reads:

      “For cryo-EM modeling applications, equivalent metrics of map and model quality are still developing, rendering the use of qFit for cryo-EM more exploratory.”

      (8) Page 4, A.1. Should it be "steps +/- 0.1" and "coordinate" be "coordinate axis"? One can modify coordinates and not shift them. I do not understand how, with the given steps, the authors calculated the number of combinations ("from 9 to 81"). Could a long "Alternatively, ...absent" be reduced simply to "Otherwise"?

      We have simplified and clarified the sentence on the sampling of backbone coordinates to state:

      “If anisotropic B-factors are absent, the translation of coordinates occurs in the X, Y, and Z directions. Each translation takes place in steps of 0.1 Å along each coordinate axis, extending to 0.3 Å, resulting in 9 (if isotropic) or 81 (if anisotropic) distinct backbone conformations for further analysis.”

      (9) Page 6, B.1, line 2. Word "linearly" is meaningless here.

      We have modified this to read:

      “Moving from N- to C- terminus along the protein,”

      (10) Page 9, line 2. It should be explained which data set is considered as the test set to calculate Rfree.

      We think this is clear and would be repetitive if we duplicated it.

      (11) Page 9, line 7. It should be "a valuable metric" and not "an"

      We agree and have updated the sentence to read:

      “Rfree is a valuable metric for monitoring overfitting, which is an important concern when increasing model parameters as is done in multiconformer modeling.”

      (12) Page 10, paragraph 3. "... as a string (Methods)". I did not find any other mention of the term "string", including in "Methods", where it is supposed to be explained. Either this should be explained (with an example?), or it should be avoided.

      We agree that “string” (a reference to the programmatic data type) is not necessary and have removed it from the sentence. It now reads:

      “To quantify how often qFit models new rotameric states, we analyzed the qFit models with phenix.rotalyze, which outputs the rotamer state for each conformer (Methods).”

      (13) Page10, lines 3-4 from bottom. Are these two alternative conformations justified?

      We are unsure what this is referring to.

      (14) Page 12, Fig. 2A. In comparison with Supplement Fig 2C, the direction of axes is changed. Could they be similar in both Figures?

      We have updated Supplementary Figure 2C to have the same direction of axes as Figure 2A.

      (15) Page 15, section's title. Choose a single verb in "demonstrate indicate".

      We have amended the title of this section to be:

      “Simulated data demonstrate qFit is appropriate for high-resolution data.”

      (16) Page 15, paragraph 2. "Structure factors from 0.8 to 3.0 Å resolution" does not say what the authors apparently meant: "(complete?) data sets with a high-resolution limit that varied from 0.8 to 3.0 Å ...". Also, the phrase "random noise increasing" is not illustrated by Fig. 5, to which it refers.

      We have edited this sentence to now read:

      “To create the dataset for resolution dependence, we used the ground truth 7KR0 model, including all alternative conformations, and generated artificial structure factors with a high resolution limit ranging from 0.8 to 3.0 Å resolution (in increments of 0.1 Å).”
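      The resolution series described here amounts to truncating one set of calculated structure factors at progressively worse high-resolution limits. The sketch below shows only that truncation step, under the simplifying assumption of an orthorhombic unit cell, where 1/d² = (h/a)² + (k/b)² + (l/c)². The small cell dimensions and the Miller-index range are placeholders, not the 7KR0 cell, and computing the structure factors from the model, applying symmetry, and adding noise are left to crystallographic software.

```python
def d_spacing(h, k, l, a, b, c):
    """Interplanar spacing d (Å) for an orthorhombic cell: 1/d^2 = (h/a)^2 + (k/b)^2 + (l/c)^2."""
    return ((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2) ** -0.5


def truncate(reflections, d_min, cell):
    """Keep only reflections with d >= d_min, i.e. data to a high-resolution limit of d_min."""
    return [hkl for hkl in reflections if d_spacing(*hkl, *cell) >= d_min]


# High-resolution limits from 0.8 to 3.0 Å in 0.1 Å steps; round() removes float drift.
limits = [round(0.8 + 0.1 * i, 1) for i in range(23)]

cell = (10.0, 11.0, 12.0)  # hypothetical cell edges in Å, not the 7KR0 cell
reflections = [(h, k, l) for h in range(13) for k in range(13) for l in range(13)
               if (h, k, l) != (0, 0, 0)]

counts = {d_min: len(truncate(reflections, d_min, cell)) for d_min in limits}
for d_min in (0.8, 1.5, 2.0, 3.0):
    print(f"{d_min:.1f} Å: {counts[d_min]} reflections")
```

      Each model in the series then sees strictly fewer reflections as the limit worsens, which is what makes alternative conformations progressively harder to resolve.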

      (17) Page 15, last paragraph. It is written in a rather formal and confusing way, while a clearer description is given in the figure legend and repeated once more in Methods. I would suggest removing this paragraph.

      We agree that this is confusing. Instead of creating a true positive/false positive/true negative/false negative matrix, we now simply describe each case as it is: multiconformer or single conformer, and match or no match. We have edited the language in the manuscript and figure legends to reflect these changes.
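      One way to read the revised scheme is as a per-residue comparison of conformer counts between the ground-truth model and the qFit model. The sketch below assumes, for illustration only, that a residue is "multiconformer" if it has two or more alternative conformations and that "match" means the qFit call (single versus multiconformer) agrees with the ground truth; the manuscript's exact definitions may differ, and the residue numbers, counts, and the `classify` helper are hypothetical.

```python
from collections import Counter


def classify(truth_counts, qfit_counts):
    """Tally (ground-truth state, match/no match) over residues present in both models.

    truth_counts, qfit_counts: dict mapping residue id -> number of conformers.
    """
    tally = Counter()
    for res_id, n_true in truth_counts.items():
        if res_id not in qfit_counts:
            continue  # residues missing from the qFit model are skipped in this sketch
        true_multi = n_true >= 2
        qfit_multi = qfit_counts[res_id] >= 2
        state = "multiconformer" if true_multi else "single conformer"
        outcome = "match" if true_multi == qfit_multi else "no match"
        tally[(state, outcome)] += 1
    return tally


# Invented conformer counts for five residues.
truth = {1: 1, 2: 2, 3: 1, 4: 3, 5: 1}
qfit = {1: 1, 2: 2, 3: 2, 4: 1, 5: 1}
print(classify(truth, qfit))
```

      Keying the tally on the ground-truth state keeps the four cells of the old confusion matrix, but labels them by what was observed rather than by positive/negative terminology.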

      (18) Page 16. The last two paragraphs introduce a new topic, and it would help to separate them somehow from the previous ones (sub-title?).

      We agree that this could use a subtitle. We have included the following subtitle above this section:

      “Simulated multiconformer data illustrate the convergence of qFit.”

      (19) Page 20. "or static" and "we determined that" seem to be not necessary.

      We have removed “static” and now use only “single conformer models”. However, as one of the main conclusions of this paper is that qFit can pick up alternative conformers that were modeled manually, we have decided to keep “we determined that”.

      (20) Page 21, first paragraph. "Data" are plural; it should be "show" and "require"

      We have made these edits. The sentence now reads:

      “However, our data here show that not only does qFit need a high-resolution map to be able to detect signal from noise, it also requires a very well-modeled structure as input.”

      (21) Page 21, References should be indicated as [41-45], [35,46-48], [55-57]. A similar remark to [58-63] at page 22.

      We have fixed the reference layout to reflect this change.

      (22) Page 21, last paragraph. "Further reduce R-factors" (moreover, repeated twice) is not correct, neither because of "further", since the reduction here is rather marginal, nor as a goal; the variations in R-factors are not very significant. A more general statement like "improving fit to experimental data" (keeping in mind density maps) may be safer.

      We agree with the duplicative nature of these statements. We have amended the sentence to now read:

      “Automated detection and refinement of partial-occupancy waters should help improve fit to experimental data further reduce Rfree15 and provide additional insights into hydrogen-bond patterns and the influence of solvent on alternative conformations.”

      (23) Page 22. Sub-sections of "Methods" are given in a little bit random order; "Parallelization of large maps" in the middle of the text is an example. Put them in a better order may help.

      We have moved some sections of the Methods around and made better headings by using an underscore to highlight the subsections (Generating and running the qFit test set, qFit improved features, Analysis metrics, Generating synthetic data for resolution dependence).

      (24) Page 24. Non-convex solution is a strange term. There exist non-convex problems and functions and not solutions.

      We agree and we have changed the language to reflect that we present the algorithm with non-convex problems which it cannot solve.

      (25) Page 26, "Metrics". It would be worth describing the metrics explicitly, rather than (only) referring to the scripts.

      For each metric, we provide a sentence or two describing what it measures. As these metrics are well known in the structural biology field, we do not feel that we need to elaborate on them further.

      (26) Page 26. Multiplying B by occupancy does not make much sense. A better option would be to refer to the density value at the atomic center, occ*(4*pi/B)^1.5, which gives a relation between these two quantities.

      We agree and have updated the B-factor figures and metrics to reflect this.
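
      To illustrate the reviewer's point, the short Python sketch below (an illustration only, not code from the manuscript or from qFit) computes the peak density of an isotropic Gaussian atom, occ*(4*pi/B)^1.5, assuming B is given in Å^2 and ignoring the shape of the atomic scattering factor, so only ratios between atoms are meaningful. Unlike occ*B, which grows as an atom becomes more disordered, this quantity falls as B increases.

```python
import math


def peak_density(occupancy: float, b_factor: float) -> float:
    """Relative electron density at the centre of an isotropic Gaussian atom.

    Sketch under stated assumptions: a point atom blurred by an isotropic
    B-factor (A^2) has rho(0) proportional to occ * (4*pi/B)**1.5. The atomic
    scattering-factor shape is ignored, so only ratios are meaningful.
    """
    if b_factor <= 0:
        raise ValueError("B-factor must be positive")
    return occupancy * (4.0 * math.pi / b_factor) ** 1.5


full = peak_density(1.0, 20.0)        # fully occupied atom, B = 20
half = peak_density(0.5, 20.0)        # half-occupied conformer, same B
blurred = peak_density(1.0, 40.0)     # fully occupied, twice the B-factor
sharp_half = peak_density(0.5, 10.0)  # half occupancy but well ordered

print(round(half / full, 3))        # 0.5
print(round(blurred / full, 3))     # 0.354
print(round(sharp_half / full, 3))  # 1.414
```

      For example, a half-occupied conformer at B = 10 Å^2 has a higher peak than a fully occupied atom at B = 20 Å^2 (ratio 0.5 × 2^1.5 ≈ 1.41), whereas occ*B would rank them the other way round.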

      (27) Page 40, suppl. Fig. 5. Due to the color choice, it is difficult to distinguish the green and blue curves in the diagram.

      We have amended this by switching the colors of the curves.

      (28) Page 42, Suppl. Fig. 7. (A) How is the width of the shaded regions defined? (B) What do the blue regions stand for? The input Rfree range goes up to 0.26 and not to 0.25; there is a point at the right bound. (C) The bounds for the "orange" occupancy are inverted in the legend.

      (A) The width of the shaded region denotes the standard deviation of the values at each resolution. We have made this clearer in the caption.

      (B) The blue region denotes the 95% confidence interval for the regression estimate. We have made this clearer in the caption.

      (C) This has now been fixed.

      The maximum R-free value is 0.2543, which we rounded down to 0.25.

      (29) Page 43. Letters E-H in the legend are erroneously substituted by B-E.

      We apologize for this mistake. It is now corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Some important and interesting data are missing. For example, can the gene therapy extend the life span of these mutants? The overall in vivo voiding function is missing. AAV9/HPSE2 expression in the bladder wall is not shown.

      Our study was not designed to determine whether gene therapy can improve life span of the Hpse2 mutant mice. We know that the mutant mice usually become ill after the first month of life and can die. However, we wanted to study the mice when they were generally well so that there would be no confounding effects on the bladder physiology caused by general ill health. Indeed, a recent study of Hpse2 inducible deletion in adult mice has shown evidence of exocrine pancreatic insufficiency (Kayal et al., PMID 37491420). We are currently exploring the status of the pancreas in our non-conditional juvenile Hpse2 mice, and whether gene transfer into the pancreas is possible.

      We strongly agree that in vivo voiding studies will be important in the future, and note that in vivo cystometry is the gold standard for this, but it is currently beyond the remit of this study.

      It is correct that in this paper we focussed on gene transduction into the pelvic ganglia, because the evidence is mounting that this is a neurogenic disease, with our ex vivo physiological studies showing predominantly neurogenic defects that are corrected by the gene therapy. To further understand the biodistribution of the vector we have now sought evidence of viral transduction into the bladder itself (the new Figure 5). In contrast to the neurons of the pelvic ganglia, we observed very limited transduction: “The vector genome sequence WPRE3, and HPSE2 transcripts, were not detected in the urothelium or lamina propria, the loose tissue directly underneath the urothelium. Within the detrusor muscle layer itself, the large smooth muscle cells were not transduced. However, there were rare small foci of BaseScopeTM signal that may represent nerves coursing through the detrusor.”

      Reviewer 2:

      Weaknesses include a lack of discussion of the basis for differences in carbachol sensitivity in Hpse2 mutant mice, limited discussion of bladder tissue morphology in Hpse2 mutant mice, some questions over the variability of the functional data, and a need for clarification on the presentation of statistical significance of functional data.

      Yes, it is interesting that untreated male mutant mice have an increased bladder body contraction to carbachol compared with WT males. In a previous paper (Manak et al., 2020) we performed quantitative western blots for the M2 and M3 receptors and found levels were similar in mutants to the WTs, thus the increased sensitivity probably lies post-receptor.

      A detailed study of the bladder body is an interesting idea, in terms of possible transgene expression and detailed histology, and is something we will pursue in future studies.

      Our physiology graphs report what we found. There is some variability, particularly at lower frequencies, but our conclusions rest on analyses of the whole curve, which span multiple frequencies and show the expected overall pattern of frequency-dependent relaxation.

      Thank you, the stats for Figure 8 (now figure 9) have been corrected.

      Reviewer 3:

      Single-cell analysis of mutants versus control bladder, urethra including sphincter. This would be great also for the community.

      Yes, in future we are very interested in using a single-cell sequencing approach to look at the mutant, WT and rescued pelvic ganglia. In the manuscript we have provided further discussion on the aetiology of urofacial syndrome, and what we still have to learn. We highlight a recent paper in eLife that uses single-cell sequencing of mouse pelvic ganglia (Sivori et al., 2024), demonstrating the feasibility of this molecular approach in the pelvic ganglia, and propose that this technique could be applied to study the UFS mice to provide important insights into the molecular pathobiology of the condition.

      Detailed tables showing data from each mouse examined.

      In theory, it would be very interesting to correlate the strength of human gene transduction into the pelvic ganglia, with, for example, the effect on a physiological parameter. However, in general we used different sets of mice for these techniques so at the present we don’t have this information.

      Use of measurements that are done in vivo (spot assay for example). This sounds relatively simple.

      We strongly agree that in vivo voiding studies will be important in the future, and note that in vivo cystometry is the gold standard for this, but it is currently beyond the remit of this study.

      Assessment of viral integration in tissues besides the liver (could be done by QPCR).

      This is an important point, and we suggest the pancreas is a particularly interesting target for future studies. In the manuscript, we have highlighted a recent study of Hpse2 inducible deletion in young adult mice that has shown evidence of exocrine pancreatic insufficiency (Kayal et al., PMID 37491420), associated with fatty degeneration of pancreatic acinar cells. The Hpse2 mutant animals are smaller than wildtype littermates, the reason for which has not been identified but could be due to defects in processing milk and food. We are currently exploring the status of the pancreas in our non-conditional juvenile Hpse2 mice, and whether gene transfer into the pancreas is possible.

      Discuss subtypes of neurons that are present and targeted in the context of mutants and controls.

      The make-up of the pelvic ganglia in Hpse2 mutant mice is a fascinating question. Future analysis using scRNA-Seq may be the most effective way to answer this question and is a molecular approach we are looking to pursue in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study develops a machine learning method to reveal hidden unknown functions and behavior in gene regulatory networks by searching parameter space in an efficient way. The evidence for some parts of the paper is still incomplete and needs systematic comparison to other methods and to the ground truth, but the work will be of broad interest to anyone working in biology of all stripes since the ideas reach beyond gene regulatory networks to revealing hidden functions in any complex system with many interacting parts.

      We thank the editors and reviewers for their positive assessment and constructive suggestions. In our response, we acknowledge the importance of systematic comparison to other methods and to the ground truth, when available. However, we also emphasize the challenges associated with evaluating such methods in the context of uncovering hidden behaviors in complex biological networks, where the ground truth is often unknown. We hope that our explanations will clarify the potential of our approach in advancing the exploration of these systems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper proposes applying intrinsically-motivated exploration to the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to random search. That says little about its performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODEs (e.g., BIOMD0000000454, `Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not directly generalize to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, the “true” steady-state goal sets are often unknown, which is precisely where machine learning methods such as the one we propose are particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, exploration is likely to be much simpler, and little improvement over random search is to be expected there, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models to check their behavior, it is not clear how to scale this analysis to the more complex systems we target.

      For systems whose true steady state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (which is called a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network «when the number of parameters  is large and their uncertain range  is not negligible». For the considered models, the true steady-state goal set is unknown, which is why we chose comparison with random search. We added a “Statistics” subsection in the Methods section providing additional details about the statistical analyses we perform between our method and the random search baseline.

      (2) The proposed method is based on `Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state-action trajectories [s_{t_0:t}, a_{t_0:t}] ('2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions; only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed without actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal...." mean, considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification, as indeed the IMGEP methodology originates from developmental robotics scenarios which generally focus on the problem of robotic sequential decision-making, therefore assuming state action trajectories as presented in Forestier et al. [65]. However, in both cases, note that the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also only sets one vector at the start (denoted ) which was specifying parameters of a movement (like the initial state of the GRN), which was then actually produced with dynamic motion primitives which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the “intervention” of the IMGEP (denoted ) only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting parameters of an action policy  at the start which could be called during the GRN’s trajectory to sample control actions  where  would be the state of the GRN. In practice this would also require setting only one vector at the start, so it would remain the same exploration algorithm and only the space of parameters would change, which illustrates the generality of the approach.
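
      The following Python sketch illustrates, under simplifying assumptions, how goal exploration can operate when the only "intervention" is a single parameter vector set at the start (here, an initial state). It is a hypothetical illustration, not the implementation used in the paper: the two-gene toggle switch (`simulate`), the box-shaped goal space, the nearest-neighbour selection and the Gaussian mutation width (`sigma`) are all assumptions chosen for brevity.

```python
import math
import random


def simulate(x0, y0, alpha=3.0, n=2.0, dt=0.01, steps=2000):
    """Hypothetical two-gene toggle switch, integrated with forward Euler.

    The intervention is only the initial state (x0, y0); the outcome is the
    state reached at t = dt * steps.
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = alpha / (1.0 + y ** n) - x
        dy = alpha / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return (x, y)


def imgep(n_random=20, n_goal=80, sigma=0.5, bounds=(0.0, 5.0), seed=0):
    """Minimal goal-exploration loop (nearest-neighbour + mutation variant)."""
    rng = random.Random(seed)
    lo, hi = bounds
    history = []  # (intervention, outcome) pairs

    # Bootstrap: a few purely random interventions.
    for _ in range(n_random):
        p = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        history.append((p, simulate(*p)))

    # Goal-directed phase: sample a goal in outcome space, reuse the
    # intervention whose outcome came closest to it, and perturb it.
    for _ in range(n_goal):
        goal = (rng.uniform(lo, hi), rng.uniform(lo, hi))
        p_best, _ = min(history, key=lambda h: math.dist(h[1], goal))
        p_new = tuple(min(hi, max(lo, v + rng.gauss(0.0, sigma))) for v in p_best)
        history.append((p_new, simulate(*p_new)))
    return history


history = imgep()
print(len(history))  # 100 simulated interventions
```

      Replacing the initial state with the parameters of an action policy would change only the intervention vector sampled at the start, not the structure of the loop.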

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK and RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example of GRN trajectory in transcriptional space, and to illustrate what “interventions” and “perturbations” can be in that context. To that end we have used the fixed initial conditions provided in the BIOMD0000000647, replicating Figure 5 of Cho et al. [56].

      While we are not sure what the reviewer means by the “typical” scale of this phase space, we would like to point the reviewer to Figure 8, which shows examples of paths that reach further points in the same phase space (up to ~10 in RKIPP_RP levels and ~300 in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may be “rarer” to occur naturally, in the sense that a large portion of the initial conditions tested with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to those displayed in Figure 2.

      (4) Table 2:

      a. Where is 'effective intervention' used in the method?

      b. In my opinion, 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important, I would suggest extending/enhancing the column "Proposed Isomorphism"; otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes whereas others will fail to produce a qualitative change to the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more than a random exploration. However we agree that the term “effective intervention” is ambiguous (does not say effective in what) and we have replaced it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we have clarified those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the amount to which a system is externally controllable/trainable whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, they are both measuring the extent of states that can be reached by the system under a distribution of stimuli/conditions, whether natural conditions or engineered ones, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We have replaced the verb “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete along certain aspects, which we tried to make as explicit as possible throughout the paper, and why we explicitly state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use-cases in which the community could benefit from such computational methods and experimental framework, and build on for future work.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of other methods than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape given an input set of recorded dynamical trajectories, although it might indeed be the case. We want to emphasize that any of such methods would anyway depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider “valleys” of reachable steady states.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations on the GRN’s behavior, and that these can shape the landscape in various ways, for instance by affecting which paths are accessible or the shape/depth of certain valleys. But we believe that analyzing, qualitatively or quantitatively, the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region, and 2) used several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that, given this problem formulation, a simple random search was used as the optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases optimizing directly will not work when starting from a random or poor initial guess, even with EA or SGD. In that case, the discovered behavioral catalog can be useful to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have more thoroughly analyzed this issue, such as Colas et al. [74] and Pugh et al. [34], and shown the interest of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in the revised version to complete our analysis. CMA-ES is an evolutionary algorithm that adapts its search steps during optimization and is known to be better suited than SGD to escaping local minima when the number of parameters is not too high (here we only have 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD at the beginning of optimization, it also ultimately converges to a local minimum, similarly to SGD. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried this for a few hyperparameters (initial mean and std) but always found similar results. We have updated the Figure 9 image and caption, as well as the descriptive text, to include these new results in the revised version. We also added a reference to the CMA-ES paper in the citations.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest conducting a more rigorous analysis of the performance by estimating/approximating the ground-truth robust goal sets in important GRNs.

      Also, the use of terminology from different disciplines can be improved. Please see my comments above. Specifically, the connection between controllability in dynamical control systems and versatility used in this paper is unclear.

      We hope to have addressed the reviewer's concerns in our previous answers.

      Reviewer #2 (Recommendations For The Authors):

      Fig 4b: I'm not sure if DBSCAN is the appropriate method to use here, as the visual focus on the core elements of the clusters downplays the full convex hull of the points that random sampling achieves in Z space. An analysis based on convex hulls or the ball-coverage from Fig. 3b would presumably generate plots that were more similar between random sampling and curiosity search. If the goal is to highlight redundancy/non-linearity in the mapping between Z and I, another approach might be to simply bin Z-space in a grid, or to use a clustering algorithm that is less stringent about core/noise distinctions.

      We thank the reviewer for the suggestion. This plot is intended to give the reader an understanding of why a method that uniformly samples goals in Z (what the IMGEP is doing) is more efficient than a method that uniformly samples parameters in I (what random search is doing) in systems where the mapping between I and Z is highly redundant/non-linear. We agree that binning the Z-space in a grid and counting the number of achieved bins is a way to measure this quantitatively, which is very close to what we do in Figure 3 to measure the achieved diversity. We believe, however, that the clustering and coloring provide additional intuition on why this is the case: they illustrate that large regions of the intervention space map to small regions in the outcome space and vice versa.
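
      As a rough sketch of the grid-based alternative discussed above, the Python function below counts how many cells of a regular grid over the outcome space Z are reached by a set of outcomes. The function name `bin_coverage`, the square grid and the choice to ignore out-of-range outcomes are assumptions for illustration; this is not the exact metric used in Figure 3.

```python
def bin_coverage(outcomes, bounds=(0.0, 1.0), n_bins=10):
    """Count distinct grid cells of Z-space hit by a collection of outcomes.

    Each axis of [lo, hi) is split into n_bins equal intervals; outcomes
    outside the box are ignored rather than clipped to the edge cells.
    """
    lo, hi = bounds
    width = (hi - lo) / n_bins
    cells = set()
    for z in outcomes:
        if all(lo <= v < hi for v in z):
            # min() guards against float rounding pushing v just below hi
            # into a non-existent extra bin.
            cells.add(tuple(min(n_bins - 1, int((v - lo) // width)) for v in z))
    return len(cells)


# Redundant outcomes (many interventions landing in one region) add little coverage.
clustered = [(0.10, 0.10), (0.12, 0.11), (0.11, 0.13), (0.90, 0.90)]
spread = [(0.10, 0.10), (0.35, 0.80), (0.60, 0.30), (0.90, 0.90)]
print(bin_coverage(clustered, n_bins=4))  # 2
print(bin_coverage(spread, n_bins=4))     # 4
```

      Unlike a density-based clustering such as DBSCAN, this count has no notion of core versus noise points, so isolated outcomes contribute as much as dense clusters.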

      Additional changes in the revised version:

      We added a sentence in the Methods section, as well as in the caption of Table S1, providing additional details about the way we simulate the biological models from the BioModels website.

      We corrected an erroneous reference to Figure 4 in the Methods “Sensitivity measure” subsection, which now refers to Figure 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Despite the importance of long-lived plasma cells (LLPCs), particularly in the vaccination field, their nature is still unclear. In this valuable manuscript, as a first step towards clarifying it, the authors used a solid genetic approach (time-stamping) and successfully labelled only functional LLPCs. Although four groups have already published data using the same genetic approach, the authors' manuscript includes additional significant findings in the LLPC field.

      Public Reviews:

      Reviewer #1 (Public Review):

      The mechanisms underlying the generation and maintenance of LLPCs have been one of the unresolved issues. Recently, four groups have independently generated new genetic tools that allow fate tracing of murine plasma cells and have addressed how LLPCs are generated or maintained in homeostatic conditions or upon antigen immunization or viral infection. Here, Jing et al. have established another, but essentially the same, PC time stamping system, and tried to address the issues above. The question is whether the findings reported here provide significant conceptual advances from what has already been published.

      (1) Some of the observations in this manuscript have already been made by other studies (Xu et al. 2020, Robinson et al. 2022, Liu et al. 2022, Koike et al. 2023, Robinson et al. 2023). In my opinion, however, genetic analysis of the role of CXCR4 on PC localization or survival in BM (Figure 5) was well performed and provided some new aspects which have not been addressed in previous reports. The motility of CXCR4 cKO plasma cells in BM is not shown, but it could further support the idea that reduced mobility or increased clustering is required for longevity.

      (2) The combination of the several surface markers shown in Figures 3 & 4 does not seem practically applicable for identifying or gating on LLPCs, because differential expression of CD81, CXCR4, CD326, CD44, or CD48 on LLPCs vs bulk PCs was very modest. EpCAMhi/CXCR3-, Ly6Ahi/Tigit- (Liu et al. 2022), B220lo/MHC-IIlo (Koike et al. 2023), or SLAMF6lo/MHC-IIlo (Robinson et al. 2023) have been reported as markers for the LLPC population. It is unclear whether the combination of surface markers presented here is superior to the published markers. In addition, it is unclear why the authors did not use their own gene expression data (Fig. 6), instead of public datasets, to pick candidate markers.

      In terms of the utility of these markers, we agree they are not sufficient to distinguish bona fide LLPCs, but they did enrich for LLPCs by 6-fold (Figure 3). In the other studies cited, LLPCs are enriched in those gates but are not found exclusively in them, suggesting some plasticity. In terms of how the markers were chosen, we conducted the surface flow cytometry studies in parallel with, and before completing, the gene expression studies; thus, the gene expression data were not available in time to inform the longitudinal studies. As this was not a major finding of the paper, we have reduced the emphasis on this section and moved some of the data to Figure S2.

      Reviewer #2 (Public Review):

      In this study by Jing, Fooksman, and colleagues, a Blimp1-CreERT2-based genetic tracing study is employed to label plasma cells. Over the course of several months post-tamoxifen treatment, the only remaining labeled cells are long-lived plasma cells. This system provides a way to sort live long-lived plasma cells and compare them to unlabeled plasma cells, which contain a range of short-to-long-lived cells. From this analysis, several observations are made: 1) the turnover rate of plasma cells is greater in the spleen than in the bone marrow; 2) the turnover rate is highest early in life; 3) subtle transcriptional and cell surface marker differences distinguish long- from shorter-lived plasma cells; 4) long-lived plasma cells in the bone marrow are sessile and localize in clusters with each other; 5) CXCR4 is required for plasma cell retention in these clusters and in the bone marrow; 6) Repertoire analysis hints that the selection of long-lived plasma cells is not random for any cell that lands in the bone marrow.

      Strengths:

      (1) The genetic timestamping approach is a clever and functional way to separate plasma cells of differing longevities.

      (2) This approach led to the identification of several markers that could help prospective separation of long-lived plasma cells from others.

      (3) Functional labeling of long-lived plasma cells allowed for a higher resolution analysis of transcriptomes and motility than was previously possible.

      (4) The genetic system allowed for a revisitation of the importance of CXCR4 in plasma cell retention and survival.

      Weaknesses:

      (1) Most of the labeling studies, likely for practical reasons, were done on polyclonal rather than antigen-specific plasma cells. The triggers of these responses could vary based on age at the time of exposure, anatomical sites, etc. How these differences might influence markers and transcriptomes, independently of longevity, is not completely known.

      (2) The fraction of long-lived plasma cells in the unlabeled fraction varies with age, potentially diluting differences between long- and short-lived plasma cells.

      (3) The authors suggest their data favors a model by which plasma cells compete for niche space. Yet there is no evidence presented here that these niches are limiting.

      In Figure 2, we provide important evidence that LLPCs are enriched in PC clusters and are less motile, suggesting that they occupy a unique niche compared to bulk PCs in the bone marrow. However, we agree that these data do not establish whether that niche is limiting.

      (4) The functional importance of the observed transcriptome differences between long- and shorter-lived plasma cells is unknown. An assessment as to whether these differences are conserved in human long- and short-lived bone marrow plasma cells might provide circumstantial supporting evidence that these changes are important for longevity.

      Reviewer #3 (Public Review):

      The valuable work shows some unique characteristics of long-lived PCs in comparison with bulk PCs. In particular, the authors clearly demonstrated the dependence of PC longevity on CXCR4 and provided a substantial resource of PC transcriptomes. Although CD93 is known as a marker for long-lived PCs, the authors could provide more data related to CD93.

      Summary:

      Long-lived PCs are maintained with low motility and in a CXCR4-dependent manner. 

      Strengths:

      The reporter mice for fate-mapping can clearly distinguish long-lived PCs from total PCs and greatly contribute to the identification of long-lived PCs.

      Weaknesses:

      The authors were unable to identify a unique marker for long-lived PCs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Given the authors' expertise, I suggest investigating the motility of CXCR4 cKO plasma cells in BM.

      Thank you for the suggestion. This work would certainly fit with the theme of the paper. We tried to measure this using the BEC Rosa-LSL-YFP Cxcr4f/f system after tamoxifen treatment, but unfortunately these PCs leave the BM at the same time as they turn on YFP expression from the Rosa26 locus, making it impossible to capture the change in motility. This is also evident in the updated Figure 5, which shows that intratibial injection of 4-hydroxytamoxifen causes rapid mobilization of CXCR4 KO PCs from the tibia within 1 day. We also tried to breed other models that would allow us to visualize these early events; these attempts were unsuccessful and were also responsible for the long delay in resubmission.

      (2) Expression of CD81, CXCR4, CD326, CD44, or CD48 was not different enough to distinguish LLPCs from bulk PCs (Figure 3B). The caveat is that bulk PCs also contained a significant frequency of LLPCs, which would make the difference in expression levels smaller. I suggest looking at the expression of these molecules on newly generated PCs, soon after protein immunization, for example.

      This addresses a separate question, namely when PCs begin to express the LLPC phenotype, and is definitely worth pursuing in future studies.

      Reviewer #2 (Recommendations For The Authors):

      (1) Related to the above public comment #4, I would recommend looking at Halliley et al., Immunity, 2015 to see if some of the same LLPC transcriptional and marker differences can be observed between CD19+ and CD19- plasma cells in the human marrow.

      Thank you for the suggestion to compare our results with human data. However, it is unclear what conclusions we could draw from overlapping or non-overlapping patterns on their own.

      (2) For CD93, since it is bimodal, it may be better to express this as % positive rather than fold changes in MFI as in Figure 3.

      We have updated Figure 3C to show percent positive cells, as suggested. The fold changes were moved to Figure S2.

      Reviewer #3 (Recommendations For The Authors):

      The valuable work shows some unique characteristics of long-lived PCs in comparison with bulk PCs. In particular, the authors clearly demonstrated the dependence of PC longevity on CXCR4 and provided a substantial resource of PC transcriptomes. Although CD93 is known as a marker for long-lived PCs, the authors could provide more data related to CD93.

      Major points:

      The authors show that some bulk PCs express lower levels of CD93. Are CD93low bulk PCs more motile in the BM than CD93high bulk PCs? Are CD93low PCs more highly mutated in their Ig genes? Do CD93high bulk PCs have a transcriptome similar to that of long-lived PCs for some representative genes?

      Although we do not have data on this, the differences between CD93high and CD93low cells are likely to be small, since labeled PCs expressed higher surface CD93 as early as day 5 in the BM and spleen, as shown in the updated Figure 3C. Thus, while CD93 is strongly enriched in LLPCs, it cannot be used as a single marker to isolate LLPCs, which would make it very difficult to detect differences in motility, Ig gene mutation, or gene expression.

      Minor points:

      (1) The title states that surface receptor expression supports PC-intrinsic longevity, but the only surface receptor tested is CXCR4. This ambiguous wording may confuse readers.

      While CXCR4 was shown functionally to be involved, we found multiple surface receptors are differentially expressed in LLPCs.

      (2) The terms 'bone marrow' and 'BM' should be used consistently.

      (3) In Fig. 7C, it is unclear what the comparison bars refer to. Which dots are being compared?

      The bars compare day 90 middle-aged samples with day 5 controls, because there were only n = 2 samples for some of the day 90 young mouse groups, too few for all internally paired comparisons.

      (4) The explanation of Fig. 7I is hard to follow. How are the conclusions drawn from this panel?

      Fig. 7I shows that, among the most common public clones (those found in the most samples or mice) across all 42 LLPC and bulk PC samples, most hits came from LLPC samples (colored bars), whereas few came from bulk PC samples (white bars), suggesting that the shared repertoire is uniquely LLPC-like. These are qualitative observations; no statistical analysis was conducted for this panel.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study makes a valuable empirical contribution to our understanding of visual processing in primates and deep neural networks, with a specific focus on the concept of factorization. The analyses provide solid evidence that high factorization scores are correlated with neural predictivity, yet more evidence would be needed to show that neural responses show factorization. Consequently, while several aspects require further clarification, in its current form this work is interesting to systems neuroscientists studying vision and could inspire further research that ultimately may lead to better models of or a better understanding of the brain.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper investigates visual processing in primates and deep neural networks (DNNs), focusing on factorization in the encoding of scene parameters. It challenges the conventional view that object classification is the primary function of the ventral visual stream, suggesting instead that the visual system employs a nuanced strategy involving both factorization and invariance. The study also presents empirical findings suggesting a correlation between high factorization scores and good neural predictivity.

      Strengths:

      (1) Novel Perspective: The paper introduces a fresh viewpoint on visual processing by emphasizing the factorization of non-class information.

      (2) Methodology: The use of diverse datasets from primates and humans, alongside various computational models, strengthens the validity of the findings.

      (3) Detailed Analysis: The paper suggests metrics for factorization and invariance, contributing to a future understanding & measurements of these concepts.

      Weaknesses:

      (1) Vagueness (Perceptual or Neural Invariance?): The paper uses the term 'invariance', typically referring to perceptual stability despite stimulus variability [1], as the complete discarding of nuisance information in neural activity. This oversimplification overlooks the nuanced distinction between perceptual invariance (e.g., invariant object recognition) and neural invariance (e.g., no change in neural activity). It seems that by 'invariance' the authors mean 'neural' invariance (rather than 'perceptual' invariance) in this paper, which is vague. The paper could benefit from changing what is called 'invariance' in the paper to 'neural invariance' and distinguish it from 'perceptual invariance,' to avoid potential confusion for future readers. The assignment of 'compact' representation to 'invariance' in Figure 1A is misleading (although it can be addressed by the clarification on the term invariance). [1] DiCarlo JJ, Cox DD. Untangling invariant object recognition. Trends in cognitive sciences. 2007 Aug 1;11(8):333-41.

      Thanks for pointing out this ambiguity. In our Introduction we now explicitly clarify that we use “invariance” to refer to neural, rather than perceptual invariance, and we point out that both factorization and (neural) invariance may be useful for obtaining behavioral/perceptual invariance.

      (2) Details on Metrics: The paper's explanation of factorization as encoding variance independently or uncorrelatedly needs more justification and elaboration. The definition of 'factorization' in Figure 1B seems to be potentially misleading, as the metric for factorization in the paper seems to be defined regardless of class information (can be defined within a single class). Does the factorization metric as defined in the paper (orthogonality of different sources of variation) warrant that responses for different object classes are aligned/parallel like in 1B (middle)? More clarification around this point could make the paper much richer and more interesting.

      Our factorization metric measures the degree to which two sets of scene variables are factorized from one another. In the example of Fig. 1B, we apply this definition to the case of factorization of class vs. non-class information. Elsewhere in the paper we measure factorization of several other quantities unrelated to class, specifically camera viewpoint, lighting conditions, background content, and object pose. In our revised manuscript we have clarified the exposition surrounding Fig. 1B to make it clear that factorization, as we define it, can be applied to other quantities as well and that responses do not need to be aligned/parallel but simply live in a different set of dimensions whether linearly or nonlinearly arranged. Thanks for raising the need to clarify this point.

      (3) Factorization vs. Invariance: Is it fair to present invariance vs. factorization as mutually exclusive options in representational hypothesis space? Perhaps a more fair comparison would be factorization vs. object recognition, as it is possible to have different levels of neural variability (or neural invariance) underlying both factorization and object recognition tasks.

      We do not mean to imply that factorization and invariance are mutually exclusive, or that they fully characterize the space of possible representations. However, they are qualitatively distinct strategies for achieving behavioral capabilities like object recognition. In the revised manuscript we also include a comparison to object classification performance (Figures 5C & S4, black x’s) as a predictor of brain-like representations, alongside the results for factorization and invariance.

      In our revised Introduction and at the beginning of the Results section, we make it clearer that factorization and invariance are not mutually exclusive; indeed, our results show that both factorization of and invariance to some scene variables, like lighting and background identity, are signatures of brain-like representations. Our study focuses on factorization because we believe its importance has not been studied or highlighted to the same degree as invariance to “nuisance” parameters combined with selectivity for object identity in the tuning functions of individual neurons. Moreover, the loss functions used for supervised training of neural networks for image classification would seem to encourage invariance as a representational strategy. Thus, the finding that factorization of scene parameters is an equally good, if not better, predictor of brain-like representations may motivate new objective functions for neural network training.

      (4) Potential Confounding Factors in Empirical Findings: The correlation observed in Figure 3 between factorization and neural predictivity might be influenced by data dimensionality, rather than factorization per se [2]. Incorporating discussions around this recent finding could strengthen the paper.

      [2] Elmoznino E, Bonner MF. High-performing neural network models of the visual cortex benefit from high latent dimensionality. bioRxiv. 2022 Jul 13:2022-07.

      We thank the Reviewer for pointing out this important potential confound and the need for a direct quantification. We have now included an analysis computing how well dimensionality (measured using the participation ratio metric for natural images, as was done in [2] Elmoznino & Bonner, bioRxiv 2022) can account for model goodness-of-fit (additional pink bars in Figure 6). Factorization of scene parameters appears to add more predictive power than dimensionality on average (Figure 6, light shaded bars), and critically, factorization+classification jointly predict goodness-of-fit significantly better than dimensionality+classification for V4 and IT/HVC brain areas (Figure 6, dark shaded bars). Indeed, dimensionality+classification is only slightly more predictive than classification alone for V4 and IT/HVC, indicating some redundancy in those measures with respect to the neural predictivity of models (Figure 6, compare dark shaded pink bar to dashed line).

      That said, high-dimensional representations can, in principle, better support factorization, and thus we do not regard these two representational strategies necessarily in competition. Rather, our results suggest (consistent with [2]) that dimensionality is predictive of brain-like representation to some degree, such that some (but not all) of factorization’s predictive power may indeed owe to a partial correlation with dimensionality. We elaborate in the Discussion where this point comes up and now refer to the updated Figure 6 that shows the control for dimensionality.
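      For readers unfamiliar with the dimensionality measure mentioned above, the following is a minimal Python/NumPy sketch of the participation ratio. It assumes responses are arranged as a (stimuli × units) matrix and uses the standard definition PR = (Σλᵢ)² / Σλᵢ², where λᵢ are the eigenvalues of the unit-by-unit covariance matrix. The function name and data layout are illustrative and are not taken from the authors' or Elmoznino & Bonner's code.

```python
import numpy as np


def participation_ratio(responses):
    """Participation ratio of a (n_stimuli, n_units) response matrix.

    PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the
    unit-by-unit covariance matrix. It ranges from 1 (all variance along one
    direction) to n_units (variance spread evenly across all directions).
    """
    X = np.asarray(responses, dtype=float)
    cov = np.atleast_2d(np.cov(X, rowvar=False))   # rows are stimuli
    eig = np.linalg.eigvalsh(cov)                  # real eigenvalues of a symmetric matrix
    eig = np.clip(eig, 0.0, None)                  # remove tiny negative round-off
    return eig.sum() ** 2 / np.sum(eig ** 2)


# Example: variance spread equally along three orthogonal axes.
axes3 = np.array([[1, 0, 0], [-1, 0, 0],
                  [0, 1, 0], [0, -1, 0],
                  [0, 0, 1], [0, 0, -1]], dtype=float)
print(participation_ratio(axes3))  # approximately 3.0
```

      Data confined to a single direction give a value of 1, so the measure counts "effective" dimensions without a variance threshold.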

      Conclusion:

      The paper offers insightful empirical research with useful implications for understanding visual processing in primates and DNNs. The paper would benefit from a more nuanced discussion of perceptual and neural invariance, as well as a deeper discussion of the coexistence of factorization, recognition, and invariance in neural representation geometry. Additionally, addressing the potential confounding factors in the empirical findings on the correlation between factorization and neural predictivity would strengthen the paper's conclusions.

      Taken together, we hope that the changes described above address the distinction between neural and perceptual invariance, provide a more balanced understanding of the contributions of factorization, invariance, and local representational geometry, and rule out dimensionality for natural images as fully accounting for the benefits of factorization of scene parameters.

      Reviewer #2 (Public Review):

      Summary:

      The dominant paradigm in the past decade for modeling the ventral visual stream's response to images has been to train deep neural networks on object classification tasks and regress neural responses from units of these networks. While object classification performance is correlated to the variance explained in the neural data, this approach has recently hit a plateau of variance explained, beyond which increases in classification performance do not yield improvements in neural predictivity. This suggests that classification performance may not be a sufficient objective for building better models of the ventral stream. Lindsey & Issa study the role of factorization in predicting neural responses to images, where factorization is the degree to which variables such as object pose and lighting are represented independently in orthogonal subspaces. They propose factorization as a candidate objective for breaking through the plateau suffered by models trained only on object classification.

      They claim that (i) maintaining these non-class variables in a factorized manner yields better neural predictivity than ignoring non-class information entirely, and (ii) factorization may be a representational strategy used by the brain.

      The first of these claims is supported by their data. The second claim does not seem well-supported, and the usefulness of their observations is not entirely clear.

      Strengths:

      This paper challenges the dominant approach to modeling neural responses in the ventral stream, which itself is valuable for diversifying the space of ideas.

      This paper uses a wide variety of datasets, spanning multiple brain areas and species. The results are consistent across the datasets, which is a great sign of robustness.

      The paper uses a large set of models from many prior works. This is impressively thorough and rigorous.

      The authors are very transparent, particularly in the supplementary material, showing results on all datasets. This is excellent practice.

      Weaknesses:

      (1) The primary weakness of this paper is a lack of clarity about what exactly the contribution is. I see two main interpretations: (1-A) as introducing a heuristic for predicting neural responses that improves over classification accuracy, and (1-B) as a model of the brain's representational strategy. These two interpretations are distinct goals, each of which is valuable. However, I don't think the paper in its current form supports either of them very well:

      (1-A) Heuristic for neural predictivity. The claim here is that by optimizing for factorization, we could improve models' neural predictivity to break through the current predictivity plateau. To frame the paper in this way, the key contribution should be a new heuristic that correlates with neural predictivity better than classification accuracy. The paper currently does not do this. The main piece of evidence that factorization may yield a more useful heuristic than classification accuracy alone comes from Figure 5. However, in Figure 5 it seems that factorization along some factors is more useful than others, and different linear combinations of factorization and classification may be best for different data. There is no single heuristic presented and defended. If the authors want to frame this paper as a new heuristic for neural predictivity, I recommend the authors present and defend a specific heuristic that others can use, e.g. [K * factorization_of_pose + classification] for some constant K, and show that (i) this correlates with neural predictivity better than classification alone, and (ii) this can be used to build models with higher neural predictivity. For (ii), they could fine-tune a state-of-the-art model to improve this heuristic and show that doing so achieves a new state-of-the-art neural predictivity. That would be convincing evidence that their contribution is useful.

      Our paper does not make any strong claim regarding the Reviewer’s point 1-A (on heuristics for neural predictivity). In the Discussion, last paragraph, we better specify that our work is merely suggestive of claim 1-A about heuristics for more neurally predictive, more brainlike models. We believe that our paper supports the Reviewer’s point 1-B (on brain representation) as we discuss below.

      We leave it to future work to determine if factorization could help optimize models to be more brainlike. This treatment may require exploration of novel model architectures and loss functions, and potentially also more thorough neural datasets that systematically vary many different forms of visual information for validating any new models.

      (1-B) Model of representation in the brain. The claim here is that factorization is a general principle of representation in the brain. However, neural predictivity is not a suitable metric for this, because (i) neural predictivity allows arbitrary linear decoders, hence is invariant to the orthogonality requirement of factorization, and (ii) neural predictivity does not match the network representation to the brain representation. A better metric is representational dissimilarity matrices. However, the RDM results in Figure S4 actually seem to show that factorization does not do a very good job of predicting neural similarity (though the comparison to classification accuracy is not shown), which suggests that factorization may not be a general principle of the brain. If the authors want to frame the paper in terms of discovering a general principle of the brain, I suggest they use a metric (or suite of metrics) of brain similarity that is sensitive to the desiderata of factorization, e.g. doesn't apply arbitrary linear transformations, and compare to classification accuracy in addition to invariance.

      We agree with the Reviewer about the shortcomings of neural predictivity for comparing representational geometries, and in our revised manuscript we have provided a more comprehensive set of results that includes RDM predictivity in new Figures 6 & 7, alongside the results for neural fit predictivity. In addition, as suggested, we added classification accuracy predictivity in Figures 5C & S4 (black x’s) for visual comparison to factorization/invariance. In Figure S4 on RDMs, it is apparent that factorization is at least as good a predictor as classification on all V4 & IT datasets from both monkeys and humans (compare x’s to filled circles in Figure S4; note that some of the points from the original Figure S4 changed because we discovered a bug in the code that specifically affected the RDM analysis for a few of the datasets).

      We find that the newly included RDM analyses in Figures 6 & 7 are consistent with the conclusions of the neural fit regression analyses: the correlations of factorization metrics with RDM matches are strong, comparable in magnitude to that of classification accuracy (Figure 6, 3rd & 4th columns, compare black dashed line to faded colored bars), and are not fully accounted for by the model’s classification accuracy alone (Figure 6, 3rd & 4th columns, higher unfaded bars for classification combined with factorization; see also the corresponding example scatter plots in Figure 7, middle/bottom rows).

      It is encouraging that the added benefit of factorization for RDM predictivity, after accounting for classification performance, is at least as large as the improvement seen for neural fit predictivity (Figure 6, 1st & 2nd columns for encoding fits versus 3rd & 4th columns for RDM correlations).

      (2) I think the comparison to invariance, which is pervasive throughout the paper, is not very informative. First, it is not surprising that invariance is more weakly correlated with neural predictivity than factorization, because invariant representations lose information compared to factorized representations. Second, there has long been extensive evidence that responses throughout the ventral stream are not invariant to the factors the authors consider, so we already knew that invariance is not a good characterization of ventral stream data.

      While we appreciate the Reviewer’s intuition that highly invariant representations are not strongly supported in the high-level visual cortex, we nevertheless thought it was valuable to put this intuition to a quantitative, detailed test. As a result, we uncovered effects that were not obvious a priori, at least to us – for example, that invariance for some scene parameters (camera view, object pose) is negatively correlated with neural predictions while invariance to others (background, lighting) is positively correlated. Thus, our work exercises the details of invariance for different types of information.

      (3) The formalization of the factorization metric is not particularly elegant, because it relies on computing top K principal components for the other-parameter space, where K is arbitrarily chosen as 10. While the authors do show that in their datasets the results are not very sensitive to K (Figure S5), that is not guaranteed to be the case in general. I suggest the authors try to come up with a formalization that doesn't have arbitrary constants. For example, one possibility that comes to mind is E[delta_a x delta_b], where 'x' is the normalized cross product, delta_a, and delta_b are deltas in representation space induced by perturbations of factors a and b, and the expectation is taken over all base points and deltas. This is just the first thing that comes to mind, and I'm sure the authors can come up with something better. The literature on disentangling metrics in machine learning may be useful for ideas on measuring factorization.

      Thanks to the Reviewer for raising this point. First, we wish to clarify a potential misunderstanding of the factorization metric: the number K of principal components we choose is not an arbitrary constant, but rather calibrated to capture a certain fraction of variance, set to 90% by default in our analyses. While this variance threshold is indeed an arbitrary hyperparameter, it has a more intuitive interpretation than the number of principal components.
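      To illustrate how a variance threshold, rather than a fixed K, can set the subspace size, here is a minimal Python/NumPy sketch of a PCA-based factorization score. It assumes the score is 1 minus the fraction of the response variance induced by the parameter of interest that falls inside the top principal components of the variance induced by the other parameters, with K chosen as the smallest number of components explaining 90% of that variance. The function names are hypothetical, each input is treated as a single sweep of responses (stimuli × units), and the published metric may differ in details such as averaging over base scenes.

```python
import numpy as np


def top_pc_basis(responses, var_frac=0.9):
    """Orthonormal basis (n_units, K) of the top principal components of
    `responses` (rows = stimuli), with K the smallest number of components
    that together explain at least `var_frac` of the variance."""
    Xc = responses - responses.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)   # rows of vt are the PCs
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)            # cumulative variance explained
    k = min(int(np.searchsorted(cum, var_frac)) + 1, vt.shape[0])
    return vt[:k].T


def factorization_pca(param_resp, other_resp, var_frac=0.9):
    """Hypothetical PCA-based factorization score: 1 minus the fraction of the
    variance induced by the parameter of interest (param_resp) that lies inside
    the top-PC subspace of the variance induced by the other parameters."""
    basis = top_pc_basis(other_resp, var_frac)
    P = param_resp - param_resp.mean(axis=0, keepdims=True)
    total = np.sum(P ** 2)
    inside = np.sum((P @ basis) ** 2)                   # variance captured by the subspace
    return 1.0 - inside / total


other = np.array([[1, 0, 0, 0], [-1, 0, 0, 0],
                  [0, 1, 0, 0], [0, -1, 0, 0]], dtype=float)   # varies units 0 and 1
param = np.array([[0, 0, 1, 0], [0, 0, -1, 0]], dtype=float)  # varies unit 2 only
print(factorization_pca(param, other))  # approximately 1.0
```

      If the parameter moves responses only along directions the other parameters do not use, the score is 1; if it moves them entirely within that subspace, the score is 0.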

      Nonetheless, the Reviewer’s comment did inspire us to consider another metric for factorization that does not depend on any arbitrary parameters. In the revised version, we now include a covariance-matrix-based metric, which simply measures the elementwise correlation between the covariance matrix induced by varying the scene parameter of interest and the covariance matrix induced by varying the other parameters, and then subtracts this quantity from 1.

      Correspondingly, we now present results for both the new covariance-based measure and the original PCA-based one in Figures 5C, 6, and 7. The main findings remain largely the same when using the covariance-based metric (Figure 5C, compare light shaded to dark shaded filled circles; Figure 6, compare top row to bottom row; Figure 7, compare middle rows to bottom rows).

      Ultimately, we believe these two metrics are complementary and somewhat analogous to two metrics commonly used for measuring dimensionality (the number of components needed to explain a certain fraction of the variance, analogous to our original PCA-based definition; and the participation ratio, analogous to our covariance-based definition). We have added the formula for the covariance-based factorization metric, along with a brief description, to the Methods.
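      A minimal Python/NumPy sketch of a covariance-based score of this kind follows. It reads "elementwise correlation" as the uncentered correlation (cosine similarity) of the flattened covariance matrices, i.e., their normalized Frobenius inner product; a mean-centered Pearson correlation is another possible reading, and the function name is hypothetical. Like the participation ratio, this version has no threshold or K to choose.

```python
import numpy as np


def factorization_cov(param_resp, other_resp):
    """Hypothetical covariance-based factorization score: 1 minus the
    normalized Frobenius inner product (uncentered elementwise correlation)
    of the unit-by-unit covariance matrices induced by the parameter of
    interest and by the other parameters."""
    c_param = np.atleast_2d(np.cov(param_resp, rowvar=False))
    c_other = np.atleast_2d(np.cov(other_resp, rowvar=False))
    similarity = np.sum(c_param * c_other) / (
        np.linalg.norm(c_param) * np.linalg.norm(c_other))   # Frobenius norms
    return 1.0 - similarity


other = np.array([[1, 0, 0, 0], [-1, 0, 0, 0],
                  [0, 1, 0, 0], [0, -1, 0, 0]], dtype=float)
param = np.array([[0, 0, 1, 0], [0, 0, -1, 0],
                  [0, 0, 0, 1], [0, 0, 0, -1]], dtype=float)
print(factorization_cov(param, other))      # 1.0: orthogonal covariance structure
print(factorization_cov(3 * other, other))  # approximately 0.0: same structure up to scale
```

      Because covariance matrices are positive semidefinite, their normalized inner product lies in [0, 1], so under this reading the score runs from 0 (identical covariance structure up to scale) to 1 (orthogonal subspaces).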

      (4) The authors defined the term "factorization" according to their metric. I think introducing this new term is not necessary and can be confusing because the term "factorization" is vague and used by different researchers in different ways. Perhaps a better term is "orthogonality", because that is clear and seems to be what the authors' metric is measuring.

      We agree with the Reviewer that factorization has become an overloaded term. At the same time, we think that in this context, the connotation of the term factorization effectively conveys the notion of separating out different latent sources of variance (factors) such that they can be encoded in orthogonal subspaces.

      To aid clarity, we now mention in the Introduction that factorization defined here is meant to measure orthogonalization of scene factors. Additionally, in the Discussion section, we now go into more detail comparing our metric to others previously used in the literature, including orthogonality, to help put it in context.

      (5) One general weakness of the factorization paradigm is the reliance on a choice of factors. This is a subjective choice and becomes an issue as you scale to more complex images where the choice of factors is not obvious. While this choice of factors cannot be avoided, I suggest the authors add two things: First, an analysis of how sensitive the results are to the choice of factors (e.g. transform the basis set of factors and re-run the metric); second, include some discussion about how factors may be chosen in general (e.g. based on temporal statistics of the world, independent components analysis, or something else).

      The Reviewer raises a very reasonable point about the limitation of this work. While we limited our analysis to generative scene factors that we know about and that could be manipulated, there are many potential factors to consider. It is not clear to us exactly how to implement the Reviewer’s suggestion of transforming the basis set of factors, as the factors we consider are highly nonlinear in the input space. Ultimately, we believe that finding unsupervised methods to characterize the “true” set of factors that is most useful for understanding visual representations is an important subject for future work, but outside the scope of this particular study. We have added a comment to this effect in the Discussion.

      Reviewer #3 (Public Review):

      Summary:

      Object classification serves as a vital normative principle in both the study of the primate ventral visual stream and deep learning. Different models exhibit varying classification performances and organize information differently. Consequently, a thriving research area in computational neuroscience involves identifying meaningful properties of neural representations that act as bridges connecting performance and neural implementation. In the work of Lindsey and Issa, the concept of factorization is explored, which has strong connections with emerging concepts like disentanglement [1,2,3] and abstraction [4,5]. Their primary contributions encompass two facets: (1) The proposition of a straightforward method for quantifying the degree of factorization in visual representations. (2) A comprehensive examination of this quantification through correlation analysis across deep learning models.

      To elaborate, their methodology, inspired by prior studies [6], employs visual inputs featuring a foreground object superimposed onto natural backgrounds. Four types of scene variables, such as object pose, are manipulated to induce variations. To assess the level of factorization within a model, they systematically alter one of the scene variables of interest and estimate the proportion of encoding variances attributable to the parameter under consideration.
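      One way to make such a variance-based factorization score concrete is the Python sketch below. It assumes (our assumption, not a quotation of the paper's Methods) that a factor's score is one minus the fraction of the variance induced by varying that factor which falls inside the principal subspace (here, the top PCs explaining 90% of the variance) of the responses induced by varying the other scene factors. Array shapes (stimuli × units), the threshold, and the function names are illustrative.

```python
import numpy as np


def principal_subspace(responses: np.ndarray, frac: float = 0.9) -> np.ndarray:
    """Orthonormal basis (units x k) of the top PCs explaining `frac` of the variance."""
    centered = responses - responses.mean(axis=0)
    # Rows of vt are the principal directions; squared singular values are their variances
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    cumulative = np.cumsum(var) / var.sum()
    k = int(np.searchsorted(cumulative, frac) + 1)
    return vt[:k].T


def factorization_score(resp_factor: np.ndarray, resp_others: np.ndarray,
                        frac: float = 0.9) -> float:
    """1 - fraction of the factor-induced variance lying in the other factors' subspace."""
    basis = principal_subspace(resp_others, frac)
    centered = resp_factor - resp_factor.mean(axis=0)
    total = np.sum(centered ** 2)
    projected = centered @ basis  # coordinates inside the other factors' subspace
    inside = np.sum(projected ** 2)
    return float(1.0 - inside / total)
```

      A score near 1 means the factor is encoded in directions orthogonal to the other factors' variance; a score near 0 means it shares their subspace.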

      The central assertion of this research is that factorization represents a normative principle governing biological visual representation. The authors substantiate this claim by demonstrating an increase in factorization from macaque V4 to IT, supported by correlation analyses revealing a positive correlation between factorization and decoding performance. Furthermore, they advocate for the inclusion of factorization as part of the objective function for training artificial neural networks. To validate this proposal, the authors systematically conduct correlation analyses across a wide spectrum of deep neural networks and datasets sourced from human and monkey subjects. Specifically, their findings indicate that the degree of factorization in a deep model positively correlates with its predictability concerning neural data (i.e., goodness of fit).

      Strengths:

      The primary strength of this paper is the authors' efforts in systematically conducting analysis across different organisms and recording methods. Also, the definition of factorization is simple and intuitive to understand.

      Weaknesses:

      This work exhibits two primary weaknesses that warrant attention: (i) the definition of factorization and its comparison to previous, relevant definitions, and (ii) the chosen analysis method.

      Firstly, the definition of factorization presented in this paper is founded upon the variances of representations under different stimuli variations. However, this definition can be seen as a structural assumption rather than capturing the effective geometric properties pertinent to computation. More precisely, the definition here is primarily statistical in nature, whereas previous methodologies incorporate computational aspects such as deviation from ideal regressors [1], symmetry transformations [3], generalization [5], among others. It would greatly enhance the paper's depth and clarity if the authors devoted a section to comparing their approach with previous methodologies [1,2,3,4,5], elucidating any novel insights and advantages stemming from this new definition.

      [1] Eastwood, Cian, and Christopher KI Williams. "A framework for the quantitative evaluation of disentangled representations." International Conference on Learning Representations. 2018.

      [2] Kim, Hyunjik, and Andriy Mnih. "Disentangling by factorising." International Conference on Machine Learning. PMLR, 2018.

      [3] Higgins, Irina, et al. "Towards a definition of disentangled representations." arXiv preprint arXiv:1812.02230 (2018).

      [4] Bernardi, Silvia, et al. "The geometry of abstraction in the hippocampus and prefrontal cortex." Cell 183.4 (2020): 954-967.

      [5] Johnston, W. Jeffrey, and Stefano Fusi. "Abstract representations emerge naturally in neural networks trained to perform multiple tasks." Nature Communications 14.1 (2023): 1040.

      Thanks to the Reviewer for this suggestion. We agree that our initial submission did not sufficiently contextualize our definition of factorization with respect to other related notions in the literature. We have added additional discussion of these points to the Discussion section in the revised manuscript and have included therein the citations provided by the Reviewer (please see the third paragraph of Discussion).

      Secondly, in order to establish a meaningful connection between factorization and computation, the authors rely on a straightforward synthetic model (Figure 1c) and employ multiple correlation analyses to investigate relationships between the degree of factorization, decoding performance, and goodness of fit. Nevertheless, the results derived from the synthetic model are limited to the low training-sample regime. It remains unclear whether the biological datasets under consideration fall within this low training-sample regime or not.

      We agree that our model in Figure 1C is very simple and does not fully capture the complex interactions between task performance and features of representational geometry, like factorization. We intend it only as a proof of concept to illustrate how factorized representations can be beneficial for some downstream task use cases. While the benefits of factorized representations disappear for large numbers of samples in this simulation, we believe this is primarily a consequence of the simplicity and low dimensionality of the simulation. Real-world visual information is complex and high-dimensional, and as such the relevant sample size regime in which factorization offers task benefits may be much greater. As a first step toward this real-world setting, Figure 2 shows how decreasing the amount of factorization in neural population data in macaque V4/IT can have an effect on object identity decoding.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Missing citations: The paper could benefit from discussions & references to related papers, such as:

      Higgins I, Chang L, Langston V, Hassabis D, Summerfield C, Tsao D, Botvinick M. Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nature Communications. 2021 Nov 9;12(1):6456.

      We have added additional discussion of related work, including the suggested reference and others on disentanglement, to the Discussion section in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Here are several small recommendations for the authors, all much more minor than those in the public review:

      I suggest more use of equations in methods sections about Figure 1C and macaque neural data analysis.

      Thanks for this suggestion. We have added new Equation 1 for the method transforming neural data to reduce factorization of a variable while preserving other firing rate statistics.

      In Figure 1-C, the methods indicate that Gaussian noise was added. This is a very important detail, and complexifies the interpretation of the figure because it adds an assumption about the structure of noise. In other words, if I understand correctly, the correct interpretation of Figure 1C is "assuming i.i.d. noise, decoding accuracy improves with factorization." The i.i.d. noise is a big assumption, and it is debated how well the brain satisfies this assumption. I suggest you either omit noise for this figure or clearly state in the main text (e.g. caption) that the figure must be interpreted under an i.i.d. noise assumption.

      We have added an explicit statement of the i.i.d. noise assumption to the Figure 1C legend.

      For Figure 2B, I suggest labeling the x-axis clearly below the axis on both panels. Currently, it is difficult to read, particularly in print.

      We have made the x-axis labels clearer and included them on both panels.

      Figure 3A is difficult to read because of the very small text. I suggest avoiding such small fonts.

      We agree that Figure 3A is difficult to read. We have broken out Figure 3 into two new Figures 3 & 4 to increase clarity and sizing of text in Figure 3A.

      Reviewer #3 (Recommendations For The Authors):

      To strengthen this work, it is advisable to incorporate more comprehensive comparisons with previous research, particularly within the machine learning (ML) community. For instance, it would be beneficial to explore and reference works focusing on disentanglement [1,2,3]. This would provide valuable context and facilitate a more robust understanding of the contributions and novel insights presented in the current study.

      We have added additional discussion of related work and other notions similar to factorization to the Discussion section in the revised manuscript.

      Additionally, improving the quality of the figures is crucial to enhance the clarity of the findings:

      • Figure 2: The caption of subfigure B could be revised for greater clarity.

      Thank you, we have substantially clarified this figure caption.

      • Figure 3: Consider a more equitable approach for computing the correlation coefficient, such as calculating it separately for different types of models. In the case of supervised models, it appears that the correlation between invariance and goodness of fit may not be negligible across various scene parameters.

      We appreciate the suggestion, but we are not confident in our ability to conclude much from analyses restricted to particular model classes, given the relatively small N and the fact that the different model classes themselves are an important source of variance in our data.

      • Figure 4: To enhance the interpretability of subfigures A and B, it may be beneficial to include p-values (indicating confidence levels).

      Because we supply bootstrapped confidence intervals for our results, which provide at least as much information as p-values, and because most of the effects of interest are fairly stark when comparing invariance to factorization, p-values were not needed to support our points. We have added a sentence to the legend of new Figure 5 (previously Figure 4) indicating that error bars reflect standard deviations over bootstrap resampling of the models.
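      Error bars of this kind can be sketched as follows. This is a minimal illustration, assuming the statistic is a Pearson correlation between a per-model metric (for example, a factorization score) and per-model neural predictivity, with models resampled with replacement. The function name, `n_boot`, and the seed are hypothetical choices, not the authors' analysis code.

```python
import numpy as np


def bootstrap_correlation(x, y, n_boot=1000, seed=0):
    """Pearson r between per-model scores x and y, plus its SD over bootstrap resamples of models."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample models with replacement
        xs, ys = x[idx], y[idx]
        if xs.std() == 0 or ys.std() == 0:
            continue  # correlation is undefined for a degenerate resample
        boots.append(np.corrcoef(xs, ys)[0, 1])
    r = np.corrcoef(x, y)[0, 1]
    return float(r), float(np.std(boots))
```

      Resampling whole models, rather than stimuli, treats the model set as the source of variability, which matches the legend's description of resampling over models.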

      • Figure 5: For subfigure B, it could be advantageous to plot the results solely for factorization, allowing for a clear assessment of whether the high correlation observed in Classification+Factorization arises from the combined effects of both factors or predominantly from factorization alone.

      First, we note that the scatter plots for factorization alone that the Reviewer requests are already presented earlier in the manuscript, across all conditions, in Figures 4A,B and Figure S2.

      While we could also include these in new Figure 7 (previously Figure 5B) as the Reviewer suggests, we believe it would distract from the message of that figure at the end of the manuscript – which is that factorization is useful as a supplement to classification in predictive matches to neural data. Nonetheless, new Figure 6 (old Figure 5A) provides a summary quantification of the information that the reviewer requests (Fig. 6, faded colored bars reflect the contribution of factorization alone).

    1. Author response:

      Reply to Reviewer #1 (Public Review):

      The post-processing increases the number of putative neoantigens. As shown in Author response image 1, this is done through data augmentation, or “mutation” of individual amino acids in a sequence to their most similar amino acid in the BLOSUM62 embedding. If most of the mutated sequences receive a positive prediction (which we binarize with a >0.5 score threshold), the sequence's prediction changes.

      Author response image 1.

      Post-processing pipeline to increase the number of putative neoantigens. A sequence can either be scored directly with the forward method, which produces a raw score, or be passed to a majority-vote prediction over an ensemble of similar protein sequences.
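      The majority-vote post-processing described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the NAP-CNB code: `score` stands in for the trained predictor's forward pass, `TOY_MOST_SIMILAR` is a hypothetical, hand-picked map of conservative substitutions standing in for nearest neighbours in the BLOSUM62 embedding, and falling back to the forward prediction when no substitution applies is our own choice.

```python
from typing import Callable, Dict

# Hypothetical toy substitution map (illustrative only, not the full BLOSUM62-derived map)
TOY_MOST_SIMILAR: Dict[str, str] = {"I": "V", "V": "I", "F": "Y", "Y": "F", "K": "R", "R": "K"}


def majority_vote_label(
    peptide: str,
    score: Callable[[str], float],
    most_similar: Dict[str, str],
    threshold: float = 0.5,
) -> int:
    """Binary label from a majority vote over single-residue substitutions.

    Each position is replaced by its most similar residue, each variant is
    scored and binarized at `threshold`, and the peptide is labeled positive
    if a strict majority of the variants are positive.
    """
    votes = []
    for i, aa in enumerate(peptide):
        substitute = most_similar.get(aa)
        if substitute is None:
            continue  # no substitution defined for this residue
        variant = peptide[:i] + substitute + peptide[i + 1:]
        votes.append(score(variant) > threshold)
    if not votes:
        return int(score(peptide) > threshold)  # assumed fallback: forward prediction
    return int(sum(votes) > len(votes) / 2)
```

      Because the label depends on the predictor's behaviour around a sequence rather than on a single raw score, a peptide that scores below 0.5 on its own can still be labeled positive, which is how post-processing can add candidates that the forward method misses.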

      In this article, we obtain the following candidates after post-processing.

      Author response table 1.

      As mentioned, the prediction column shows a binary label. The full list contained 402 sequences and did not include any other sequences that met the majority-vote criterion.

      As noted by the reviewer, Table 3 of our original paper lists the scores from direct prediction; four of these sequences also meet the post-processing criterion (*Pnp, *Adar, *Lrrc28 and *Nr1h2). The asterisk (*) indicates the mutated form of the peptide, i.e., the neoantigen.

      We selected the top 4 predicted antigens, which were identified both by direct prediction and after post-processing (*Pnp, *Adar, *Lrrc28 and *Nr1h2) (Wert-Carvajal et al. 2021), but we encountered difficulty in synthesizing *Nr1h2 (mutated Nr1h2), so it could not be included in the study.

      We also decided to evaluate the immunogenicity of *Wiz, which was identified as a potential TNA only after post-processing. *Wiz exhibited lower immunogenicity than *Pnp, *Adar, and *Lrrc28. However, unlike these, *Wiz is highly expressed in the tumor, and vaccination with *Wiz provided the strongest protection. These findings led us to incorporate post-processing into the NAP-CNB platform.

      We chose *Herc6 as a mutated antigen predicted not to be a TNA over other candidates because its expression in the tumor was similar to that of *Wiz.

      Depending on the experiment, we used 4 or 5 animals per group (this will be clarified in the revised version).

      The software used for statistical analysis was GraphPad Prism.

      Reply to Reviewer #2 (Public Review):

      This is true: binding affinity does not always predict immune responses, but in most cases high-affinity peptides are immunogenic. There are, of course, other parameters that drive the effective priming of tumor-reactive CD8+ T cells through antigen cross-presentation, but the mechanisms of antigen presentation are not yet completely understood. High-affinity peptides are desirable candidates for neoantigen-based vaccines.

    1. Author response:

      eLife assessment

      This study presents a valuable finding on sperm flagellum and HTCA stabilization. The evidence supporting the authors' claims is incomplete. The work will be of broad interest to cell and reproductive biologists working on cilium and sperm biology.

      We thank the Editor and the two referees for their time in carefully reviewing our work, and we are grateful for the helpful guidance about how to improve our study. We will supplement the experiments and provide quantitative data guided by the referees’ comments in the revised manuscript. Additionally, we will polish the manuscript and add further context to help readers understand the significance of this work.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, Wu et al. investigated the physiological roles of CCDC113 in sperm flagellum and HTCA stabilization by using CRISPR/Cas knockout mouse models, co-IP, and single-sperm imaging. They find that CCDC113 localizes in the linker region among radial spokes (RS), the nexin-dynein regulatory complex (N-DRC), and doublet microtubules (DMTs), and interacts with the axoneme-associated proteins CFAP57 and CFAP91, acting as an adaptor protein that facilitates the linkage between RS, N-DRC, and DMTs within the sperm axoneme. They show that disruption of CCDC113 produced spermatozoa with disorganized flagella, and that CFAP91 and DRC2 could not colocalize with DMTs in Ccdc113-/- spermatozoa. Interestingly, the data also indicate that CCDC113 can localize to the HTCA region and interact with HTCA-associated proteins. Knockout of Ccdc113 also produced acephalic spermatozoa. Using Sun5 and Centlein knockout mouse models, the authors further find that SUN5 and CENTLEIN are indispensable for the docking of CCDC113 to the implantation site on the sperm head. Overall, the experiments were designed properly and performed well to support the authors' observations in each part. Furthermore, the study's findings offer valuable insights into the physiological and developmental roles of CCDC113 in the male germ line, which can provide insight into impaired sperm development and male infertility. The conclusions of this paper are mostly well supported by data, but some points need to be clarified and discussed.

      We thank Reviewer #1 for his or her critical reading and the positive assessment.

      (1) In Figure 1, a sperm flagellum protein, which is far away from CCDC113, should be selected as a negative control to exclude artificial effects in co-IP experiments.

      We greatly appreciate Reviewer #1’s insightful suggestion. We will include a negative control in the co-IP experiment to eliminate potential artificial effects.

      (2) Is the detachment of the sperm head and tail in Ccdc113-/- mice a secondary effect of the sperm flagellum defects? The authors should discuss this point.

      Good question. Given that CCDC113 localizes to the sperm neck region and interacts with SUN5 and CENTLEIN, CCDC113 may function directly in connecting the sperm head and tail. Indeed, PAS staining revealed Ccdc113–/– sperm heads with abnormal orientation in stage V–VIII seminiferous epithelia (Fig. 6C), and transmission electron microscopy (TEM) analysis further revealed that disruption of CCDC113 caused the damaged coupling apparatus to detach from the sperm head in step 9–11 spermatids (Fig. 6D). Together, these results suggest that the detachment of the sperm head and tail in Ccdc113–/– mice may not be a secondary effect of the sperm flagellum defects. We have discussed this point as follows:

      CCDC113 could interact with SUN5 and CENTLEIN, but not PMFBP1 (Fig. 7A-C), and CCDC113 was in the cytoplasm in Sun5–/– and Centlein–/– spermatozoa (Fig. 7L, K). In addition, CCDC113 colocalizes with SUN5 in the HTCA region, and immunofluorescence staining of spermatozoa shows that SUN5 is closer to the sperm nucleus than CCDC113 (Fig. 7G, H). Therefore, SUN5 and CENTLEIN may be closer to the sperm nucleus than CCDC113. PAS staining revealed Ccdc113–/– sperm heads with abnormal orientation in stage V–VIII seminiferous epithelia (Fig. 6C), and TEM analysis further revealed that disruption of CCDC113 caused the damaged coupling apparatus to detach from the sperm head in step 9–11 spermatids (Fig. 6D). Together, these results suggest that the detachment of the sperm head and tail in Ccdc113–/– mice may not be a secondary effect of the sperm flagellum defects.

      (3) Given that some cytoplasmic material can be observed in Ccdc113-/- spermatozoa (Fig. 5A), is CCDC113 also essential for cytoplasm removal?

      Good question. Unremoved cytoplasm could be detected in spermatozoa by using transmission electron microscopy (TEM) analysis, including disrupted mitochondria, damaged axonemes, and large vacuoles, indicating cytoplasmic removal defects in Ccdc113–/– mice. We have discussed this point as below:

      “Unremoved cytoplasm could be detected in spermatozoa by using transmission electron microscopy (TEM) analysis, including disrupted mitochondria, damaged axonemes, and large vacuoles, indicating cytoplasmic removal defects in Ccdc113–/– mice (Fig. 5A).”

      (4) Although CCDC113 could not bind to PMFBP1, the localization of CCDC113 in Pmfbp1-/- spermatozoa should be also detected to clarify the relationship between CCDC113 and SUN5-CENTLEIN-PMFBP1.

      We are thankful to Reviewer #1 for this suggestion. We will analyze the localization of CCDC113 in Pmfbp1-/- spermatozoa to clarify the relationship between CCDC113 and SUN5-CENTLEIN-PMFBP1.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors selected the coiled-coil protein CCDC113 and revealed its expression across the stages of spermatogenesis in the testis and the steps of spermiogenesis, with expression also mapped in the different parts of the epididymis. Gene deletion in CRISPR-Cas9 KO mice led to male infertility, and PAS staining showed defects mapped to the different stages of the seminiferous cycle and throughout the steps of spermiogenesis. EM and IF with several markers of testicular germ cells and epididymal spermatozoa indicated defects in the flagella and in head-to-tail coupling, as well as acephaly. The authors' co-IP experiments with CCDC113 expressed in HEK293T cells indicated an association with CFAP91 and DRC2 as well as SUN5 and CENTLEIN.

      The authors propose that CCDC113 connects CFAP91 and DRC2 to the doublet microtubules of the axoneme, and that CCDC113's association with SUN5 and CENTLEIN stabilizes the sperm head-to-tail coupling apparatus. Extensive experiments mapping CCDC113 during postnatal development are reported, as well as negative co-IP experiments and studies with SUN5 KO and CENTLEIN KO mice.

      Strengths:

      The authors provide compelling observations to indicate the relevance of CCDC113 to flagellum formation with potential protein partners. The data are relevant to sperm flagella formation and its coupling to the sperm head.

      We are grateful to Reviewer #2 for his or her recognition of the strength of this study.

      Weaknesses:

      The authors' observations are consistent with the model proposed but the authors' conclusions for the mechanism may require direct demonstration in sperm flagella. The Walton et al paper shows human CCDC96/113 in cilia of human respiratory epithelia. An application of such methodology to the proteins indicated by Wu et al for the sperm axoneme and head-tail coupling apparatus is eagerly awaited as a follow-up study.

      We thank Reviewer #2 for his or her kind help in improving the manuscript. We recognize that directly determining the precise localization of CCDC113 in the sperm axoneme and head-tail coupling apparatus (HTCA) by cryo-electron microscopy (cryo-EM) would powerfully strengthen our model. Recent advances in cryo-EM have facilitated the analysis of axonemal structures and determined the structures of native axonemal DMTs from mouse, bovine, and human sperm (Leung et al., 2023; Zhou et al., 2023). However, high-resolution structures of some sperm axoneme and HTCA regions, including those involving CCDC113, remain to be determined. We would therefore like to discuss this point and regard it as an important follow-up study.

      References:

      Bazan, R., Schröfel, A., Joachimiak, E., Poprzeczko, M., Pigino, G., & Wloga, D. (2021). Ccdc113/Ccdc96 complex, a novel regulator of ciliary beating that connects radial spoke 3 to dynein g and the nexin link. PLoS Genet, 17(3), e1009388.

      Ghanaeian, A., Majhi, S., McCafferty, C. L., Nami, B., Black, C. S., Yang, S. K., Legal, T., Papoulas, O., Janowska, M., Valente-Paterno, M., Marcotte, E. M., Wloga, D., & Bui, K. H. (2023). Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism. Nat Commun, 14(1), 5741.

      Leung, M. R., Zeng, J., Wang, X., Roelofs, M. C., Huang, W., Zenezini Chiozzi, R., Hevler, J. F., Heck, A. J. R., Dutcher, S. K., Brown, A., Zhang, R., & Zeev-Ben-Mordehai, T. (2023). Structural specializations of the sperm tail. Cell, 186(13), 2880-2896.e2817.

      Walton, T., Gui, M., Velkova, S., Fassad, M. R., Hirst, R. A., Haarman, E., O'Callaghan, C., Bottier, M., Burgoyne, T., Mitchison, H. M., & Brown, A. (2023). Axonemal structures reveal mechanoregulatory and disease mechanisms. Nature, 618(7965), 625-633.

      Zhou, L., Liu, H., Liu, S., Yang, X., Dong, Y., Pan, Y., Xiao, Z., Zheng, B., Sun, Y., Huang, P., Zhang, X., Hu, J., Sun, R., Feng, S., Zhu, Y., Liu, M., Gui, M., & Wu, J. (2023). Structures of sperm flagellar doublet microtubules expand the genetic spectrum of male infertility. Cell, 186(13), 2897-2910.e2819.

    1. Author response:

      We thank the reviewers for their thoughtful and insightful comments. We were pleased to see that the reviewers and editors consider our work a “welcome addition” that “fills a large gap” in comparative genomics methods and provides “an unparalleled community resource of insect genome regulatory annotations.”

      Many of the reviewers’ comments reflect weaknesses in our description of the methodology. As the basic SCRMshaw methodology has been published previously, we had opted for brevity over detail in the current manuscript. We recognize now that we went too far in that direction, and we will include more methodological detail in our revised submission, along with easier access to the code we used. The reviewers also offered some helpful suggestions regarding data availability which we intend to address, including direct download of the results in GFF format and adding to the results database several species that were inadvertently omitted.

      Reviewer 2 expressed concerns about benchmarking SCRMshaw against other methods. We respectfully feel this lies outside the scope of the current study, which focuses on application of SCRMshaw to generate a multi-species annotation resource rather than on an attempt to show that SCRMshaw is superior to other approaches. We provide evidence in this manuscript, as well as in previous publications, that supports the effectiveness of SCRMshaw as an approach for regulatory element discovery that is suitable for the task at hand. Benchmarking for regulatory element discovery brings many challenges, as there are no comprehensive “truth” sets to serve as a comparison baseline. We therefore do not attempt strong claims here about the relative merits of SCRMshaw vs. other methods (although we have explored this in previous publications). Note that we also previously demonstrated commonality of transcription factor binding sites in cross-species SCRMshaw predictions, in particular in Kazemian et al. 2014 (Genome Biol. Evol. 6:2301).

      Finally, because it has important implications for understanding our results, we would like to point out a small misconception in Reviewer 2’s Summary of our study. The reviewer states that we “identify the most likely predicted enhancer candidates based on the proximity of an orthologous target gene.” We stress, however, that putative target gene assignments and identities have no impact at all on our prediction of regulatory sequences. Predictions are solely based on sequence-dependent SCRMshaw scores, with no regard to the nature or identities of nearby annotated features. Putative target genes are mapped to Drosophila orthologs purely as a convenience to aid in interpreting and prioritizing the predicted regulatory elements. We will take care to clarify this important point in our revised submission.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript highlights single-stranded DNA exo- and endo-nuclease activities of ExoIII as a potential caveat and an underestimated source of decreased efficiency in its use in biosensor assays. The data present convincing evidence for the ssDNA nuclease activity of ExoIII and identifies residues that contribute to it. The findings are useful, but the study remains incomplete as the effect on biosensor assays was not established.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors show compelling data indicating that ExoIII has significant ssDNA nuclease activity that is posited to interfere with biosensor assays. This does not come as a surprise as other published works have indeed shown the same, but in this work, the authors provide a deeper analysis of this underestimated activity.

      Response: Thank you so much for reviewing and summarizing our work.

      Strengths:

      The authors used a variety of assays to examine the ssDNA nuclease activity of ExoIII and its origin. Fluorescence-based assays and native gel electrophoresis, combined with MS analysis clearly indicate that both commercial and laboratory purified ExoIII contain ssDNA nuclease activity. Mutational analysis identifies the residues responsible for this activity. Of note is the observation in this submitted work that the sites of ssDNA and dsDNA exonuclease activity overlap, suggesting that it may be difficult to identify mutations that affect one activity but not the other. In this regard, it is of interest the observation by the authors that the ssDNA nuclease activity depends on the sequence composition of the ssDNA, and this may be used as a strategy to suppress this activity when necessary. For example, the authors point out that a 3′ A4-protruding ssDNA could be employed in ExoIII-based assays due to its resistance to digestion. However, this remains an interesting suggestion that the authors do not test, but that would have strengthened their conclusion.

      Response: Thank you so much for the positive evaluation and insightful comments on our manuscript. In the revised version, we have modified the manuscript to address the reviewer’s concerns by providing point-by-point responses to all the comments.

      Weaknesses:

      The authors provide a wealth of experimental data showing that E. coli ExoIII has ssDNA nuclease activities, both exo- and endo-, however this work falls short in showing that indeed this activity practically interferes with ExoIII-driven biosensor assays, as suggested by the authors. Furthermore, it is not clear what new information is gained compared to the one already gathered in previously published works (e.g. references 20 and 21). Also, the authors show that ssDNA nuclease activity has sequence dependence, but in the context of the observation that this activity is driven by the same site as dsDNA Exo, how does this differ from similar sequence effects observed for the dsDNA Exo? (e.g. see Linxweiler, W. and Horz, W. (1982). Nucl. Acids Res. 10, 4845-4859).

      Response: We agree with the reviewer regarding the limitations in showing the practical influence of the ssDNase activity on the commercial detection kit. Unlike the biosensor in reference 20, our results showed a potential impact of ExoⅢ on another frequently used detection system: the primer and probe required for the detection kit could be digested by ExoⅢ, leading to lower detection efficiency. Since the activities of ExoⅢ on ssDNA and dsDNA share the same active center, we reason that the difference in sequence specificity of ExoⅢ on these two types of substrates might arise from two aspects. On the nuclease side, unidentified residues of ExoⅢ that play an auxiliary role in digesting ssDNA but not dsDNA might exist and contribute to the difference we observed. On the substrate side, without base pairing with a complementary strand, ssDNA is structurally more flexible (changeable with environmental factors such as ions and temperature) than dsDNA. These two aspects may collectively result in the difference in sequence specificity of ExoⅢ on ssDNA and dsDNA. We believe that cryo-electron microscopy-based structural analysis of the ExoⅢ-ssDNA complex would provide more comprehensive and direct evidence.

      Because of the claim that the underestimated ssDNA nuclease activity can interfere with commercially available assays, it would have been appropriate to test this. The authors only show that ssDNA activity can be identified in commercial ExoIII-based kits, but they do not assess how this affects the efficiency of a full reaction of the kit. This could have been achieved by exploiting the observed ssDNA sequence dependence of the nuclease activity. In this regard, the work cited in Ref. 20 showed that ExoIII indeed has ssDNA nuclease activity at concentrations as much as 50-fold lower than those tested in this work. Ref 20 also tested the effect of the ssDNA nuclease activity in Targeted Recycle Assays, rather than just testing for its presence in a kit.

      Response: Thanks so much for your comments. To evaluate the practical influence, we would need to compare the current detection kit with an improved one. Our results suggest that raising the temperature or using the mutant may minimize the ssDNase activity of ExoⅢ. However, the RAA or RPA-ExoⅢ detection kit is a multi-component system consisting of the recombinase T4 UvsX, the loading factor T4 UvsY, the ssDNA-binding protein T4 gp32, the polymerase Bsu, and ExoⅢ (Analyst. 2018 Dec 17;144(1):31-67. doi: 10.1039/c8an01621f), which collectively determine the performance of the kit. Increasing the temperature would also affect the activities of the other proteins in the kit, so the resulting change in detection efficiency would not reflect the practical influence of the ssDNase activity of ExoⅢ alone. Replacing the wild-type enzyme with the mutant would require preparing the other four proteins and recombining them at an optimized ratio to rebuild the detection system, which is challenging. The targeted recycle assay in Ref 20 is a simple system composed of ExoⅢ and the corresponding nucleic acid adapters, which researchers can readily reproduce for evaluation. The RAA or RPA-ExoⅢ detection kit is a much more complex system, which makes it difficult to manipulate in a way that isolates this practical influence. Thank you again for your insightful suggestions; we may conduct a systematic investigation to improve the detection kit in future studies.

      Because of the implication that the presence of ssDNA exonuclease activity may have in reactions that are supposed to only use ExoIII dsDNA exonuclease, it is surprising that in this submitted work no direct comparison of these two activities is done. Please provide an experimental determination of how different the specific activities for ssDNA and dsDNA are.

      Response: As suggested, we compared the digestion rates of the two activities using an equal amount of the commercial ExoⅢ (10 U/µL) on the two types of substrate (10 µM). The results below show that ExoⅢ required 10 minutes to digest the 30-nt single-stranded DNA (ssDNA) (A), whereas it digested the same sequence in double-stranded DNA (dsDNA) within 1 minute (B) (new Supplementary Figure S1). This indicates that ExoⅢ digests dsDNA at least ten times faster than ssDNA. In line with these results, a recent study has shown that the ssDNase activity of ExoⅢ surpasses that of the conventional ssDNA-specific nuclease ExoI (Biosensors (Basel), 2023, May 26; 13(6):581, doi: 10.3390/bios13060581), suggesting that the ssDNA-directed activity of ExoⅢ may have biological significance in bacteria, even though it is slower than the dsDNA activity. The corresponding text has been added to the Results (Lines 200-207).

      Author response image 1.

      Reviewer #2 (Public Review):

      Summary:

      This paper describes some experiments addressing 3' exonuclease and 3' trimming activity of bacterial exonuclease III. The quantitative activity is in fact very low, despite claims to the contrary. The work is of low interest with regard to biology, but possibly of use for methods development. Thus the paper seems better suited to a methods forum.

      Response: We thank you for your time and effort in improving our work. Below, we provide point-by-point responses to your comments and describe the corresponding revisions to the manuscript.

      Strengths:

      Technical approaches.

      Response: Thanks for your evaluation.

      Weaknesses:

      The purity of the recombinant proteins is critical, but no information on that is provided. The minimum would be silver-stained SDS-PAGE gels, with some samples overloaded in order to detect contaminants.

      Response: As suggested, we have performed silver-stained SDS-PAGE on the purified proteins. The result below shows no significant contaminants, except for a minor contaminant in S217A (new Supplementary Figure S4).

      Author response image 2.

      Lines 74-76: What is the evidence that BER in E. coli generates multinucleotide repair patches in vivo? In principle, there is no need for the nick to be widened to a gap, as DNA Pol I acts efficiently from a nick. And what would control the extent of the 3' excision?

      Response: Thank you for these insightful questions. Gwangrog Lee's group has found that ExoⅢ can create a single-stranded DNA (ssDNA) gap in dsDNA during base excision repair, which is then repaired by DNA polymerase I. The gap size is determined by the rigidity of the resulting ssDNA loop and the duplex stability of the dsDNA (Sci Adv. 2021 Jul 14;7(29):eabg0076. doi: 10.1126/sciadv.abg0076).

      Figure 1: The substrates all report only the first phosphodiester cleavage near the 3' end, which is quite a limitation. Do the reported values reflect only the single phosphodiester cleavage? Including the several other nucleotides likely inflates that activity value. And how much is a unit of activity in terms of actual protein concentration? Without that, it's hard to compare the observed activities to the many published studies. As best I know, Exo III was already known to remove a single-nucleotide 3'-overhang, albeit more slowly than the digestion of a duplex, but not zero! We need to be able to calculate an actual specific activity: pmol/min per µg of protein.

      Response: Yes. Once even a single nucleotide is removed from the FQ reporter (that is, once a single phosphodiester bond is cleaved), fluorescence is generated. The signal therefore reports the number of reporter molecules cleaved at least once during the period, a lower bound on the number of phosphodiester bonds cleaved, from which the digestion rate or efficiency of the nuclease on ssDNA can be calculated. Figures 2 and 3 show that ExoⅢ digests ssDNA from the 3’ end and removes more than a single nucleotide. Because the “unit” is widely used in numerous studies (Nature. 2015 Sep 10;525(7568):274-7; Cell. 2021 Aug 19;184(17):4392-4400.e4; Nat Nanotechnol. 2018 Jan;13(1):34-40.), using it here makes it easier to compare and evaluate the activity against those studies. The actual activity of ExoⅢ is calculated in Figure 4D.
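      To make the lower-bound logic concrete, here is a minimal Python sketch of turning an FQ-reporter time course into an initial rate. It assumes, hypothetically, a linear calibration between fluorescence and the amount of cleaved reporter; the calibration slope, background, function name `initial_rate_pmol_per_min`, and the data points are illustrative and not taken from the manuscript. Because each reporter fluoresces once it has lost at least one nucleotide, the fitted slope counts cleaved reporters per minute rather than total phosphodiester bonds.

```python
import numpy as np

def initial_rate_pmol_per_min(time_min, rfu, rfu_per_pmol, background_rfu, n_linear=4):
    """Initial rate (pmol/min) of reporter cleavage from an FQ time course.

    Assumes (hypothetically) RFU = background_rfu + rfu_per_pmol * cleaved_pmol,
    and that the first `n_linear` points lie in the linear phase of the reaction.
    """
    t = np.asarray(time_min, dtype=float)
    f = np.asarray(rfu, dtype=float)
    # Each reporter counts once it has lost >= 1 nucleotide, so this is a lower bound
    cleaved_pmol = (f - background_rfu) / rfu_per_pmol
    # Least-squares straight line through the early, approximately linear points
    slope, _intercept = np.polyfit(t[:n_linear], cleaved_pmol[:n_linear], 1)
    return slope

# Illustrative, made-up data: 5 pmol/min until the reporter is used up
times = [0, 1, 2, 3, 4, 5]
signal = [100, 1100, 2100, 3100, 3500, 3500]
rate = initial_rate_pmol_per_min(times, signal, rfu_per_pmol=200, background_rfu=100)
print(round(rate, 3))  # 5.0
```

      Restricting the fit to the early points avoids the plateau, where the reporter is exhausted and the signal no longer tracks enzyme activity.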

      Figures 2 & 3: These address the possible issue of 1-nt excision noted above. However, the question of efficiency is still not addressed in the absence of a more quantitative approach, not just "units" from the supplier's label. Moreover, it is quite common that commercial enzyme preparations contain a lot of inactive material.

      Response: Thanks for your comments. Numerous studies have used the commercial ExoⅢ (Nature. 2015 Sep 10;525(7568):274-7; Cell. 2021 Aug 19;184(17):4392-4400.e4; Nat Nanotechnol. 2018 Jan;13(1):34-40.). Using this universal “unit” label helps researchers compare and evaluate the activity and its influence. The commercial ExoⅢ is supplied by New England Biolabs, and its quality has been examined in a wide range of scientific investigations.

      Figure 4D: This gets to the quantitative point. In this panel, we see that around 0.5 pmol/min of product is produced by 0.025 µmol = 25,000 pmol of the enzyme. That is certainly not very efficient, compared to the digestion of dsDNA or cleavage of an abasic site. It's hard to see that as significant.

      Response: Thanks for your comments; the confusion may have arisen from the arrangement of the figure. Please note that, based on Figure 4D, the digestion rate of 0.025 µM ExoⅢ on the substrate is approximately 5 pmol/min (right vertical axis), rather than 0.5 pmol/min. Because the reaction contained ExoⅢ at 0.025 µM in a total volume of 10 µL, the amount of ExoⅢ was 0.25 pmol (0.025 µmol/L × 10 µL), not 25,000 pmol, giving a digestion rate of 5 pmol/min for 0.25 pmol of enzyme. This means each ExoⅢ molecule digests roughly one nucleotide every 3 seconds (5 pmol nucleotides / 0.25 pmol ExoⅢ / 60 s = 0.33 nucleotides/molecule/second). Although this is slower than the digestion of dsDNA by ExoⅢ, a recent study has shown that the ssDNase activity of ExoⅢ surpasses that of the conventional ssDNA-specific nuclease ExoI (Biosensors (Basel), 2023, May 26; 13(6):581, doi: 10.3390/bios13060581), suggesting potential biological significance of the ssDNA-directed activity of ExoⅢ in bacteria.
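      The unit conversion at the center of this exchange can be checked in a few lines of Python. This sketch only re-derives the arithmetic stated in the response (5 pmol/min, 0.025 µM, 10 µL); the function name is ours, and it assumes the whole enzyme preparation is active, which the reviewer questioned for commercial enzymes.

```python
def turnover_per_second(rate_pmol_per_min, enzyme_uM, volume_uL):
    """Nucleotides removed per enzyme molecule per second.

    Uses 1 µM = 1 pmol/µL, so enzyme amount (pmol) = concentration (µM) x volume (µL).
    Assumes every enzyme molecule is active; if part of the preparation is inactive,
    the turnover per active molecule would be correspondingly higher.
    """
    enzyme_pmol = enzyme_uM * volume_uL
    return rate_pmol_per_min / enzyme_pmol / 60.0

enzyme_pmol = 0.025 * 10          # 0.25 pmol, not 25,000 pmol
k = turnover_per_second(5.0, 0.025, 10.0)
print(round(enzyme_pmol, 4), round(k, 2))  # 0.25 0.33
```

      A turnover of about 0.33 s⁻¹ corresponds to one nucleotide roughly every 3 seconds per molecule, matching the figure given in the response.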

      Line 459 and elsewhere: as noted above, the activity is not "highly efficient". I would say that it is not efficient at all.

      Response: We respectfully disagree with this point. Our conclusion is supported by fluorescence monitoring of FQ reporters, gel analysis of the ssDNA probe, and mass spectrometry, and, more importantly, our findings align with a recent study (Biosensors 2023, 13(6), 581; https://doi.org/10.3390/bios13060581).

      Reviewer #3 (Public Review):

      Overall:

      ExoIII has been described and commercialized as a dsDNA-specific nuclease. Several lines of evidence, albeit incomplete, have indicated this may not be entirely true. Therefore, Wang et al comprehensively characterize the endonuclease and exonuclease enzymatic activities of ExoIII on ssDNA. A strength of the manuscript is the testing of popular kits that utilize ExoIII and coming up with and testing practical solutions (e.g. the addition of SSB proteins, ExoIII variants such as K121A, and varied assay conditions).

      Response: We sincerely thank the reviewer for pointing out the significance and strengths of our work. We respond point-by-point to the comments and suggestions below.

      Comments:

      (1) The footprint of ExoIII on DNA is expected to be quite a bit larger than 5-nt, see structure in manuscript reference #5. Therefore, the substrate design in Figure 1A seems inappropriate for studying the enzymatic activity and it seems likely that ExoIII would be interacting with the FAM and/or BHQ1 ends as well as the DNA. Could this cause quenching? Would this represent real ssDNA activity? Is this figure/data necessary for the manuscript?

      Response: Thanks so much for your questions. Based on the structural analysis in reference #5, the footprint of ExoⅢ on dsDNA appears to exceed 5 nucleotides. However, the footprint may differ on ssDNA. Mass spectrometry in our study showed that ExoⅢ degraded a ~20-nucleotide ssDNA substrate to mononucleotides (Figure 3), suggesting that it can also digest a 5-nt ssDNA into mononucleotides; otherwise, a 5-nt ssDNA fragment would have remained as the final product. Thus, the 5-nt FQ reporter is also a substrate for ExoⅢ. ExoⅢ could in principle interact with BHQ1 and reduce its quenching of FAM, releasing the fluorescence shown in Figure 1A, but this possibility is ruled out by the way the RPA-ExoⅢ detection kit works. As noted in the Introduction, the kit requires a probe labeled with a fluorophore and a quencher. If ExoⅢ could perturb the fluorophore and quencher enough to release fluorescence, the kit would give a false-positive signal regardless of whether the target was present, rendering the detection system ineffective. Thus, ExoⅢ does not interfere with the fluorophore and quencher; digestion of the ssDNA within the FQ reporter is the sole cause of fluorescence release, and the emitted fluorescence represents the ssDNA activity. This result suggests that the FQ reporter may offer an effective way to sensitively detect or quantitatively study the ssDNase activity of uncharacterized proteins.

      (2) Based on the descriptions in the text, it seems there is activity with some of the other nucleases in 1C, 1F, and 1I other than ExoIII and Cas12a. Can this be plotted on a scale that allows the reader to see them relative to one other?

      Response: Thanks so much for your suggestion. We attempted to adjust the figure, but because most of the values are at or below ~0.005, it was difficult to rearrange them for a clear presentation.

      (3) The sequence alignment in Figure 2N and the corresponding text indicates a region of ExoIII lacking in APE1 that may be responsible for their differences in substrate specificity in regards to ssDNA. Does the mutational analysis support this hypothesis?

      Response: Our results indicate that mutating R170, located in this region (αM helix), lowered the digestion efficiency on ssDNA relative to the wild type. This shows that R170 is an important residue for the ssDNase activity and partially supports the hypothesis. Further investigation is needed to determine whether the structure of the αM helix accounts for the differences observed between ExoⅢ and APE1; future work may require additional mutations in this region for validation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      • A significant fraction of amplitude is missing in the presented fluorescence time courses reporting on ssDNA nuclease activity (Figs 1 B, E, and H). Please indicate the dead time of mixing in these experiments, and if necessary include additional points in this time scale. It is unacceptable for the authors to simply connect the zero-time point and the first experimental point with a dashed line.

      Response: We thank the reviewer for pointing out this critical detail. We agree that simply connecting the points with a dashed line is an inappropriate way to indicate the fluorescence generated in the initial stage. The fluorescence reader needs about two minutes to start recording after the reaction tube is placed in it, whereas ExoⅢ induces significant fluorescence immediately, reaching the peak within ~40 seconds, as shown in the video data. It is therefore difficult to record the real-time fluorescence generated at the start of the reaction. To avoid misleading readers, we have added the following description to the legend: “The dashed line used in the figure does not indicate the real-time fluorescence generated in the reaction but only represents a trend in the period for the monitor machine to initiate (~2 minutes).” The text was added in Lines 836-838.

      • The authors chose to utilize a 6% agarose electrophoresis to analyze digestion products. However, while this approach clearly shows that the substrates are being digested, it does not allow us to clearly estimate the extent. It would be appropriate to include control denaturing PAGE assays to test the extent of reaction, especially for dsDNA that contains a ssDNA extension, as in Figure 8, or for selected mutants to test whether exo activity may be limited to just a few nts, that may not be resolved with the lower resolution agarose gels.

      Response: We agree with the reviewer that denaturing PAGE is usually the method of choice for high-resolution analysis. We performed this experiment on the short ssDNA but observed that the bands of the digestion products frequently shifted to varying degrees in the gel. Notably, another independent study reported a similar phenomenon (Nucleic Acids Res. 2007;35(9):3118-27. doi: 10.1093/nar/gkm168). Even slight band shifting would significantly interfere with our analysis, especially for the short ssDNA used in this study. After numerous attempts, we found that 6% agarose gel electrophoresis could resolve the digested ssDNA bands with lower resolution than PAGE but with less shifting. Weighing all these factors, we chose the 6% agarose gel to analyze the digestion process.

      Reviewer #2 (Recommendations For The Authors):

      Line 158: tipycal should be typical

      Response: Thanks so much; as the reviewer pointed out, this was a typo, and we have corrected it.

      Lines 299-300: "ssD-NA" should not be hyphenated, i.e., it should be ssDNA.

      Response: Thank you for pointing this out. We have rectified the error and thoroughly reviewed the entire paper for any necessary corrections.

      Reviewer #3 (Recommendations For The Authors):

      Figure 2A should indicate the length of the substrate. The legend says omitted nucleotides - I assume they were present in the substrate and just not in the figure? The authors should be very clear about this. Moreover, the text and figure do not well describe the design differences between the three probes. Are they the same except just 23, 21, and 20 nt in length? Are the sequences selected at random?

      Response: Thank you for your questions. The lengths of probes were described in the figure (23, 21, and 20 nt). The legend has been reworded in Line 843 as “The squiggle line represents the ~20 nucleotides of the ssDNA oligo.” And the sequences of three ssDNA substrates were randomly selected, and all the detailed information was provided in Supplementary Table S4.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public reviews):

      Summary:

      The ciliary rootlet is a structure associated with the ciliary basal body (centriole), with beautiful striations observed by electron microscopy. It has been known for more than a century, but its function and protein arrangement are still unknown. This work reconstructed the near-atomic resolution 3D structure of the rootlet using cryo-electron tomography, discovered a number of interesting filamentous structures inside, and built a molecular model of the rootlet.

      Strengths:

      The authors exploited the current capabilities of cryo-ET and used them appropriately to describe the 3D structure of the rootlet. They carefully conducted subtomogram averaging and classification, which enabled an unprecedentedly detailed view of this structure. The dual use of (nearly) intact rootlets from cilia and extracted (demembraned) rootlets enabled them to describe with confidence how D1/D2/A bands form periodic structures and cross with longitudinal filaments, which are likely coiled coils.

      Weaknesses:

      Some more clarifications are needed. This reviewer believes that the authors can address them.

      Reviewer #1 (Recommendations for the authors):

      Recommendation 1: According to Fig.1B, the rootlet was mechanically pulled out from the visual cell for a long distance by vortexing. Is there no artifact? Can the authors comment on it by referring to old literature, for example, with EM of resin-embedded and sectioned basal bodies?

      Response: A previous study (Gilliam et al., 2012) compared cryo-ET of purified rootlets with resin-embedded ultrathin sections of mouse eyecups. They reported no changes in striation repeat or rootlet morphology, suggesting that purification does not introduce artifacts. Our rootlet data are consistent with those of Gilliam et al., suggesting that the tomograms we report are representative of rootlets prior to purification.

      We have clarified this in the text: pg 2: “As previously described (Gilliam et al., 2012), rootlet striation-repeat and morphology appear unaltered by the purification method. Moreover, …” 

      Recommendation 2: Fig.1F: It is not clear how to distinguish striation-membrane joints indicated by grey and white arrows. It seems relatively straight striation is indicated by a white arrow, while in the case of the bulky feature it is shown by a grey arrow (and the bulk is colored in blue). But there is no clear border between these features. How were they distinguished? Are they based on classification?

      Response: The membrane-associated densities (colored in blue) were assigned according to the TomoSeg neural network. It was trained on a small set of globular densities closely associated with a membrane. This training set included examples both close to and far away from the rootlet. We trained a separate network on recognizing rootlet striations. Both networks competed on assigning pixels in the tomogram as either striations or membrane-associated proteins. The different membrane connections were therefore defined by the probability within the TomoSeg network rather than classification.

      We clarified this in the main text: pg 3: “All the striations partially or fully spanned the width of the rootlet and extended beyond the outermost longitudinal filaments. These rootlet-protruding striation-densities frequently contacted the membrane (Fig 1E). Close examination suggested some make a direct contact, whereas others contact a subset of globular membrane-associated densities that are a striking feature of the tomograms. These densities are ~7 nm in diameter and cover almost every membrane surface. Where two membranes come into proximity, the intervening space is filled with two layers of these membrane-associated proteins, one layer associated with each membrane (Fig 1C, S1A, blue arrowheads). We trained a TomoSeg neural network to assign these densities and let this network compete with one that assigned striations. This resulted in a final segmentation with membrane-associated densities indicated in blue and striations in yellow (Fig 1E, F and S1D–F).”  

      We also clarified this in the methods:

      pg 12/13: “The tomograms were then preprocessed in EMAN2.2 for training of the TomoSeg CNN (Chen et al., 2017). Here, the features (filaments, D-bands, A-bands, gold fiducials, actin, membranes, membrane-associated densities and ice contaminations) were individually trained. Segmented maps were allowed to compete for the assignment of pixels in the tomograms, cleaned up in Amira (Thermo Fisher Scientific), and converted to object files. The object files and corresponding tomograms were displayed in ChimeraX (Pettersen et al., 2021). Assignment of direct and indirect striation-membrane connections was done manually by assessing whether TomoSeg-segmented striations and membranes were connected directly or via membrane-associated densities. The automated segmentation of amorphous striations picked up mostly dense amorphous features. The fainter densities that we observed to laterally connect the amorphous features were manually drawn by dotted lines.” 
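      The idea of letting separately trained networks "compete" for voxels can be illustrated with a small NumPy sketch. For illustration only, we assume that competition means assigning each voxel to the feature whose network reports the highest probability, leaving voxels below a probability threshold as background; this is not EMAN2/TomoSeg code, and the threshold value and toy probability maps are hypothetical.

```python
import numpy as np

def compete_segmentations(prob_maps, threshold=0.5):
    """Assign each voxel to the feature whose network gives the highest probability.

    prob_maps: dict mapping feature name -> array of per-voxel probabilities (same shape).
    Returns (labels, names): labels holds an index into names, or -1 for background
    (no network reached `threshold` at that voxel).
    """
    names = list(prob_maps)
    stack = np.stack([np.asarray(prob_maps[n], dtype=float) for n in names])
    best = np.argmax(stack, axis=0)      # index of the winning feature per voxel
    best_p = np.max(stack, axis=0)       # probability of the winning feature
    labels = np.where(best_p >= threshold, best, -1)
    return labels, names

# Toy 2x2 "tomogram slice" with two competing networks
maps = {
    "striation": [[0.9, 0.2], [0.6, 0.1]],
    "membrane_associated": [[0.3, 0.8], [0.7, 0.2]],
}
labels, names = compete_segmentations(maps)
print(names, labels.tolist())  # ['striation', 'membrane_associated'] [[0, 1], [1, -1]]
```

      Under this scheme, whether a protrusion is drawn as a direct striation-membrane contact or as contact via a membrane-associated density depends on which network wins at the voxels in between, consistent with the response's point that the assignment reflects network probabilities rather than classification.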

      Recommendation 3: p.3 "All the striations partially or fully spanned the width of the rootlet before protruding from its surface." This reviewer would read the last part of this sentence as "before protruding from the surface of the rootlet membrane toward inside". Is this correct?

      Response: This was not what we had intended to imply. 

      We have changed this sentence in the text to avoid confusion:  pg 3: “All the striations partially or fully spanned the width of the rootlet and extended beyond the outermost longitudinal filaments. These rootlet-protruding striation-densities frequently contacted the membrane (Fig 1E).”

      Recommendation 4: Same for p.4 "The protrusions from the rootlets were flexible". This means the protrusions from the membrane if this reviewer understands correctly.

      Response: We clarified this sentence in the text: pg 4: “The proteinaceous protrusions that extended from the rootlets were flexible and did not induce a regular spacing in the membrane-associated proteins they contacted (Fig 1F, S1D–F).”

      Recommendation 5: p.4 "Due to the thickness of the sample and the presence of membranes": How thick is the typical sample?

      Response: We typically collected data on samples thicker than 300 nm. We initially tried making thinner samples for better contrast, but found that this disrupted the sample. We changed “sample” to “ice” to clarify that we refer to the prepared sample and not the biological object.

      Changes in text:

      pg 4: “Due to the ice-thickness and the presence of membranes, the tomograms had limited contrast.”

      Recommendation 6: p.4 "We were also able to see these bands with cryo-ET." It would be helpful to compare tomograms of the native and purified rootlets. This reviewer could not tell where the D1/D2/A bands are in Fig.1E.

      Response: Because of the noise in the native tomograms, it is difficult to see the regular striation pattern in Fig 1E. However, the pattern is more visible when we project the native rootlet onto a single image. We added the projection image, the corresponding Fourier transform, and repeat measurements to the supplement (Fig S1B, C), and updated all figure references in the text.

      We updated the text accordingly:

      pg 4: “We were also able to see these bands with cryo-ET. The striations in the purified rootlets appeared more ordered and clearer than in the cellular tomograms due to the improved contrast. In the cellular rootlets, we identified the bands in a tomogram projection (Fig S1B), with an average distance of 79.52 ± 0.26 nm between each repeat (Fig S1C). The repeat distance for the purified rootlets is 80.1 ± 0.03 nm based on a sine fit to A and D-bands of 10 Fourier-filtered tomogram projections (Fig 2D, Fig S2E–I).”

      We updated the figure legend of Fig S1:

      pg 18: “(B) Projection image of a 53 nm thick slice through the tomogram and the corresponding Fast Fourier Transform (FFT). Measured frequencies are indicated with red lines. (C) Quantification of the distance measured between pairs of discrete striations. (D–F) …”
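      The repeat distances above come from fitting a periodic function to band positions in projections. A minimal sketch of one way to do such a fit is shown below, assuming a 1D intensity profile sampled along the rootlet axis. The grid search over candidate periods, the function name `fit_repeat_period`, and the synthetic 80 nm profile are our illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def fit_repeat_period(profile, pixel_nm, periods_nm):
    """Grid-search the period of a sine fitted to a 1D intensity profile.

    For each candidate period P, fit profile ~ a*sin(2*pi*x/P) + b*cos(2*pi*x/P) + c
    by linear least squares and keep the P with the smallest residual sum of squares.
    """
    y = np.asarray(profile, dtype=float)
    x = np.arange(y.size) * pixel_nm
    best_period, best_rss = None, np.inf
    for p in periods_nm:
        w = 2 * np.pi / p
        design = np.column_stack([np.sin(w * x), np.cos(w * x), np.ones_like(x)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if rss < best_rss:
            best_period, best_rss = p, rss
    return best_period

# Synthetic profile: 80 nm repeat, 1 nm pixels, 800 nm long, with additive noise
rng = np.random.default_rng(0)
x = np.arange(800)
profile = np.sin(2 * np.pi * x / 80.0) + 0.2 * rng.normal(size=x.size)
period = fit_repeat_period(profile, pixel_nm=1.0, periods_nm=np.arange(70.0, 90.0, 0.1))
print(period)
```

      Fitting the sine and cosine terms jointly makes the phase a linear parameter, so only the period needs a search; a finer grid or a local refinement would be needed to reach sub-0.1 nm precision such as the ± 0.03 nm reported.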

      Recommendation 7: Fig.2E-I: Could the authors explain how these bands were tracked? It is very difficult for this reviewer to trace, for example, the A-band in Fig.2g.

      Response: We trained TomoSeg neural networks to pick up discrete and amorphous striations. The TomoSeg segmentation of the amorphous striations often picked up only the dense features (marked in green). However, in the tomograms we could see by eye densities that connect these dense features.

      These connecting densities were manually drawn with a dotted line.

      We clarified this in the methods:

      pg 13: “The automated segmentation of amorphous striations picked up mostly dense amorphous features. The fainter densities that we observed to laterally connect the amorphous features were manually drawn by dotted lines.”

      We also changed the figure legend of Fig2: 

      pg 5: “(F,G,I) fainter features not picked up by the automated segmentation were drawn with dotted lines.”

      Recommendation 8: Fig.2: The caption of Fig.2I is missing.

      Response: We have edited the legend of Fig 2 to include this caption: pg 5: “(I) Segmentation that shows amorphous features occur as two bands and connect to the rootlet surface densities.”

      Recommendation 9: p.6 "Additionally, the surface densities show evidence of connecting to the A-bands (Fig 2I and S3I)." Does the author mean Fig.2J and S3I?

      Response: This is most clearly visible in figure 2I and S3I (S3J after revisions), but it is also visible in 2J. 

      We therefore edited this figure reference:

      pg 6: (Fig 2I, J and S3J)

      Recommendation 10:  p.8 "The metazoan rootlet is a cilium-associated fiber that is characterized by regular cross-striations." In this reviewer's memory, Tetrahymena also has a rootlet. Are they different in structure?

      Response: Tetrahymena and other protists have striated rootlets (known as kinetodesmal fibres or System-I fibres) that are classified as being different from mammalian rootlets (Andersen et al., 1991). Tetrahymena rootlets have a 32 nm repeat (Munn, 1970), which is less than half of the 80 nm repeat observed for mammalian rootlets. While the protein composition of Tetrahymena rootlets is unknown, a 250 kDa protein was proposed to be their main component (Williams et al., 1979). Tetrahymena rootlet proteins were proposed to span a minimum of 4-5 striation repeats, based on early thin-sectioning EM (Munn, 1970), while we show that rootletin predictions span at most ~3.3 repeats in mammalian rootlets. Since the early proposal of Tetrahymena rootlet protein organisation, more components have been identified: DisAp (Galati et al., 2014) with a predicted length of ~37 nm (0.15 nm/residue), and proteins of 170 kDa that cross-react with the Naegleria gruberi major rootlet component (Dingle & Larson, 1981). Thus, the available data suggest that Tetrahymena rootlets are different in structure from mammalian ones.
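      The length and repeat comparisons in this response rest on simple arithmetic with the α-helical rise of 0.15 nm per residue. The sketch below makes that arithmetic explicit. It assumes the entire sequence forms one continuous helix, which gives an upper bound on length, and the function names are illustrative.

```python
RISE_NM_PER_RESIDUE = 0.15  # axial rise per residue of an alpha-helix (1.5 Angstrom)

def helix_length_nm(n_residues: int) -> float:
    """Upper-bound length if every residue sits in one continuous helix/coiled coil."""
    return n_residues * RISE_NM_PER_RESIDUE

def repeats_spanned(length_nm: float, repeat_nm: float) -> float:
    """How many striation repeats a rod of the given length would cover."""
    return length_nm / repeat_nm

# A ~37 nm rod (the DisAp estimate above) against the two repeat distances
print(repeats_spanned(37.0, 32.0))            # 1.15625 (Tetrahymena, 32 nm repeat)
print(round(repeats_spanned(37.0, 80.0), 2))  # 0.46 (mammalian, 80 nm repeat)
```

      By the same arithmetic, spanning ~3.3 repeats of 80 nm corresponds to a rod of roughly 264 nm.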

      Reviewer #2 (Public reviews):

      Summary:

      This work performs structural analysis on isolated or purified rootlets.

      Strengths:

      To date, most studies of this cellular assembly have been from fluorescence microscopy, conventional TEM methods, or through biochemical analysis of constituents. It is clearly a challenging target for structural analysis due to its complexity and heterogeneity. The authors combine observations from cryo-electron tomograms, automated segmentations, subtomogram averaging, and previous data from the literature to present an overall model of how the rootlet is organised.

      Their model will serve as a jumping-off point for future studies, and as such it is something of considerable value and interest.

      Weaknesses:

      It is speculative but is presented as such, and is well-reasoned, plausible, and thorough.

      Reviewer #2 (Recommendations for the authors):

      Recommendation 1: My suggestions to improve the manuscript lie in some of the technical details:

      The subtomogram averaging methods are overly brief - I am not convinced that someone could replicate the process from the text in the methods (and results sections).

      We have now extended our description of the subtomogram averaging methods: 

      pg 13: “For particle picking, the tomograms were deconvolved using the TOM package (Tegunov & Cramer, 2019). Dynamo was used for particle extraction using the Dynamo surface model (Castaño-Díez et al., 2012, 2017): Each D2 band was traced in multiple slices per rootlet to define dynamo surfaces. Surface triangulation was set to result in extraction coordinates approximately 4 times the number of expected filaments. The coordinates were extracted as a Dynamo table that was subsequently converted to the motl-format using subTOM scripts, available at https://github.com/DustinMorado/subTOM/ (Leneva et al., 2021). Particles were extracted from tomograms reconstructed using novaCTF (Turoňová et al., 2017).

      An initial reference was obtained by in-plane randomizing and averaging all particles prior to alignments. Initial alignments were performed to centre filaments, using a 10 nm wide cylindrical mask and limiting shifts to 4 nm in X and Y with respect to the reference orientation. A spherical mask with a large diameter was used for aligning the D-bands; these alignments were restricted to the reference Z direction. Cluster-based and careful per-tomogram cross-correlation cleaning were applied to remove particle duplicates, particles with no filaments, and particles with disordered D-bands. This resulted in a cleaned particle dataset.

      Prior to classification in subTOM, alignments with limited X/Y/Z shifts and increasingly finer in-plane rotations were performed. 20 eigenvolumes were generated by K-means classification over 20 eigenvectors. The eigenvolumes and particles clustered per eigenvector were assessed to identify which vectors described the missing wedge or structural features (Leneva et al., 2021). The structural eigenvectors were used to cluster particles into the final class averages that described particle heterogeneity. 

      For the final subtomogram class average that contained the twist, the motive list (motl) of the cleaned particle dataset was converted to a STAR file compatible with RELION 4.0 alpha (Zivanov et al., 2022). Gold beads were removed from the preprocessed tomogram frames by converting the aligned tomogram gold coordinates initially obtained by the Etomo bead finder during preprocessing (Kremer et al., 1996). Particles were then extracted in RELION 4.0 alpha. The initial reference was an in-plane randomized average of the cleaned particle dataset. Instead of refinement, which resulted in anisotropic structures due to a lack of features for the alignment, we used simultaneous alignment and classification. We restricted the alignments to full in-plane rotations with respect to the reference Z-axis.”
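      Two of the processing steps described in the methods text above can be illustrated in plain Python: removal of duplicate particles left by oversampled surface picking, and K-means clustering of particles by their coefficients on structural eigenvectors. This is a minimal sketch under stated assumptions and not the subTOM, Dynamo or RELION code. The function names, the (x, y, z, cc) tuple layout, the 2 nm distance threshold and all toy numbers are hypothetical. Duplicate removal here is a simple greedy distance criterion, run within a single tomogram.

```python
import math

def remove_duplicates(particles, min_dist):
    """Greedy distance-based duplicate removal within one tomogram (hypothetical helper).

    particles: list of (x, y, z, cc) tuples; coordinates in nm after alignment,
    cc is the cross-correlation score against the reference. A particle closer
    than min_dist to an already kept particle is discarded, so of any close
    pair the particle with the higher cc survives.
    """
    kept = []
    # Visit particles from best to worst cross-correlation score.
    for p in sorted(particles, key=lambda q: q[3], reverse=True):
        if all(math.dist(p[:3], k[:3]) >= min_dist for k in kept):
            kept.append(p)
    return kept

def kmeans(points, k, n_iter=50):
    """Lloyd's K-means on per-particle eigen-coefficients (illustrative only).

    points: one tuple per particle holding its coefficients on the eigenvectors
    judged to describe structure rather than the missing wedge. Centroids are
    initialised from the first k points so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each particle joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids

# Hypothetical toy input: two picks converged on the same filament position.
picks = [(0.0, 0.0, 0.0, 0.30), (0.5, 0.2, 0.0, 0.42), (8.0, 0.0, 0.0, 0.35)]
kept = remove_duplicates(picks, min_dist=2.0)

# Hypothetical coefficients of four particles on two structural eigenvectors.
coeffs = [(0.1, 0.0), (5.0, 5.1), (0.0, 0.2), (5.2, 4.9)]
labels, centroids = kmeans(coeffs, k=2)
print(kept)    # the 0.30 pick is dropped as a duplicate of the 0.42 pick
print(labels)  # [0, 1, 0, 1]
```

      Keeping the higher-scoring particle of each close pair means that duplicate removal never discards the better-aligned copy.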

      Recommendation 2: I find it difficult to assess the quality of the final subtomogram averages as presented in the manuscript. One potential worry is the fact that the authors state that nothing is visible outside the mask, which can be a sign of overfitting (though, as the authors state, can just be a sign of heterogeneity). I would suggest that the authors include FSC curves, as well as 2D slices through the unmasked subtomogram averages - it is easier to judge the impact of the mask when viewing it this way and not at the isosurface.

      Response: We understand the reviewer’s concern about overfitting and masking. To clarify our approach, the class averages shown in Fig 3G and Fig S5C are the result of simultaneous classification and alignment, not a gold-standard refined average. Because the classification does not work with half sets, it does not produce an FSC. We initially tried a refinement approach, but the filaments did not have enough features to align to, and this resulted in anisotropic structures. The FSC of such a refinement is shown below. However, because of the anisotropy, we did not include these structures or FSCs in the manuscript, and we make no claims about the resolution.

      Author response image 1.

      Instead, we presented the data from simultaneous classification and alignment, which revealed the twist in the filament. Like the reviewer, we were initially concerned that the filament twist could be an artefact of the narrow masks and the reference we used. However, we only used rotationally symmetric references and masks that do not contain any features. We therefore realized that this asymmetric twist feature could not have arisen from the imposed alignment regimes, reference bias or overfitting.

      To make our approach clearer, we have updated the main text:

      pg 8: “To ensure unbiased alignment of any coiled-coil features, we generated a smooth reference by randomizing the in-plane rotational orientation of the particles (Fig S5B). Initial refinement of the data resulted in an anisotropic structure, since the filaments did not have enough features to align to. Therefore, we performed classification with alignment in RELION 4.0 alpha (Zivanov et al., 2022), using a narrow 3.3 nm-wide mask with a smooth edge extending to 7.7 nm (Fig S5B). This was the narrowest mask that still resulted in an isotropic structure and revealed features that were absent in the smooth reference. The resulting class averages contained a twist along the filament length in classes 2, 3, 4 and, most prominently, 5 (Fig S5C). Class 5 contained a filament 2 nm thick and 5 nm wide with a groove along its length (Fig 3G).”
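      A soft-edged cylindrical mask of the kind described above can be sketched as a radial weighting function. This sketch rests on assumptions. The 3.3 nm width is read as the fully weighted core and the 7.7 nm width as the point where a raised-cosine edge reaches zero. The function names and the 1 nm voxel size are hypothetical, and RELION's own mask-creation tools were not used here.

```python
import math

def cylinder_mask_weight(r_nm, core_width_nm=3.3, outer_width_nm=7.7):
    """Weight of a soft-edged cylindrical mask at radial distance r_nm (nm)
    from the filament axis.

    Assumed interpretation: weight 1 inside the 3.3 nm-wide core, a raised-cosine
    fall-off beyond it, and weight 0 from the 7.7 nm outer width onwards.
    """
    r_core = core_width_nm / 2.0
    r_outer = outer_width_nm / 2.0
    if r_nm <= r_core:
        return 1.0
    if r_nm >= r_outer:
        return 0.0
    t = (r_nm - r_core) / (r_outer - r_core)  # 0 at the core edge, 1 at the outer edge
    return 0.5 * (1.0 + math.cos(math.pi * t))

def mask_cross_section(n_vox, voxel_nm):
    """Slice of the mask perpendicular to the filament axis on an n_vox x n_vox grid.

    Because the mask is a cylinder, every slice along the axis (Z) is identical.
    """
    c = (n_vox - 1) / 2.0
    return [[cylinder_mask_weight(math.hypot(i - c, j - c) * voxel_nm)
             for j in range(n_vox)] for i in range(n_vox)]

section = mask_cross_section(n_vox=11, voxel_nm=1.0)  # hypothetical 1 nm voxels
```

      The smooth fall-off avoids a sharp mask edge, which would otherwise introduce strong high-frequency artefacts into the masked average.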

      We also clarified this in the methods:

      pg 13: “The initial reference was an in-plane randomized average of the cleaned particle dataset. Instead of refinement, which resulted in anisotropic structures due to a lack of features for the alignment, we used simultaneous alignment and classification. We restricted the alignments to full in-plane rotations with respect to the reference Z-axis.”

      Recommendation 3: The authors should include the version of AlphaFold that they used to perform the structural predictions. Predictions, especially for multimers, have improved in the newest version, and further improvements can be expected in the future. Including the version used here will act as a timestamp.

      We have now updated the methods to include the version:

      pg 14: “AlphaFold predictions of dimers of 300-AA fragments with 50-AA overlap were generated using ColabFold 4, which uses a modified version of AlphaFold2. To run the large number of sequences, we used a customized script called alphascreen (version 1.15), available at https://github.com/samichaaban/alphascreen.”
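      The fragmenting scheme described above (300-residue fragments, consecutive fragments sharing 50 residues) can be sketched as follows. The helper names are hypothetical and this is not the alphascreen implementation. The handling of the C-terminal tail, which here is simply truncated, is an assumption.

```python
def fragment(seq, length=300, overlap=50):
    """Split a protein sequence into overlapping fragments (hypothetical helper).

    Consecutive fragments share `overlap` residues, so window starts advance by
    length - overlap. The final fragment is truncated at the C-terminus.
    Returns (1-based start position, fragment sequence) pairs.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be in [0, length)")
    step = length - overlap
    fragments = []
    start = 0
    while True:
        fragments.append((start + 1, seq[start:start + length]))
        if start + length >= len(seq):
            break
        start += step
    return fragments

def dimer_pairs(fragments):
    """All unordered fragment pairs, homodimers included, to submit as dimers."""
    return [(a, b) for i, a in enumerate(fragments) for b in fragments[i:]]

# Hypothetical 700-residue sequence: fragments start at residues 1, 251 and 501.
frags = fragment("M" * 700)
pairs = dimer_pairs(frags)
```

      The number of dimer jobs grows quadratically with the number of fragments, so a batch script is needed for the full set.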

      Recommendation 4: Figure 2G is not so clear in depicting two offset D bands. The authors could include a more zoomed-out image to make it clearer.

      Response: We have now included a more zoomed out image in the supplement (Fig S3A).

      We updated the figure legend of Fig 2G and Fig S3A: pg 5: “(G) Example where D1 aligns with D2 of a neighboring sub-fiber. Larger view in Fig S3A.”

      pg 20: “(A) Tomogram slice and segmentation where D1 aligns with D2 of a neighboring sub-fiber. The dotted square marks the location of Fig 2G. (B)”

      Recommendation 5: Did the authors attempt to predict the structure of rootletin oligomers? i.e. folding four rootletin fragments at once instead of two? This could be interesting.

      Response: We attempted to predict interactions between all combinations of rootletin fragments. We did this for two-fragment (e.g. CC1+CC1 or CC1+CC2) and four-fragment (e.g. CC1+CC1+CC1+CC1 or CC1+CC1+CC2+CC2) combinations.

      Homodimer combinations (e.g. CC1+CC1) were predicted with the most confidence. We did not identify any higher-order oligomerization. AlphaFold did not identify interactions previously proposed in the literature, for example between two CC3 dimers (Ko et al., 2020) or the weak interaction between CC2 and CC3 (Yang et al., 2002). These interactions were either not properly predicted or may require proteins other than the ones we tested (CCDC102B, CEP68, beta-catenin, ARL2, centlein).

      We have updated our methods to include our AlphaFold attempts:

      Pg 14: “This setup was used to predict interactions for dimeric and oligomeric combinations of rootletin fragments (e.g. CC2+CC2, CC3+CC4, CC1+CC1+CC1+CC1, CC3+CC3+CC4+CC4, etc.). Homodimeric and oligomeric combinations were also tested with other proteins identified as putative rootletin binders: CCDC102B, CEP68, beta-catenin, ARL2, centlein. In our hands, only homodimeric rootletin fragment combinations resulted in confident predictions.”

      Reviewer #3 (Public reviews):

      Summary:

      The study offers a compelling molecular model for the organization of rootlets, a critical organelle that links cilia to the basal body. Striations have been observed in rootlets, but their assembly, composition, and function remain unknown. While previous research has explored rootlet structure and organization, this study delivers an unprecedented level of resolution, valuable to the centrosome and cilia field. The authors isolated rootlets from mouse eyes. They applied EM to partially purified rootlets (first negative stain, then cryoET). In these micrographs, they observed striations along the rootlet membranes, but no regular spacing.

      The thickness of the sample and membranes prevented good contrast in the tomograms. They therefore further purified the rootlets using detergent, which allowed them to obtain cryoET micrographs of the rootlets in greater detail. The tomograms were segmented and further processed to enhance the features of the rootlet structures. From their analysis, they described three regular cross-striations and amorphous densities, which are connected perpendicularly to filaments along the length of the rootlets. They propose that various proteins form the striations and that rootletin (mouse homolog of human cnap1) forms parallel coiled coils that run along the rootlet. Overall, their data provide a detailed model for the molecular organization of the rootlet.

      The major strength is that this high-quality study uses state-of-the-art cryo-electron tomography, subtomogram averaging, and image analysis to provide a model of the molecular organization of rootlets. The micrographs are exceptional, with excellent contrast and details, which also implies the sample preparation was well optimized to provide excellent samples for cryo-ET. The manuscript is also clear and accessible.

      To further validate their model, it would have been useful to identify some components in the EM maps through complementary approaches (mass spectrometry, mutants disrupting certain features, CLEM). Some potential candidates are mentioned in the discussion.

      This research marks a significant step forward in our understanding of rootlets' molecular organization.

      Response: We agree with the reviewer that it would be ideal to identify rootlet components in the EM densities using complementary approaches. Prior to submitting the manuscript, we attempted several approaches, the details of which are described below:

      We performed mass spectrometry on our purified rootlets. This identified the rootlet components rootletin and CCDC102B and various axonemal components, due to the association between the rootlet and axoneme. However, due to the limitations in quantifying components using mass spectrometry, we were unable to confidently identify novel rootlet constituents present in quantities comparable to rootletin.

      We further attempted cross-linking mass spectrometry on the rootlets to gain deeper insights into the interactions between rootletin molecules. Unfortunately, this effort resulted in a completely insoluble sample despite extended digestion times, leading to mass spectrometry column clogging and rendering our results inconclusive.

      We attempted to express rootlet components recombinantly and were able to purify fibres, but they did not contain the characteristic repeat pattern seen in native rootlets. We also considered purifying native rootlets from cultured cells, but we were unable to obtain sufficient sample for cryoET imaging.

      We therefore regret that other approaches to validate our model are outside the scope of this current work.

      Reviewer #3 (Recommendations for the authors):

      Recommendation 1: There are some problems with spaces in references in the methods.

      Response: We have thoroughly checked the methods and manuscript for double spaces and corrected this.

      Recommendation 2: Figure 1A, the figure would benefit from more labelling, to show the reader the basal body and nucleus.

      Response: We have now added the labels "basal bodies" and "Nucleus" to the cartoon in Fig 1A.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      While the role of Rab27 was strongly examined, the hits of the VAMP proteins were not explored in detail. I was wondering if the decrease in the presence of VAMPS directly suggests the final step of membrane fusion in the exocytosis of EVs is what is being impaired. Or if it is other trafficking steps along the EV secretion pathway.

      We appreciate the relevance of this comment, and we agree that the decrease of VAMP gene expression in the β-catenin-mutated HepG2 cells could suggest an impairment of the final membrane fusion step in the exocytosis of EVs. We have therefore expanded on this important point in the discussion (page 10). Indeed, the transcriptomic analysis of HepG2 cells identified an upregulation of VAMP2, VAMP5 and VAMP8 expression after depletion of mutated β-catenin. However, these proteins were not detected in the mass spectrometry analysis. Only VAMP3 and VAMP7 proteins were detected in the proteomic analysis, and their levels did not vary. This is why we did not focus on this trafficking step, but it would be interesting to explore it further in the future.

      Reviewer 2:

      (1) In Figure 1F, it is essential to investigate why mass spectrometry analysis indicated no significant changes in SDC4 levels.

      We agree with the reviewer that indeed whereas we did observe a significant alteration of syndecan-4 expression at the mRNA level, we did not observe significant changes in syndecan-4 levels by mass spectrometry. One possible explanation is that heparan sulfate proteoglycans like syndecan-4 exhibit a high degree of structural heterogeneity due to the biosynthetic process that produces linear polysaccharides. This characteristic can alter the robustness of mass spectrometry analyses, leading to greater variability. 

      (2) Figure 2G lacks clarity in explaining how the quantification of MVBs (multivesicular bodies) was conducted.

      We apologize for the lack of clarity in explaining how the quantification of MVBs in Figure 2G was conducted. The Materials and methods section (Electron microscopy of cells, page 23) has been modified to emphasize this point.

      (3) In Supplementary Figure 1F, there is a suggestion to highlight exosomes using arrowheads for enhanced clarity.

      According to the reviewer’s suggestions, we added arrowheads on supplementary figure 1F in order to highlight the exosomes (page 16). This indeed improves clarity.

      (4) Figure 3C prompts a question about the peculiar appearance of Actin staining in KD cells, requiring further investigation.

      The peculiar appearance of this intense phalloidin staining between hepatocytes corresponds to bile canaliculi (BC), a feature of more differentiated HepG2 cells. As phalloidin-stained BC are very bright, they may diminish the visibility of other, thinner actin structures. We therefore replaced the image of KD cells with a more representative one (new Figure 3C).

      (5) An intriguing avenue for exploration is suggested in testing how the treatment of a GSK inhibitor on HepG2 cells might impact Rab27a and SDC4 expression.

      We appreciate the suggestion to test how treatment with a GSK inhibitor affects Rab27a and SDC4 expression in HepG2 cells. Following the reviewer’s suggestion, we carried out these experiments, and the data are presented in Author response image 1 below. In HepG2 cells, the GSK inhibitor stabilized the wild-type β-catenin protein but, surprisingly, slightly decreased the mutated form of β-catenin (Author response image 1A). Both Rab27a and SDC4 mRNA levels showed a small increase (Author response image 1B). Rab27a protein was also increased upon treatment of HepG2 cells with the GSK inhibitor (Author response image 1C). This increase in expression could be due to the decrease of the mutated form of β-catenin in HepG2 cells, supporting the conclusion that Rab27a and SDC4 are repressed by mutated β-catenin.

      Author response image 1.

      Impact of a GSK inhibitor (CHIR99021) on Rab27a and syndecan-4 (SDC4) expression in HepG2 cells. HepG2 cells were treated with 3 µM CHIR99021 or with DMSO as a control for 48 h. A) Western blot (upper panel) and quantification (lower panel) of wild-type (WT) and mutated (MUT) β-catenin proteins in HepG2 cells treated with DMSO (control) or with CHIR99021. B) qRT-PCR analysis of Rab27a and SDC4 expression in HepG2 cells treated with DMSO (control) or with CHIR99021. C) Western blot (left panel) and quantification (right panel) of Rab27a protein in HepG2 cells treated with DMSO (control) or with CHIR99021. *P<0.05

      Reviewer 3:

      (1) One limitation of this study is that the mechanistic relationship between exosome release and its effect on immune cells remains to be elucidated. In this context, the authors’ conclusions rest on the assumption that hepatocarcinoma immune evasion is based exclusively on the reduced number of exosomes. However, the authors do not compare the composition of exosomes from wild-type and oncogenic backgrounds, which could differ.

      We agree that the mechanistic relationship between exosome release and its effect on immune cells remains to be elucidated. In the discussion, we mention that the content of β-catenin-regulated EVs remains to be explored to fully understand their function in the immunomodulation of the tumor microenvironment. Along these lines, we have ongoing experiments to analyse the exosomal content in terms of proteins and microRNAs. Our preliminary results indicate that exosome composition seems to differ between HepG2 cells with knock-down of mutated β-catenin and control HepG2 cells, suggesting that not only the number of exosomes but also their content is involved in immunomodulation.

      (2) The manuscript would benefit from minor language editing and the introduction from restructuring to enhance clarity.

      The manuscript has now been language-edited by Professor William A. Thomas (Colby-Sawyer College, New Hampshire). The acknowledgments have been modified (page 12) to thank Professor William A. Thomas for proofreading the manuscript. The introduction has also been restructured and modified according to the reviewer's suggestions to enhance clarity (page 3).

      (3) I believe that within the abstract, the authors mean 'defect' not 'default' in the sentence: Then, we demonstrated in 3D spheroid models that activation of β-catenin promotes a decrease of immune cell infiltration through a default in exosome secretion.

      We apologize for the mistake between 'default' and 'defect' in the abstract. The abstract has been modified accordingly.

      (4) Within the 'Introduction' part of the manuscript, the authors might consider reviewing and reorganizing the first paragraph for more clarity: I suggest leading with the first three sentences of the second paragraph (HCC is the most...) and then introducing β-catenin and the effects and implications of oncogenic β-catenin in HCC.

      If the authors prefer the current structure of the 'Introduction', I would like to propose exchanging some of the wording:

      - In line 4: 'despite' instead of 'in front of'? Sentence: Thus, in front of the therapeutic revolution for cancers, with the emergence of immunotherapy and more particularly immune checkpoint inhibitors (anti-PD1, anti-PD-L1)

      - Additionally, in line 7: In these tumors, the oncogenic β-catenin is able to set up a microenvironment that favors tumor progression notably by promoting immune escape. Here, 'establish' might be a better choice instead of 'set up'.

      - In line 9, I suggest rephrasing the sentence: Few studies have reported that the defect of intercellular communication between cancer cells and immune cells is partly mediated by a decrease of chemokines production leading to a reduction of immune infiltrates.... and maybe adding a reference here.

      The introduction has been altered accordingly. Thanks for these suggestions that helped us to improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study elucidates a detailed molecular mechanism of the initial stages of transport in the medically relevant GABA neurotransmitter transporter GAT1 and thus generates useful new insights for this protein family. In particular, it presents convincing evidence for the presence of a "staging binding site" that locally concentrates Na+ ions to increase transport activity, whilst the evidence for how Na+ binding affects the larger-scale dynamics is solid.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues uses molecular simulations to examine the Na+ binding pathway and the ionic interactions at the two known sodium binding sites, site 1 and site 2. They further identify a patch of two acidic residues in TM6 at which Na+ ions seemingly accumulate prior to entry into the vestibule. These results highlight the importance of studying ion-entry pathways through computational approaches, and the authors also validate some of their findings experimentally. They observe that sodium binding to site 1 is stabilized by the presence of the substrate in the S1 site. This is particularly important because, unlike in monoamine transporters, the GABA carboxylate is involved in coordinating the Na+ ion. Binding of sodium to the Na2 site stabilizes the conformation of GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available experimental information on SLC6 transporters, particularly GAT1, and puts forth the importance of this additional patch of residues in the extracellular vestibule, which could be important for ion permeation in SLC6 transporters. This is a nicely performed study that could be improved if the authors addressed the following queries.

      We thank our reviewer for the overall positive evaluation.

      Weaknesses:

      (1) How conserved is the D281-E283 residue pair in other SLC6 transporters? The authors commented on the presence of these residues in SERT, but it would be nice to know how widespread these residues are in other SLC6 transporters like NET, GlyT, and DAT.

      We have created a sequence alignment of the entire human SLC6 family (Supplementary Figure 1) and found that E283 is polar or charged in all SLC6 transporters. D281 shows a higher level of conservation across the family compared to E283. D281 is negatively charged in approximately 50% of the SLC6 family members, an aspartate in all GABA transporters and a glutamate in all monoamine transporters.
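      The conservation statements above reduce to counting, at one alignment column, how many family members carry a residue from a given class. The sketch below shows that count. The residue classes are one common convention (an assumption), and the toy column is invented. It is not the SLC6 alignment from Supplementary Figure 1.

```python
NEGATIVE = set("DE")
# One common (assumed) convention for polar or charged side chains.
POLAR_OR_CHARGED = set("DEKRHNQSTY")

def fraction_matching(column, allowed):
    """Fraction of aligned sequences whose residue at one position is in `allowed`.

    column: one residue per transporter at a single alignment column ('-' = gap);
    gaps count as non-matching.
    """
    if not column:
        raise ValueError("empty alignment column")
    return sum(1 for aa in column if aa in allowed) / len(column)

# Hypothetical toy column, not real SLC6 data.
toy_column = ["D", "E", "N", "S"]
print(fraction_matching(toy_column, NEGATIVE))          # 0.5
print(fraction_matching(toy_column, POLAR_OR_CHARGED))  # 1.0
```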

      (2) Further, one would like to see the effect of individual mutations D281A and E283A on transport, surface expression, and EC50 of Na+ to gauge the effect on transport.

      We have carried out experiments to investigate the effects of the individual mutations. The results revealed effects intermediate between WT and the double mutant (D281A-E283A) and showed that the effects mostly align with the degree of conservation: neutralisation of D281 by alanine has a stronger effect than the E283A mutation. Both single mutants had minimal effects on the sodium dependence of uptake, and D281A had a stronger effect on expression, Km and Vmax than E283A. Only D281A reduced surface expression, while E283A was expressed at a level similar to wild-type GAT1.

      (3) A clear figure needs to be provided of the S1 site, where Na+ tends to stay prior to interacting with the Na1 site. Further, it is not entirely clear how access to S1 is altered if the transporter is in an outward-occluded conformation, with F294 blocking solvent access. Please comment.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility. We have also added a comment on the role of F294 as part of the outer hydrophobic gate to the discussion. In short, F294 does not occlude the passage to the S1 as long as GAT1 is outward open, and we find that GAT1 is outward open in all sodium binding simulations.

      (4) The p-value of the EC50 difference between GAT1 WT and the GAT1 double mutant needs to be mentioned. The difference in the EC50 of sodium dependence seems to be less than twofold, and it would be useful to state how critical the role of the recruitment site is. Since transport is not affected, the site could play a transient role in attracting ions.

      We have added p-values or standard deviation to our data.

      (5) It would be very nice to know how K+ ions are attracted by this recruitment site. This could further act as a control simulation to test the preference for Na+ ions among SLC6 members.

      We think that attraction of potassium to the recruitment site is not relevant, as the residues are on the extracellular side and exposed to bulk, where the concentration of sodium is high (typically 130-150 mM) while the concentration of potassium is very low (3-5 mM). Exploring sodium binding by simulations for all SLC6 members could be interesting, but is clearly outside the scope of this manuscript.
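      Using only the concentration ranges given above, a back-of-the-envelope comparison of the two ions in the extracellular bulk gives the following. The estimate ignores any difference in the site's affinity for the two ions.

```latex
\frac{[\mathrm{Na}^+]_\mathrm{out}}{[\mathrm{K}^+]_\mathrm{out}}
  \approx \frac{130\ \mathrm{mM}}{5\ \mathrm{mM}} = 26
  \quad\text{to}\quad
  \frac{150\ \mathrm{mM}}{3\ \mathrm{mM}} = 50
```

      By this estimate, sodium is roughly 26- to 50-fold more abundant than potassium at the extracellular face.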

      (6) Some of the important figures are not very clear. For instance, there should be a zoomed-in view of the recruitment site. The current one in Fig. 1b and 1c could be made clearer. Similarly as mentioned earlier the Na residence at the S1 site away from the Na1 and Na2 sites needs to be shown with greater clarity by putting side chain information in Fig. 6d.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility.

      (7) The structural features that comprise the two principal components PC1 and PC2 should be described in greater detail.

      We have modified Figure 6 and added images that show the motions along PC1 and PC2. In addition, these are now better explained in the text.

      Reviewer #2 (Public Review):

      Summary:

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be cotransported with the substrate, GABA (and a co-transported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281A-E283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths:

      -  MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      -  A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      -  The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      We thank the reviewer for the very supportive assessment.

      Weaknesses:

      -  Assessing the effects of Na+ binding on the large-scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear if movements of the inner gate are due to an AF2 model that's not well packed or really a feature of the open outward conformation.

      Based on our data, the long-range coupling between sodium binding to GAT1 and destabilisation of the inner gate is causal. PCA separates conformational motions into degrees of freedom and sorts them by the variance they capture. Motions of TM5a were among the two largest motions, which suggests that these are relevant motions. To quantify their behaviour directly, we measured informative distances at the inner gate of GAT1, as shown in Figure 6i,j,k, and separated the data according to the presence of sodium in NA2.
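      Separating a per-frame inner-gate distance by NA2 occupancy, as described above, can be sketched as follows. This is not the authors' analysis script. The dictionary keys, the choice of two gate atoms and the 3 Å occupancy cutoff are hypothetical. In practice, the coordinates would come from an MD trajectory-analysis library.

```python
import math

def split_by_na2(frames, cutoff=3.0):
    """Group an inner-gate distance by NA2 occupancy, frame by frame.

    frames: list of dicts with hypothetical keys
      "gate_a", "gate_b": xyz (Å) of two atoms spanning the inner gate
      "na2":              xyz (Å) of the NA2 site centre
      "ions":             list of Na+ xyz (Å)
    A frame counts as NA2-occupied if any Na+ is within `cutoff` Å of the site.
    """
    occupied, empty = [], []
    for f in frames:
        d_gate = math.dist(f["gate_a"], f["gate_b"])
        bound = any(math.dist(ion, f["na2"]) < cutoff for ion in f["ions"])
        (occupied if bound else empty).append(d_gate)
    return occupied, empty

# Two hypothetical frames: one with Na+ at NA2, one without.
frames = [
    {"gate_a": (0, 0, 0), "gate_b": (10, 0, 0), "na2": (0, 0, 0), "ions": [(1, 0, 0)]},
    {"gate_a": (0, 0, 0), "gate_b": (14, 0, 0), "na2": (0, 0, 0), "ions": [(20, 0, 0)]},
]
occupied, empty = split_by_na2(frames)
```

      Comparing the two resulting distributions, rather than one pooled distribution, is what ties a change at the inner gate to the occupancy of NA2.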

      For the following reasons, we exclude the possibility that the results are a consequence of structural inconsistencies introduced by AlphaFold2 rather than functionally relevant effects:

      (1) If the effects depended on the model rather than on sodium binding, they should not correlate with the presence of sodium in the NA2 binding site.

      (2) We carried out new simulations starting from the occluded GAT1 structure (Figure 6j,k). The data show that in the occluded state the distance across the inner vestibule and the length of TM5a differ, consistent with our interpretation of the data. Sodium binding locks GAT1 in the outward-facing state, as also occurs in other SLC6 family members (Szöllősi and Stockner, 2022). Accordingly, the distances of outward-open GAT1 lie at the short extreme of the scale, the distances of the inward-open cryo-EM structure(s) lie at the other extreme, and the occluded conformation of GAT1 shows intermediate values.

      (3) We have observed the same property in SERT, for which we used experimental structures as starting structures (Gradisch et al., 2024), suggesting that this could be a general mechanism.

      (4) All available structures from the entire SLC6 family are consistent with TM5a responding structurally to bundle domain motions, and therefore to sodium binding to NA2, which stabilizes the outward-open state, as well as to the transition to the inward-facing conformation.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

      Simulations can always be too short to fully describe the underlying conformational ensemble. We added a statement to the discussion indicating this shortcoming. With respect to the tICA analysis in our manuscript, the tICA approach by design does not need long simulations that capture full binding and unbinding in multiple instances to construct a correct free energy landscape. Instead, the tICA method builds on Markov chain dependencies. It relies only on the convergence of transitions between hundreds of conformational microstates and of the fluxes between them. The free energy profile derived for the S1, including NA1, TMP and NA2 and up to the salt bridge of the outer gate, is well converged, and we observed many transitions. In contrast, for the entry from the recruitment site to the S1, the density of microstates is most likely too low and the number of transitions too small to be considered converged for quantifying the free energy of binding from bulk. We now explain this shortcoming.
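      The argument above depends on counting transitions between microstates. A minimal way to turn such counts into a free energy per microstate is a reversible Markov state model (MSM) estimate. The sketch below uses that swapped-in technique because the manuscript's exact tICA/MSM pipeline and software are not described here. The 310 K temperature, the function name and the toy trajectory are assumptions. The per-state exit counts illustrate why sparsely visited regions, such as the entry path, yield poorly converged estimates.

```python
import math

KT_KCAL = 0.0019872 * 310.0  # k_B*T in kcal/mol at an assumed 310 K

def msm_free_energy(dtraj, n_states, lag=1):
    """Free energy per microstate from a discretized trajectory (sketch).

    dtraj: microstate index of each saved frame (e.g. from clustering tICA
    coordinates). Transitions i -> j are counted at the given lag and
    symmetrized (C + C^T); for this simple reversible estimate, the stationary
    probability of state i is its row sum divided by the total count.
    Returns free energies in kcal/mol relative to the most populated state,
    plus the number of transitions observed out of each state.
    """
    if lag < 1:
        raise ValueError("lag must be >= 1")
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    sym = [[counts[i][j] + counts[j][i] for j in range(n_states)]
           for i in range(n_states)]
    row = [sum(r) for r in sym]
    total = sum(row)
    if total == 0:
        raise ValueError("trajectory too short for this lag")
    pi = [r / total for r in row]
    p_max = max(pi)
    free = [-KT_KCAL * math.log(p / p_max) if p > 0 else math.inf for p in pi]
    exits = [sum(counts[i][j] for j in range(n_states) if j != i)
             for i in range(n_states)]
    return free, exits

# Hypothetical two-state toy trajectory.
free, exits = msm_free_energy([0, 0, 0, 1, 0, 0, 1, 1, 0], n_states=2)
```

      Symmetrizing the counts enforces detailed balance, which keeps the estimate simple. A state with only a handful of exits, however, still receives a poorly determined free energy.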

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Authors should furnish p-values in the figure legends for experimental results.

      We have added the p-values to text and figure legends.

      Reviewer #2 (Recommendations For The Authors):

      -  Deposit simulation data in a public repository (input files, trajectories (possibly subsampled)).

      We deposited the data to Zenodo (DOI: 10.5281/zenodo.10686813). As we were unable to upload the trajectories to Zenodo, we deposited the starting and final structures of the simulations.

      -  Please include a short discussion of the reliability of using an AF2 model instead of experimental structures. What is expected to be correct/which parts of the structure are potentially incorrect? What makes you think that the AF2 model is a good model of the OF conformation of GAT1?

      Unfortunately, no outward-facing structure of GAT1 is available. We initially worked with an outward-open homology model of GAT1 based on SERT (built with MODELLER). However, the structural differences between SERT and GAT1 are large enough that these models did not behave well in simulations: they too frequently failed to maintain a sealed inner gate and formed a channel. In contrast to the SERT-based GAT1 model, the AlphaFold2 model of GAT1 behaved as expected. Its behaviour was consistent with that of SERT in simulations and with general knowledge of protein dynamics from the literature. Based on structural analysis of our simulations and on the comparison to SERT, we could not identify any region of GAT1 that could potentially behave incorrectly or unexpectedly. We added a statement to the discussion on this potential limitation of using homology models.

      -  Fig 1a: Na+ densities are not very clear (both due to small size and the transparency). I have a hard time seeing where bulk, 2*bulk regions are --- are you showing "onion shells" of density? Perhaps investigate presenting as cuts through the full density?

      I like the labelling in terms of absolute density and multiples of bulk.

      We have created new images to improve the visualisation of the data. The data are shown as onion shells (isosurfaces), with the shells at the indicated densities; this is now clearly stated. Transparency is needed, as otherwise the inner onion shells would not be visible. A cut-through would be intuitive, but we could not find a useful plane, as the densities are distributed too extensively in 3D rather than on a single plane.

      -  Fig 1h-k: would be clearer if "recruitment site" (TMP?) was indicated in the figure.

      We have created a new image for the recruiting site (Figure 1b,c) and temporary site (Figure 1g) and indicated these two sites as appropriate.

      -  Show time series of Na+ binding with a suitable order parameter (z or distances to NA1 and NA2?) to show how ions bind spontaneously. Mark the different sites. Mark pre- and post-binding parts of trajectories.

      We have added time series for every simulation that shows sodium binding to the NA1 or NA2 to the supplementary information Figure 2a,b,c. These quantify the distances to the recruiting site, the temporary site and the respective sodium binding site.
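      A minimal sketch of how such distance time series and a binding event can be extracted from per-frame coordinates, assuming the trajectory has already been read into lists of (x, y, z) tuples. Each site is represented by the centroid of its coordinating atoms; the function names, the 3.0 Å cutoff and the persistence criterion are hypothetical choices, not the values used for the supplementary figure.

```python
import math


def centroid(points):
    """Geometric centre of a list of (x, y, z) tuples, e.g. the coordinating atoms of a site."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def distance_series(ion_xyz, site_centres):
    """Per-frame distance (in coordinate units, e.g. Å) between an ion and a site centre."""
    return [math.dist(ion, site) for ion, site in zip(ion_xyz, site_centres)]


def first_binding_frame(distances, cutoff=3.0, min_frames=5):
    """First frame of the earliest run of >= min_frames consecutive frames within cutoff.

    Returns None if the ion never stays within the cutoff for that long.
    """
    run = 0
    for k, d in enumerate(distances):
        run = run + 1 if d <= cutoff else 0
        if run >= min_frames:
            return k - min_frames + 1
    return None
```

      Computing the series separately for the recruiting site, the temporary site and NA1 or NA2 gives the three distance traces per simulation; requiring several consecutive frames avoids counting brief visits as binding.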

      -  PCA - how much of the total variance was captured by PC1 and PC2?

      The variance captured by the PCs is shown as eigenvalues in supplementary information Figure 4. PC1 captures about 19% of the variance and PC2 about 8%.
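      For reference, the fraction of variance captured by each principal component is its eigenvalue divided by the sum of all eigenvalues of the coordinate covariance matrix. A minimal NumPy sketch, assuming the frames have already been superimposed and flattened into a (frames × features) array; the function name is illustrative:

```python
import numpy as np


def explained_variance_ratio(coords):
    """coords: (n_frames, n_features) array, e.g. flattened C-alpha coordinates
    of a trajectory already superimposed on a reference structure.

    Returns the PCA eigenvalues (descending) and the fraction of the total
    variance captured by each principal component.
    """
    centred = coords - coords.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order
    return eigvals, eigvals / eigvals.sum()
```

      With this definition, the values quoted above correspond to ratio[0] ≈ 0.19 and ratio[1] ≈ 0.08.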

      -  "We found that the inner hydrophobic gate is dynamic in the absence of Na2" -- is this instability due to the AF2 model or likely realistic? E.g. was similar behaviour ever observed in simulations of the occluded state?

      In simulations of the occluded state we do not see instabilities like those observed in the outward-open state in the absence of sodium (Figure 6). As these larger-scale fluctuations are not randomly distributed across all simulations starting from the AlphaFold2 models, but are confined to the systems without sodium, they are unlikely to be an artifact of the AlphaFold2 model.

      Please note, we have seen comparable behaviour in simulations of SERT starting from experimental structures (Gradisch et al., 2024), therefore suggesting a more general mechanism.

      -  Cooperativity between GABA-binding and Na+ binding to NA1: How would this lead to an experimentally measurable signature, i.e., which experiments could validate this interesting prediction?

      Cooperativity is difficult to detect directly and to separate from other effects in experiments: sodium binding and transport involve both NA1 and NA2, NA2 has a higher affinity according to our data, and mutations will not only affect cooperativity but will also have other effects.

      Conformational changes can also complicate experimental detection, as NA2 stabilises the outward-open conformation, while NA1+GABA binding triggers the transition to the inward-open state. To quantify cooperativity, the cooperative effect would need to be isolated from all other effects, which is a challenge. Support for cooperativity along this route has been found by Zhou, Zomot and Kanner (2006) and Meinild and Forster (2012). In the first paper, the authors made use of lithium, which binds only to NA2, although lithium is not merely an NA2-selective ligand that is otherwise identical to sodium. By comparing two GABA concentrations, the authors showed that the sodium dependence of GABA transport is left-shifted at the higher GABA concentration, which is not the case in the absence of lithium. These data are indirect, but consistent with cooperativity between GABA and NA1-bound sodium, as GABA transport mainly reflects binding of sodium to NA1. Similar approaches could be explored further, for example by varying the GABA concentration instead of the sodium concentration. Another option would be to create an outward-facing, conformationally locked GAT1 and to measure the cooperativity of sodium and GABA binding using, for example, the scintillation proximity assay. Most likely, the assay would also need to be independent of NA2 binding. We are not aware of such a GABA transporter system.

      -  There are some instances of [SI Figure] or [citation needed] that should be cleaned up.

      We have corrected these instances.

      References

      Gradisch, R. et al. (2024) ‘Ligand coupling mechanism of the human serotonin transporter differentiates substrates from inhibitors’, Nature Communications, 15(1), p. 417. Available at: https://doi.org/10.1038/s41467-023-44637-6.

      Meinild, A.-K. and Forster, I.C. (2012) ‘Using lithium to probe sequential cation interactions with GAT1’, American Journal of Physiology. Cell Physiology, 302(11), pp. C1661-1675. Available at: https://doi.org/10.1152/ajpcell.00446.2011.

      Szöllősi, D. and Stockner, T. (2022) ‘Sodium Binding Stabilizes the Outward-Open State of SERT by Limiting Bundle Domain Motions’, Cells, 11(2), p. 255. Available at: https://doi.org/10.3390/cells11020255.

      Zhou, Y., Zomot, E. and Kanner, B.I. (2006) ‘Identification of a lithium interaction site in the gamma-aminobutyric acid (GABA) transporter GAT-1’, The Journal of Biological Chemistry, 281(31), pp. 22092–22099. Available at: https://doi.org/10.1074/jbc.M602319200.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a potentially important study that integrates QM/MM free energy simulations and experimental kinetic analyses to probe the nature of phosphoryl transfer transition state in adenylate kinase. The idea that the transition state ensemble encompasses conformations with substantially different structural features (including the breaking/forming bonds) is interesting and potentially applicable to many other enzyme systems. In the current form, however, the study is considered incomplete since the connection between the putative transition state ensemble from the computations and key experimental observables, such as the activation entropy, is not well established.

      Thank you very much for your professional work as senior editor. We thank you and the reviewers for carefully reading our manuscript and for very valuable suggestions. In response, we have performed the recommended additional calculations and modified the manuscript as suggested, in order to improve the connection between the transition state ensemble obtained from simulations and experimental observables. Importantly, the new simulations fully corroborate our original findings, and thanks to your work the revised manuscript is stronger and better.

      Below are our point-to-point responses:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the phosphoryl transfer mechanism of the enzyme adenylate kinase, using SCC-DFTB quantum mechanical/molecular mechanical (QM/MM) simulations, along with kinetic studies exploring the temperature and pH dependence of the enzyme's activity, as well as the effects of various active site mutants. Based on a broad free energy landscape near the transition state, the authors proposed the existence of wide transition states (TS), characterized by the transferring phosphoryl group adopting a meta-phosphate-like geometry with asymmetric bond distances to the nucleophilic and leaving oxygens. In support of this finding, kinetic experiments were conducted with Ca2+ ions (instead of Mg2+) at different temperatures, which revealed a negative entropy of activation. Overall, in its present form, the manuscript has more weaknesses in terms of interpretation of the simulation results than strengths, which need to be addressed by the authors.

      We thank the reviewer for carefully reviewing our manuscript and the great suggestions for the revisions. Thanks to these points raised we are able to submit a revised manuscript addressing all questions.

      There are several major concerns:

      First, the authors' claim that the catalytic mechanism of adenylate kinase (Adk) has not been previously studied by QM/MM free energy simulations is somewhat inaccurate. In fact, two different groups have previously investigated the catalytic mechanism of Adk. The first study, cited by the authors themselves, used the string method to determine the minimum free energy profile, but resulted in an unexpected intermediate; note that they obtained a minimum free energy profile, not a minimum energy profile. The second study (Ojeda-May et al., Biochemistry 2021 and Dulko-Smith et al., J Chem Inf Model 2023) overlaps substantially with the present study, but its main conclusions differ from those of the present study. Therefore, a thorough discussion comparing the results of these studies is needed.

      We thank the reviewer for pointing out two additional articles beyond the one we had discussed. Accordingly, we have revised the claim that the Adk mechanism had not previously been studied using QM/MM, and added a discussion of the latter two citations. Notably, although the general outcome is consistent with our results, the conclusions and details of the findings differ. The two additional papers agree with our finding of a concerted TS, rather than the metastable intermediate observed in the QM/MM simulations of Shibanuma et al., 2020.

      The difference between the two papers by Nam/Wolf-Watz and our manuscript, pointed out by the reviewer, lies mainly in the interpretation. Importantly, those authors do not primarily focus on the nature of the transition state for the P-transfer reaction, but on the connection between the chemical and conformational steps. We have extensively reported that the conformational changes of lid opening and closing are clearly unrelated to the chemical step (see also our free energy landscape in Fig. 1a). Consequently, there cannot be a coupling. We note that our group had previously studied the lid-opening step extensively, both experimentally and computationally. In contrast, here we discover a fundamental concept for rate enhancement by an optimal enzyme: the reduction in the activation entropy by a wide TSE. This finding triggered new experiments, which then delivered experimental validation of this concept.

      In the revised version of the manuscript, and following the reviewer's suggestion, we expanded our discussion to include these two additional papers.

      Second, the interpretation of the TS ensemble needs deeper scrutiny. In general, the TS is defined as the hypersurface separating the reactant and product states. Consequently, if a correct reaction coordinate is defined, trajectories initiated at the TS should have equal probabilities of reaching either the reactant or product state; if an approximate reaction coordinate, such as the distance difference used in this study, is used, recrossing may be introduced as a correction into the probabilities. Thus, in order to establish the presence of a wide TS region, it is necessary to characterize the TS ensemble through a commitment analysis across the TS region.

      We thank the reviewer for suggesting to add a commitment analysis to our calculations. The newly performed commitment analysis is shown in Fig. 4b. The corresponding analysis further strengthens our original findings of the wide TS in the fully active enzyme.
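      For readers unfamiliar with the procedure, a commitment (committor) analysis launches many short, unbiased trajectories from each candidate configuration and records which basin each reaches first; configurations with a product-commitment probability p_B near 0.5 belong to the TSE. The sketch below applies this to a toy one-dimensional double-well potential with overdamped Langevin dynamics, standing in for the QM/MM system, where each shot would instead be an unbiased QM/MM trajectory with freshly drawn velocities. The potential, basin boundaries, number of shots, function names and the 0.4 to 0.6 TSE window are illustrative assumptions.

```python
import math
import random


def shoot(x0, rng, dt=1e-3, kT=0.5, basin=0.8, max_steps=100_000):
    """One unbiased trajectory from x0 (overdamped Langevin, toy potential V = (x^2 - 1)^2).

    Returns 'P' if the product basin (x >= basin) is reached first,
    'R' for the reactant basin (x <= -basin), or None if undecided.
    """
    x = x0
    noise = math.sqrt(2.0 * kT * dt)
    for _ in range(max_steps):
        force = -4.0 * x * (x * x - 1.0)  # -dV/dx
        x += force * dt + noise * rng.gauss(0.0, 1.0)
        if x >= basin:
            return "P"
        if x <= -basin:
            return "R"
    return None


def committor(x0, n_shots=200, seed=0):
    """Estimate p_B (probability of committing to products) and its binomial standard error."""
    rng = random.Random(seed)
    outcomes = [shoot(x0, rng) for _ in range(n_shots)]
    decided = [o for o in outcomes if o is not None]
    p_b = decided.count("P") / len(decided)
    return p_b, math.sqrt(p_b * (1.0 - p_b) / len(decided))


def in_tse(p_b, low=0.4, high=0.6):
    """Assign a configuration to the TSE if its committor is close to 1/2."""
    return low <= p_b <= high
```

      Repeating the estimate for configurations along the reaction coordinate and counting how many fall inside the window gives a committor-based width of the TSE, which can then be compared between systems, for example with and without Mg2+.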

      The relatively flat free energy surface observed near TS in Figures 1c and 2a, may be attributed to the cleavage and formation of P-O bonds relative to the marginally stable phosphorane intermediate, as described in Zhou et al.'s work (Chem Rev 1998, 98:991). This scenario is clearly different from a wide TS ensemble concept. In addition, given the inherent similarity in reactivity of the two oxygens towards the phosphoryl atom, it is reasonable to expect a single TS as shown in Figure 1 - supplement 9, rather than two TSs with a marginally stable intermediate as shown in Figure 1c. Consequently, it remains uncertain whether the elongated P-O bonds observed near the TS and their asymmetry are realistic or potentially an artifact of the pulling/non-equilibrium MD simulations. Further validation in this regard is required.

      The reviewer raises the key issue of how realistic the observed wide TSE is, and whether it could be an artifact of the simulation strategy, and suggests that further validation is required. Following this suggestion, in the revised version we have further validated this key observation with two additional analyses: first, a commitment analysis (see above), and second, umbrella sampling (see Fig. 4a). We consistently observe one wide TSE in the presence of Mg2+, but not in its absence. The fact that this wide TSE is observed with all three strategies (i.e., pulling/non-equilibrium MD, commitment analysis and umbrella sampling) most likely rules out an artifact related to the simulation strategy.

      Third, there are several inconsistencies in the free energy results and their discussion. First, the data from Kerns et al. (Kerns, NSMB, 2015, 22:124) indicate that the ATP/AMP -> ADP/ADP reaction proceeds at a faster rate than the ADP/ADP -> ATP/AMP reaction, suggesting that the ADP/ADP state has a lower free energy (approximately -1.0 kcal/mol) compared to the ATP/AMP state. This contrasts with Figure 1c, which shows a higher free energy of 6.0 kcal/mol for the ADP/ADP state. This discrepancy needs to be discussed.

      The reviewer correctly notes our experimental result of an equilibrium free energy of about -1 kcal/mol for ADP/ADP relative to ATP/AMP with Mg2+. Importantly, this was measured at pH 7. With a pKa of about 7.2 for ADP, more than 50% of ADP is in the monoprotonated state under these experimental conditions. As found in our QM/MM simulations, in the monoprotonated state ADP/ADP is much more stable than ATP/AMP (about 8 kcal/mol; see Figure 1 – supplement 4). In contrast, as shown in Fig. 1c and highlighted by the reviewer, the equilibrium is reversed for the nonprotonated state. Consequently, our QM/MM simulations roughly recapitulate the ensemble equilibrium of substrates/products measured at pH 7.

      We should have described these facts better in the manuscript, and we thank the reviewer for raising this point, as it prompted us to better explain the agreement between experiment and computation for the on-enzyme equilibrium between substrate and product states (see page 11 of the revised manuscript).
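      The estimate of the protonated fraction follows from the Henderson–Hasselbalch equation for a single titratable group, f = 1 / (1 + 10^(pH − pKa)). A minimal sketch, assuming ideal behaviour and using the pKa of about 7.2 quoted above; the function name is illustrative:

```python
def fraction_protonated(ph, pka):
    """Fraction of a single titratable group that is protonated (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# ADP with a pKa of about 7.2 (value from the response) at the experimental pH of 7
print(f"{fraction_protonated(7.0, 7.2):.2f}")
```

      At pH 7 this gives roughly 61% protonated, consistent with the statement that more than half of the ADP is monoprotonated.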

      Furthermore, the barrier for ATP/AMP -> ADP/ADP, calculated to be 20 kcal/mol for the fully charged state, exceeds the corresponding barrier for the monoprotonated state. This cautions against the conclusion that the fully charged state is the reactive state. In addition, the difference in the barrier for the no-Mg2+ system compared to the barriers with Mg2+ is substantially too large (21 kcal/mol from the calculation versus 7 kcal/mol from the experimental values). These inconsistencies raise questions as to their origins, whether they result from the use of the pulling/non-equilibrium MD simulation approach, which may yield unrealistic TS geometries, or from potential issues related to the convergence of the determined free energy values. To address this issue, a comparison of results obtained by umbrella sampling and similar methodologies is necessary.

      We agree that these points need to be clarified. For the resubmission, we performed umbrella sampling for the fully charged nucleotides with Mg2+ and for the no-Mg2+ systems, and added these new results to the manuscript (new Fig. 4). We agree with the reviewer that the free energy profiles obtained from umbrella sampling are more reliable; the original simulations for the monoprotonated state have larger errors (see Fig. 1, supplement 4). Importantly, we experimentally measured the pH dependence of the reaction in the direction from ADP/ADP to AMP and ATP, and hence compare the corresponding barriers in this direction.

      Regarding the comparison of the simulated (9.5 kcal/mol) and experimental barrier differences with and without metal: the experimental difference is 7 kcal/mol for Ca2+ versus no metal, but larger for Mg2+ versus no metal, which is the case for which the simulations were performed. The P-transfer with Mg2+ is faster than 500 s⁻¹, meaning that the experimental barrier difference between no metal and Mg2+ is ≥ 11 kcal/mol, in quite good agreement with the barrier differences from our umbrella sampling (Fig. 4a). In response to this question, we added these points to the revised manuscript.
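      Within transition state theory, and assuming equal prefactors, a ratio of rate constants corresponds to a barrier difference ΔΔG‡ = RT ln(k_fast / k_slow). The sketch below converts in both directions; the function names are illustrative, and the temperature of 298.15 K is an assumption, as the experimental temperature is not stated in this paragraph.

```python
import math

R_KCAL = 0.0019872041  # gas constant, kcal/(mol K)


def barrier_difference(k_fast, k_slow, temperature=298.15):
    """Barrier difference (kcal/mol) implied by two rate constants, assuming equal TST prefactors."""
    return R_KCAL * temperature * math.log(k_fast / k_slow)


def rate_ratio(ddg, temperature=298.15):
    """Fold rate difference corresponding to a barrier difference ddg (kcal/mol)."""
    return math.exp(ddg / (R_KCAL * temperature))


print(f"{rate_ratio(11.0):.1e}")
```

      Under these assumptions, a barrier difference of 11 kcal/mol corresponds to roughly a 10^8-fold difference in rate.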

      Reviewer #2 (Public Review):

      Summary:

      The authors report the results of QM/MM simulations and kinetic measurements for the phosphoryl-transfer step in adenylate kinase. The main assertion of the paper is that a wide transition state ensemble is a key concept in enzyme catalysis as a strategy to circumvent entropic barriers. This assertion is based on the observation of a "structurally wide" set of energetically equivalent configurations that lie along the reaction coordinate in QM/MM simulations, together with kinetic measurements that suggest a decrease in the entropy of activation.

      We thank the reviewer for the endorsement and for very useful suggestions to improve the manuscript. In response to these questions, we have edited our manuscript accordingly. All suggested additional simulations and analyses further support our original findings.

      Strengths:

      The study combines theoretical calculations and supporting experiments.

      Weaknesses:

      The role(s) of entropy in enzyme catalysis has been discussed extensively in the literature, from the Circe effect proposed by Jencks and many other works. The current paper hypothesizes a "wide" transition state ensemble as a catalytic strategy and key concept in enzyme catalysis. Overall, it is not clear the degree to which this hypothesis is supported by the data. The reasons are as follows:

      (1) Enzyme catalysis reflects a rate enhancement with respect to a baseline reaction in solution. In order to assert that something is part of a catalytic strategy of an enzyme, it would be necessary to demonstrate from simulations that the activation entropy for the baseline reaction is indeed greater and the transition state ensemble less "wide". Alternatively stated, when indicating there is a "wide transition state ensemble" for the enzyme system - one needs to indicate that is with respect to the non-enzymatic reaction. However, these simulations were not performed and the comparisons were not demonstrated.

      We agree with the reviewer that the ideal comparison for addressing enzyme catalytic power is with the baseline reaction in solution. However, as is the case for many biologically relevant reactions, the reaction in solution is too slow (i.e., has too high a barrier) to be measured: without the enzyme, this reaction would take about 7000 years. Moreover, in many cases the reaction mechanism in solution differs too much from that observed in the enzyme.

      To overcome this problem, another reference reaction is used instead of that in solution, such as a mutant enzyme or the enzyme lacking a key cofactor, hence a non-optimized enzyme. In the present case, this baseline reaction corresponds to the enzyme reaction in the absence of the Mg2+ ion. Consistently, our results clearly show that the reaction without Mg2+, which displays a larger barrier, has a narrower TS. We also want to highlight the extensive and excellent literature on QM/MM calculations of ATP hydrolysis in solution, which shows narrow transition state ensembles; to mention just a few:

      Klähn, M., Rosta, E., & Warshel, A. (2006). On the mechanism of hydrolysis of phosphate monoesters dianions in solutions and proteins. Journal of the American Chemical Society, 128(47), 15310–15323. https://doi.org/10.1021/ja065470t

      Wang, C., Huang, W., & Liao, J.-L. (2015). QM/MM investigation of ATP hydrolysis in aqueous solution. Journal of Physical Chemistry B, 119(9), 3720–3726. https://doi.org/10.1021/jp512960e
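      The 7000-year estimate for the uncatalyzed reaction can be translated into an approximate activation free energy with the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT). The sketch below assumes, for illustration only, that the 7000 years is a first-order half-life at 298.15 K and that the transmission coefficient is 1; under these assumptions the implied barrier is roughly 33 kcal/mol. The function name is illustrative.

```python
import math

KB = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34     # Planck constant, J s
R_KCAL = 0.0019872041  # gas constant, kcal/(mol K)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def eyring_barrier(k, temperature=298.15):
    """Activation free energy (kcal/mol) from a first-order rate constant k (s^-1),
    using the Eyring equation with a transmission coefficient of 1."""
    return R_KCAL * temperature * math.log(KB * temperature / (H * k))


half_life_s = 7000 * SECONDS_PER_YEAR  # assumption: "7000 years" read as a half-life
k_uncat = math.log(2) / half_life_s    # first-order rate constant, s^-1
print(f"{eyring_barrier(k_uncat):.1f} kcal/mol")
```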

      (2) The observation of a "wide conformational ensemble" is not a quantitative measure of entropy. In order to make a meaningful computational prediction of the entropic contribution to the free energy of activation, one would need to perform free energy simulations over a range of temperatures (for the enzymatic and non-enzymatic systems). Such simulations were not performed, and the entropy of activation was thus not quantified by the computational predictions.

      In the present work we do not intend to quantify entropy from the simulations, since such calculations are known to carry large errors. However, even if not strictly quantified, a wider TS ensemble is a proxy for a larger entropy.
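      As a one-dimensional illustration of why a wider ensemble serves as a proxy for larger entropy (not a calculation performed in the manuscript): for a Gaussian distribution of width σ along a coordinate, the configurational entropy is k_B ½ ln(2πeσ²), so doubling σ adds k_B ln 2, about 0.41 kcal/mol in TΔS at 298 K. The sketch estimates this from samples with a histogram; the widths, bin size, sample count and function name are arbitrary assumptions, and real activation entropies contain many further contributions.

```python
import math
import random


def histogram_entropy(samples, bin_width):
    """Differential entropy (units of k_B) of a 1D sample, estimated from a histogram:
    S / k_B ~= -sum_i p_i ln p_i + ln(bin_width)."""
    counts = {}
    for x in samples:
        b = math.floor(x / bin_width)
        counts[b] = counts.get(b, 0) + 1
    n = len(samples)
    s = -sum((c / n) * math.log(c / n) for c in counts.values())
    return s + math.log(bin_width)


rng = random.Random(0)
narrow = [rng.gauss(0.0, 0.1) for _ in range(200_000)]  # hypothetical narrow ensemble
wide = [rng.gauss(0.0, 0.2) for _ in range(200_000)]    # same shape, twice the width
delta_s = histogram_entropy(wide, 0.01) - histogram_entropy(narrow, 0.01)
t_delta_s = 0.0019872041 * 298.15 * delta_s  # kcal/mol at 298.15 K
print(f"dS = {delta_s:.2f} k_B, T*dS = {t_delta_s:.2f} kcal/mol")
```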

      (3) The authors indicate that lid-opening, essential for product release, and not P-transfer is the rate-limiting step in the catalytic cycle and Mg2+ accelerates both steps. How is it certain that the kinetic measurements are reporting on the chemical steps of the reaction, and not other factors such as metal ion binding or conformational changes?

      These were indeed the critical questions we needed to answer early on in studying how adenylate kinase catalyzes the reaction by more than 14 orders of magnitude. This was done by combining pre-steady-state and steady-state experiments with NMR dynamics, published in Kerns et al. (2015) and described at the beginning of this manuscript in Fig. 1a. We agree with the reviewer that for many other enzymes such an experimental examination of all microscopic steps of the enzymatic cycle has not been performed, leading to the risk of misinterpreting observed kinetic rates.

      (4) The authors explore different starting states for the chemical steps of the reaction (e.g., different metal ion binding and protonation states), and conclude that the most reactive enzyme configuration is the one with the more favorable reaction free energy barrier. However, it is not clear what is the probability of observing the system in these different states as a function of pH and metal ion concentration without performing appropriate pKa and metal ion binding calculations. This was not done, and hence these results seem somewhat inconclusive.

      As noted by the reviewer, in the present work our aim was to compare the chemical step of the reaction in different metal ion and protonation states. Our computational results show that the most reactive enzyme configuration is the nonprotonated state with Mg2+ in our forward reaction.

      We actually know the probabilities of the metal-bound states for this enzyme. The experimental data were described in Kerns et al. (2015): we directly determined experimentally the concentration needed to fully occupy the metal site with Mg2+ or Ca2+, so no metal-binding calculations are needed, as the experiments provide a direct measurement. From our X-ray structures we know the exact binding site and also observe full occupancy. The same holds for the pH dependence of the chemical step, measured in this manuscript and shown in Fig. 5b. We note that the excellent agreement between our simulations and the experiments is one of the key features of the current manuscript. As stated in the manuscript, we analyzed the pH dependence of the P-transfer step and showed that the rate increases with higher pH in the presence of Ca2+, while without a metal the opposite trend is observed. These results further support the QM/MM results showing that the fully charged nucleotide state was the most reactive in the presence of the metal, whereas in the absence of the cation only the monoprotonated nucleotides (low pH) were reactive.

      Reviewer #3 (Public Review):

      Summary:

      By conducting QM/MM free energy simulations, the authors aimed to characterize the mechanism and transition state for the phosphoryl transfer in adenylate kinase. The qualitative reliability of the QM/MM results has been supported by several interesting experimental kinetic studies. However, the interpretation of the QM/MM results is not well supported by the current calculations.

      Strengths:

      The QM/MM free energy simulations have been carefully conducted. The accuracy of the semiempirical QM/MM results was further supported by DFT/MM calculations, as well as qualitatively by several experimental studies.

      We thank the reviewer for the positive comments on the manuscript, particularly highlighting the support of the QM/MM results by additional DFT/MM calculations and several experiments.

      Weaknesses:

      (1) One key issue is the definition of the transition state ensemble. The authors appear to define this by simply considering structures that lie within a given free energy range from the barrier. However, this is not the rigorous definition of transition state ensemble, which should be defined in terms of committor distribution. This is not simply an issue of semantics, since only a rigorous definition allows a fair comparison between different cases - such as the transition state in an enzyme vs in solution, or with and without the metal ion. For a chemical reaction in a complex environment, it is also possible that many other variables (in addition to the breaking and forming P-O bonds) should be considered when one measures the diversity in the conformational ensemble.

      We thank the reviewer for noting this issue and for this great suggestion, which led to a strengthening of the key findings in the revised manuscript. Following this suggestion, we performed a commitment analysis to properly define the TSE and compared the results for the enzyme in the presence and absence of Mg2+ (see new Fig. 4b). The results further strengthen our previous finding and interpretation of a wider TSE for the reaction with Mg2+ relative to that without Mg2+.

      (2) While the experimental observation that the activation entropy differs significantly with and without the Ca2+ ion is interesting, it is difficult to connect this result with the "wide" transition state ensemble observed in the QM/MM simulations so far. Even without considering the definition of the transition state ensemble mentioned above, it is unlikely that a broader range of P-O distances would explain the substantial difference in the activation entropy measured in the experiment. Since the difference is sufficiently large, it should be possible to compute the value by repeating the free energy simulations at different temperatures, which would lead to a much more direct evaluation of the QM/MM model/result and the interpretation.

      In the present work we do not intend to quantify entropy from the simulations, since such calculations are known to carry large errors. However, even if not strictly quantified, a wider TS ensemble is a proxy for a larger entropy. We believe that the additional committor calculations and the umbrella sampling (new Fig. 4a) strongly support our original findings and are better suited for this purpose than repeating the free energy simulations at different temperatures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Make sure consistent units are used, either kJ/mol or kcal/mol.

      Thanks, we made the changes.

      In the case of the mono-protonated simulation, where does the proton transfer between AD(T)P and AMP occur in both the forward and reverse reactions? It is worthwhile to note that the proton transfer may take place at different reaction coordinate values (between the two reactions), as it is not explicitly defined in the reaction coordinate. In this context, it is also necessary to discuss how to combine the results to generate a single free energy profile.

      We agree with the reviewer on this point. Accordingly, we have analyzed where along the reaction coordinate the proton transfer occurs in both the forward and reverse monoprotonated reactions. On average, the proton transfer occurs at a reaction coordinate value of -0.7 (Figure 3 – supplement 5e,f).

      The methods section needs improvements:

      (1) Computational setup of the system: Were the systems neutralized? If so, what types of ions were used, and how many of them were included? If systems were not neutralized, discuss a potential artifact in the results. In addition, if the system for the reverse reaction (and no-Mg2+ systems) was prepared separately, provide details regarding their preparation.

      We thank the reviewer for noting this issue. Accordingly, we have provided the requested additional details of the computational setup in the revised version.

      (2) Simulation parameters: Clarify how non-bonded interactions were treated in both MM and QM/MM simulations. For the QM/MM simulation, specify the time step used, whether SHAKE was applied, whether NPT simulations were performed, and any other relevant parameters.

      We thank the reviewer for noting this issue. Accordingly, we have provided the requested additional details of the simulation parameters.

      (3) Free energy determination strategy: Describe how the two profiles (forward and reverse profiles) were combined and provide a theoretical justification for this approach. Additionally, include a comment on whether the Jarzynski equality is directly applicable to the NPT simulation.

      As requested by the reviewer, in the revised version of the manuscript we have described how the two profiles were combined and provided a theoretical justification for this approach.
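      For context, the Jarzynski equality relates a free energy difference to an exponential average over non-equilibrium work values, ΔF = −k_B T ln⟨exp(−W / k_B T)⟩; the inequality ⟨W⟩ ≥ ΔF follows from it. A minimal sketch of the estimator, assuming work values in kcal/mol from repeated pulls and a temperature of 300 K (both assumptions; the function name is illustrative). It does not show how the forward and reverse profiles were combined in the manuscript.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant, kcal/(mol K)


def jarzynski_free_energy(works, temperature=300.0):
    """Delta F = -kT ln <exp(-W/kT)> from non-equilibrium work values (kcal/mol),
    computed with the log-sum-exp trick to avoid overflow and underflow."""
    kt = KB_KCAL * temperature
    exponents = [-w / kt for w in works]
    a_max = max(exponents)
    log_mean = a_max + math.log(sum(math.exp(a - a_max) for a in exponents) / len(exponents))
    return -kt * log_mean
```

      Because the exponential average is dominated by rare low-work trajectories, the estimate is biased when only a few pulls are available.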

      Reviewer #3 (Recommendations For The Authors):

      See recommendations in the Public Review regarding the analysis of transition state ensemble and activation entropy.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Response to reviewer #1:

      We thank the reviewer for the further recommendations for improving our presentation. We would like to carefully address the remaining concerns of the reviewer.

      (1) I realize now that I didn't make my point clear enough, which was that as far as I know there is no reason to believe that an oscillatory state cannot be induced with synaptic depression as with spike frequency adaptation when used in the context of the author's model. I'm fine with how the authors have distinguished their model from R&T 2015, but I think the more interesting question is whether there is any reason to believe that STD is not equally capable of doing all the things mentioned in this paper as SFA, and if not why not. I would like the authors to go out on a limb and address this, if only with a few sentences in the discussion. 

      Thank you for pointing this out again. In response to your query regarding the comparison between STD and SFA in generating bump sweeps, we have done simulations based on STD. The results showed that both STD and SFA are capable of inducing bi-directional sweeps. However, (based on our simulations) only SFA can produce uni-directional sweeps. The absence of uni-directional sweeps based on STD may be due to the subtle yet important differences between the two mechanisms. Specifically, STD modulates the neural activity by weakening the recurrent connections, which theoretically can only inhibit recurrent inputs, while SFA can attenuate all forms of excitatory inputs, including external inputs. However, since we did not exhaustively explore the entire parameter space, we cannot conclude that STD is incapable of producing uni-directional sweeps. Future simulations are required.

      Following the Reviewer’s suggestion, we added a few sentences discussing the distinctions between STD and SFA in generating theta sweeps in the CANN (lines 432 to 440 of the Discussion section):

      “Based on our simulations, both STD and SFA can produce bi-directional sweeps within a CANN model, whereas only SFA enabled uni-directional sweeps in the absence of external theta inputs. This difference might reflect the fact that we did not exhaustively explore the parameter space. However, it might also be attributable to subtle yet important theoretical distinctions between STD and SFA. Specifically, STD attenuates neural activity by reducing recurrent connection strength, whereas SFA provides inhibitory input directly to the neurons, potentially counteracting all excitatory inputs. These differences might explain the distinct dynamical behaviors observed in our simulations. Future experiments could clarify these distinctions by monitoring changes in synaptic strength and inhibitory channel activation during theta sweeps.”
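      To make the distinction in the quoted passage concrete, the following minimal sketch shows where each mechanism enters a generic rate-based CANN input. It assumes a Tsodyks-Markram-style resource variable `x` for STD and a linear adaptation current `V` for SFA; the function names, equations and parameters (`beta`, `m`, `tau_d`, `tau_v`) are illustrative assumptions, not the model used in the manuscript.

```python
import numpy as np

def net_input(J, r, I_ext, x=None, V=None):
    """Total input to each neuron in a hypothetical rate-based CANN.

    J: recurrent weight matrix; r: firing rates; I_ext: external input.
    x: STD resource variables in [0, 1] (None = no STD). They scale the
       presynaptic rates, so STD can only weaken the recurrent term.
    V: SFA adaptation currents (None = no SFA). They are subtracted from
       the total drive, so SFA also counteracts the external input.
    """
    x = np.ones_like(r) if x is None else x
    V = np.zeros_like(r) if V is None else V
    return J @ (x * r) + I_ext - V

def step_sfa(V, u, m, tau_v, dt):
    """Euler step of an assumed adaptation current: tau_v dV/dt = -V + m*u."""
    return V + dt / tau_v * (-V + m * u)

def step_std(x, r, beta, tau_d, dt):
    """Euler step of an assumed resource variable: dx/dt = (1 - x)/tau_d - beta*x*r."""
    return x + dt * ((1.0 - x) / tau_d - beta * x * r)
```

      With the recurrent weights set to zero, depleting `x` leaves the net input unchanged, whereas a non-zero `V` still lowers it; this is the asymmetry the passage describes.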

      (2) I appreciate the inclusion of the experimental data in Fig 6a (though I don't find the left-most panel very useful). I also understand what the authors are trying to convey with plots in 6c and 6c. However, I don't find the text that was added above very helpful at all. I was hoping for a simpler demonstration of the effect, by plotting a series of sequential sweeps (cell index vs time, with color indicating firing rate, as in Fig 2d) in the case of both the slow speed and fast speed regimes. Here, vertical lines could mark the individual theta cycles and the firing of individual cells, showing the constancy of the former but change of the latter. 

      Thank you for your constructive feedback. Our previous explanation may have caused a misunderstanding, for which we apologize. The phenomenon we want to elucidate is not an increase in the theta frequency as detected in LFPs, but rather the slope of phase precession with respect to the animal's movement speed. Because of phase precession, the oscillation frequency of place cells as the animal traverses the field is higher than the theta frequency. A plot like Fig. 2d would not make this point clearer, since it shows the baseline theta frequency (i.e., the theta sweeps described previously). A straightforward way of thinking about this point is the one we added previously: “…The faster the animal runs, the faster the extra half cycle can be accomplished. Consequently, the firing frequency will increase more (a steeper slope in Fig. 6c red dots) than the baseline frequency”. We hope this clarification addresses the concerns raised.
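      The “extra half cycle” argument can be written as a short calculation. Assuming a place field of length L crossed at constant speed v, so that the traversal takes T = L/v, a cell that precesses by Δ extra cycles (Δ = 0.5 for the half cycle mentioned above) oscillates at roughly f_cell = f_θ + Δ·v/L. The sketch below treats f_θ as fixed; the field length and speeds are hypothetical values chosen only for illustration.

```python
def cell_frequency(f_theta, speed, field_length, extra_cycles=0.5):
    """Approximate oscillation frequency (Hz) of a phase-precessing cell.

    f_theta: baseline theta frequency (Hz), assumed constant here
    speed: running speed (m/s); field_length: place-field length (m)
    extra_cycles: phase advance accumulated over one traversal, in cycles
    """
    traversal_time = field_length / speed   # seconds spent inside the field
    return f_theta + extra_cycles / traversal_time

for v in (0.25, 0.5):
    print(v, cell_frequency(8.0, v, 0.5))
# 0.25 8.25
# 0.5 8.5
```

      Because the Δ·v/L term grows with speed while f_θ does not in this simplified form, the cell frequency rises with speed faster than the baseline frequency.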

      (3) This is still confusing to me. I just don't understand how the *phase* of the oscillating activity bump has anything to do with the movement of the animal. I would like to see a plot of the sweeps (again, cell index vs time, with color indicating the firing rate) before and after inactivation for short and long duration inactivation. Perhaps I am not understanding or appreciating how the bump recovers after inactivation and how this is related to the motion of the animal. 

      Thank you for raising this point again. After the inactivation is removed, the activity bump naturally re-emerges at the current input location (which has moved forward during the inactivation) and then resumes sweeping as it did before the inactivation. Single-cell phase precession and population theta sweeps are two sides of the same coin (provided that all cells start at roughly the same phase in the theta cycle). If the reviewer accepts this, then the activity bump sweeps again around the new location, and phase precession therefore resumes from a phase further along the precession, since phase codes the animal's position as it traverses the place field.

      (4) I am glad the authors are spending more time discussing this phenomenon, but I am unsure of their explanation: for a sweep moving at constant speed, neurons all along the path will be equally affected (inhibited), so where does the bias for suppressing the "end" neurons come from? 

      Although neurons along the path may appear to be equally inhibited as the bump sweeps over them, our model incorporates external inputs with Gaussian profiles. These inputs favor neurons closer to the input location, resulting in weaker activation of neurons farther from the input position.
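      The bias described above can be illustrated with a short sketch. It assumes a Gaussian external input centred on the input location z and, for simplicity, a recurrent drive and an adaptation current that are identical for every neuron along the sweep path; the amplitude, width and drive values are hypothetical.

```python
import numpy as np

def gaussian_input(positions, z, amplitude=1.0, width=0.5):
    """External input with a Gaussian profile centred on the input location z."""
    return amplitude * np.exp(-(positions - z) ** 2 / (2 * width ** 2))

def net_drive(positions, z, recurrent=0.3, adaptation=0.4):
    """Net input when recurrent drive and adaptation are uniform along the path
    (a simplifying assumption); only the external term varies with position."""
    return gaussian_input(positions, z) + recurrent - adaptation

path = np.linspace(0.0, 2.0, 5)   # neurons at increasing distance from z = 0
print(np.round(net_drive(path, z=0.0), 3))
```

      Under these assumptions, the neurons at the end of the sweep, farthest from z, receive the lowest net drive and are therefore the first to be suppressed.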

      (5) Here I was hoping that the authors might comment on what they suspect happens when the animal starts (or stops) moving, and how the network shifts from tracking regime to oscillatory regime (or vice versa), as is typically seen in experimental data (see for example, Kay et al., 2020, fig 4b,c). My apologies for not making this point clearer. 

      Thank you for pointing this out. In our model, we observed that when the animal stops, the network continues to generate theta oscillations near the input location, albeit with reduced amplitude (so the network dynamics resemble those of the tracking regime). However, we hypothesize that when the animal pauses for long enough (immobile but awake states), sensory input into the hippocampus also decreases, which is similar to removing external inputs in our model. In this case, the activity bump spontaneously moves away, resembling the phenomenon of replay (see also Romani & Tsodyks 2015).

      Regarding the experimental data (Kay et al.), it indeed appears that theta sweeps decoded from neural activity become less pronounced when the mouse moves at slower speeds. This observation could potentially correspond to a decrease in the amplitude of bump oscillations when external inputs associated with movement are halted but not entirely removed in our model. However, in experiments, when the mouse's movement slows down, hippocampal activity no longer oscillates at theta frequency, making it challenging to decode theta sweeps.

      We appreciate your clarification on this point and recognize the importance of further investigating how our model can accurately replicate the transition between tracking and oscillatory regimes observed in experimental data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Weaknesses:

      The readability could be improved.

      We have gone through the paper again and tried to revise the text to improve readability.

      Reviewer #1 (Recommendations For The Authors):

      (1) Thank you for adding the discrimination ratio. However, as Fig 2 and 3 depict the same experimental data, consider harmonizing the presentation (symbols and colors) and consolidating the Figs for clarity.

      This is an excellent point, but it is actually very hard to harmonize symbols and colors because the data are divided in different ways. Upon further consideration, we prefer not to make the symbols and colors the same, because doing so would be misleading. For example, WT and Tg training and testing session data are divided into grey and white throughout Figure 2, but in Figure 3, training and testing session data are pooled. Color-coding them grey and white in Figure 3 might suggest that training and testing were separated in that figure as well.

      (2) Fig 5 is missing

      We are not sure why Figure 5 was absent, since it was present in our copy of the submitted PDF. We have double-checked that Figure 5 is included in the revised manuscript.

      (3) Fig 6 add raw data for WT

      We have added raw WT data. Revised Figure 6 includes the raw data in part A4.

      (4) Fig 7 add raw data for WT

      We have added raw WT data. Revised Figure 7 includes the raw data in part A4.

    1. Author response:

      eLife assessment

      This potentially useful study involves neuro-imaging and electrophysiology in a small cohort of congenital cataract patients after sight recovery and age-matched control participants with normal sight. It aims to characterize the effects of early visual deprivation on excitatory and inhibitory balance in the visual cortex. While the findings are taken to suggest the existence of persistent alterations in Glx/GABA ratio and aperiodic EEG signals, the evidence supporting these claims is incomplete. Specifically, small sample sizes, lack of a specific control cohort, and other methodological limitations will likely restrict the usefulness of the work, with relevance limited to scientists working in this particular subfield.

      As pointed out in the public reviews, there are only very few human models which allow for assessing the role of early experience on neural circuit development. While the prevalent research in permanent congenital blindness reveals the response and adaptation of the developing brain to an atypical situation (blindness), research in sight restoration addresses the question of whether and how atypical development can be remediated if typical experience (vision) is restored. The literature on the role of visual experience in the development of E/I balance in humans, assessed via Magnetic Resonance Spectroscopy (MRS), has been limited to a few studies on congenital permanent blindness. Thus, we assessed sight recovery individuals with a history of congenital blindness, as limited evidence from other researchers indicated that the visual cortex E/I ratio might differ compared to normally sighted controls.

      Individuals with total bilateral congenital cataracts who remained untreated until later in life are extremely rare, particularly if only carefully diagnosed patients are included in a study sample. A sample size of 10 patients is, at the very least, typical of past studies in this population, even for exclusively behavioral assessments. In the present study, in addition to behavioral assessment as an indirect measure of sensitive periods, we investigated participants with two neuroimaging methods (Magnetic Resonance Spectroscopy and electroencephalography) to directly assess the neural correlates of sensitive periods in humans. The electroencephalography data allowed us to link the results of our small sample to findings documented in large cohorts of both sight recovery individuals and permanently congenitally blind individuals. As pointed out in a recent editorial recommending an “exploration-then-estimation procedure” (“Consideration of Sample Size in Neuroscience Studies,” 2020), exploratory studies like ours provide crucial direction and specific hypotheses for future work.

      We included an age-matched sighted control group recruited from the same community, measured in the same scanner and laboratory, to assess whether early experience is necessary for a typical excitatory/inhibitory (E/I) ratio to emerge in adulthood. The present findings indicate that this is indeed the case. Based on these results, a possible question for future work with individuals who had developmental cataracts is whether later visual deprivation causes similar effects. Note that even if visual deprivation at a later stage in life caused similar effects, the current results would not be invalidated; rather, they would be essential for interpreting future work on late (permanent or transient) blindness.

      Thus, we think that the present manuscript has far-reaching implications for our understanding of the conditions under which E/I balance, a crucial characteristic of brain functioning, emerges in humans.

      Finally, our manuscript is one of the first few studies which relates MRS neurotransmitter concentrations to parameters of EEG aperiodic activity. Since present research has been using aperiodic activity as a correlate of the E/I ratio, and partially of higher cognitive functions, we think that our manuscript additionally contributes to a better understanding of what might be measured with aperiodic neurophysiological activity.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this human neuroimaging and electrophysiology study, the authors aimed to characterize the effects of a period of visual deprivation in the sensitive period on excitatory and inhibitory balance in the visual cortex. They attempted to do so by comparing neurochemistry conditions ('eyes open', 'eyes closed') and resting state, and visually evoked EEG activity between ten congenital cataract patients with recovered sight (CC), and ten age-matched control participants (SC) with normal sight.

      First, they used magnetic resonance spectroscopy to measure in vivo neurochemistry from two locations, the primary location of interest in the visual cortex, and a control location in the frontal cortex. Such voxels are used to provide a control for the spatial specificity of any effects because the single-voxel MRS method provides a single sampling location. Using MR-visible proxies of excitatory and inhibitory neurotransmission, Glx and GABA+ respectively, the authors report no group effects in GABA+ or Glx, no difference in the functional conditions 'eyes closed' and 'eyes open'. They found an effect of the group in the ratio of Glx/GABA+ and no similar effect in the control voxel location. They then performed multiple exploratory correlations between MRS measures and visual acuity, and reported a weak positive correlation between the 'eyes open' condition and visual acuity in CC participants.

      The same participants then took part in an EEG experiment. The authors selected only two electrodes placed in the visual cortex for analysis and reported a group difference in an EEG index of neural activity, the aperiodic intercept, as well as the aperiodic slope, considered a proxy for cortical inhibition. They report an exploratory correlation between the aperiodic intercept and Glx in one out of three EEG conditions.

      The authors report the difference in E/I ratio, and interpret the lower E/I ratio as representing an adaptation to visual deprivation, which would have initially caused a higher E/I ratio. Although intriguing, the strength of evidence in support of this view is not strong. Amongst the limitations are the low sample size, the lack of a critical control cohort (for example, CC patients without recovered sight, which could provide evidence for a higher E/I ratio), and lower data quality in the control voxel.

      Strengths of study:

      How sensitive period experience shapes the developing brain is an enduring and important question in neuroscience. This question has been particularly difficult to investigate in humans. The authors recruited a small number of sight-recovered participants with bilateral congenital cataracts to investigate the effect of sensitive period deprivation on the balance of excitation and inhibition in the visual brain using measures of brain chemistry and brain electrophysiology. The research is novel, and the paper was interesting and well-written.

      Limitations:

      (1.1) Low sample size. Ten for CC and ten for SC, and a further two SC participants were rejected due to a lack of frontal control voxel data. The sample size limits the statistical power of the dataset and increases the likelihood of effect inflation.

      Applying strict criteria, we only included individuals who were born with no patterned vision in the CC group. The population of individuals who have remained untreated past infancy is small in India, despite a higher prevalence of childhood cataract than in Germany. Indeed, of the 11 CC and 11 SC participants originally tested, one participant each from the CC and SC group had to be excluded because their data had been corrupted, resulting in 10 participants in each group.

      It was a challenge to recruit participants from this rare group with no history of neurological diagnosis/intake of neuromodulatory medications, who were able and willing to undergo both MRS and EEG. For this study, data collection took more than 1.5 years.

      We took two measures to safeguard the validity of our results. First, we assessed not just MRS but also EEG measures of the E/I ratio. The latter allowed us to link our results to a larger population of CC individuals; that is, we replicated in our sub-group the results of a larger group of 38 individuals (Ossandón et al., 2023).

      Second, we included a control voxel. As predicted, all group effects were restricted to the occipital voxel.

      (1.2) Lack of specific control cohort. The control cohort has normal vision. The control cohort is not specific enough to distinguish between people with sight loss due to different causes and patients with congenital cataracts with co-morbidities. Further data from more specific populations, such as patients whose cataracts have not been removed, with developmental cataracts, or congenitally blind participants, would greatly improve the interpretability of the main finding. The lack of a more specific control cohort is a major caveat that limits a conclusive interpretation of the results.

      The existing work on visual deprivation and neurochemical changes, as assessed with MRS, has been limited to permanent congenital blindness. In fact, most studies on permanent blindness included only congenitally blind or early blind humans (Coullon et al., 2015; Weaver et al., 2013), or, in separate studies, only late-blind individuals (Bernabeu et al., 2009). Accordingly, we started with the most “extreme” visual deprivation model, sight recovery after congenital blindness. If we had not observed any group difference compared to normally sighted controls, investigating other groups might have been trivial. Based on our results, subsequent studies in late blind individuals, and then in individuals with developmental cataracts, can be planned with clear hypotheses.

      (1.3) MRS data quality differences. Data quality in the control voxel appears worse than in the visual cortex voxel. The frontal cortex MRS spectrum shows far broader linewidth than the visual cortex (Supplementary Figures). Compared to the visual voxel, the frontal cortex voxel has less defined Glx and GABA+ peaks; lower GABA+ and Glx concentrations, lower NAA SNR values; lower NAA concentrations. If the data quality is a lot worse in the FC, then small effects may not be detectable.

      Worse data quality in the frontal than in the visual cortex has been repeatedly observed in the MRS literature and is attributable to magnetic field distortions (Juchem & Graaf, 2017) resulting from the proximity of the region to the sinuses (recent example: Rideaux et al., 2022). Nevertheless, we chose the frontal control region rather than a parietal voxel, given the potential neurochemical changes in multisensory regions of the parietal cortex due to blindness. Such reorganization would be less likely in frontal areas associated with higher cognitive functions. Further, prior MRS studies of the visual cortex have also used the frontal cortex as a control region (Pitchaimuthu et al., 2017; Rideaux et al., 2022).

      In the present study, we checked that the frontal cortex datasets for Glx and GABA+ concentrations were of sufficient quality: the fit error was below 8.31% in both groups (Supplementary Material S3). For reference, Mikkelsen et al. reported a mean GABA+ fit error of 6.24 +/- 1.95% from a posterior cingulate cortex voxel across 8 GE scanners, using the Gannet pipeline. No absolute cutoffs have been proposed for fit errors. However, MRS studies in special populations (I/E ratio assessed in narcolepsy (Gao et al., 2024), GABA concentration assessed in Autism Spectrum Disorder (Maier et al., 2022)) have used frontal cortex data with a fit error of <10% to identify differences between cohorts (Gao et al., 2024; Pitchaimuthu et al., 2017). Based on the literature, MRS data from the frontal voxel of the present study would have been of sufficient quality to uncover group differences.

      In the revised manuscript, we will add the recently published MRS quality assessment form to the supplementary materials. Additionally, we will emphasize our a priori prediction of group differences for the visual cortex voxel, but not for the frontal cortex voxel.

      (1.4) Because of the direction of the difference in E/I, the authors interpret their findings as representing signatures of sight improvement after surgery without further evidence, either within the study or from the literature. However, the literature suggests that plasticity and visual deprivation drive the E/I index up rather than down. Decreasing GABA+ is thought to facilitate experience-dependent remodelling. What evidence is there that cortical inhibition increases in response to a visual cortex that is over-sensitised due to congenital cataracts? Without further experimental or literature support this interpretation remains very speculative.

      Indeed, higher inhibition was not predicted, which we attempt to reconcile in our discussion section. We base our discussion mainly on the non-human animal literature, which has shown evidence of homeostatic changes after prolonged visual deprivation in the adult brain (Barnes et al., 2015). It is also interesting to note that after monocular deprivation in adult humans, resting GABA+ levels decreased in the visual cortex (Lunghi et al., 2015). Assuming that after delayed sight restoration, adult neuroplasticity mechanisms must be employed, these studies would predict a “balancing” of the increased excitatory drive following sight restoration by a commensurate increase in inhibition (Keck et al., 2017). Additionally, the EEG results of the present study allowed for speculation regarding the underlying neural mechanisms of an altered E/I ratio. The aperiodic EEG activity suggested higher spontaneous spiking (increased intercept) and increased inhibition (steeper aperiodic slope between 1-20 Hz) in CC vs SC individuals (Ossandón et al., 2023).
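      For readers less familiar with the aperiodic parameters, a common simplified description models the 1/f-like component of the power spectrum as log10 P(f) = b - χ·log10 f, where b is the intercept and χ the exponent (a steeper slope corresponds to a larger χ). The sketch below fits this line by least squares in log-log space over 1 to 20 Hz. It is an illustration under that assumption only: it ignores periodic peaks and the knee parameter that dedicated tools such as FOOOF/specparam handle, and it is not necessarily the procedure used by Ossandón et al. (2023).

```python
import numpy as np

def fit_aperiodic(freqs, power, fmin=1.0, fmax=20.0):
    """Fit log10(power) = b - chi * log10(freq) within [fmin, fmax] Hz.

    Returns (b, chi): intercept and aperiodic exponent. Periodic peaks are
    not modelled, so strong oscillations (e.g. alpha) will bias the fit.
    """
    mask = (freqs >= fmin) & (freqs <= fmax)
    log_f = np.log10(freqs[mask])
    log_p = np.log10(power[mask])
    slope, intercept = np.polyfit(log_f, log_p, 1)   # polyfit returns [slope, intercept]
    return intercept, -slope                         # exponent chi = -slope

freqs = np.arange(1.0, 41.0)
power = 10 ** 2.0 * freqs ** -1.5   # synthetic spectrum with b = 2, chi = 1.5
b, chi = fit_aperiodic(freqs, power)
```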

      In the revised manuscript, we will more clearly indicate that these speculations are based primarily on non-human animal work, due to the lack of human studies on the subject.

      (1.5) Heterogeneity in the patient group. Congenital cataract (CC) patients experienced a variety of duration of visual impairment and were of different ages. They presented with co-morbidities (absorbed lens, strabismus, nystagmus). Strabismus has been associated with abnormalities in GABAergic inhibition in the visual cortex. The possible interactions with residual vision and confounds of co-morbidities are not experimentally controlled for in the correlations, and not discussed.

      The goal of the present study was to assess whether we would observe changes in E/I ratio after restoring vision at all. We would not have included patients without nystagmus in the CC group of the present study, since it would have been unlikely that they experienced congenital patterned visual deprivation. Amongst diagnosticians, nystagmus or strabismus might not be considered genuine “comorbidities” that emerge in people with congenital cataracts. Rather, these are consequences of congenital visual deprivation, which we employed as diagnostic criteria. Similarly, absorbed lenses are clear signs that cataracts were congenital. As in other models of experience-dependent brain development (e.g., the extant literature on congenital permanent blindness, including anophthalmic individuals; Coullon et al., 2015; Weaver et al., 2013), some uncertainty remains regarding whether the (in our case, remaining) abnormalities of the eye, or the blindness they caused, are the factors driving neural changes. In the case of people with reversed congenital cataracts, at least the retina is considered to be intact, as they would otherwise not receive cataract removal surgery.

      However, we consider it unlikely that strabismus caused the group differences, because the present study shows group differences in the Glx/GABA+ ratio at rest, regardless of eye opening or eye closure, for which strabismus would have caused distinct effects. By contrast, links between GABA concentration and, for example, interocular suppression in strabismus have so far been documented during visual stimulation (Mukerji et al., 2022; Sengpiel et al., 2006), and differed in direction between the amblyopic and non-amblyopic eye. Further, one MRS study did not find group differences in GABA concentration between the visual cortices of 16 amblyopic individuals and sighted controls (Mukerji et al., 2022), supporting the view that the differences in the Glx/GABA+ ratio which we observed were driven by congenital deprivation, and not by amblyopia-associated differences in visual acuity or eye movements.

      In the revised manuscript, we will discuss the inclusion criteria in more detail, and the aforementioned reasons why our data remains interpretable.

      (1.6) Multiple exploratory correlations were performed to relate MRS measures to visual acuity (shown in Supplementary Materials), and only specific ones were shown in the main document. The authors describe the analysis as exploratory in the 'Methods' section. Furthermore, the correlation between visual acuity and E/I metric is weak, and not corrected for multiple comparisons. The results should be presented as preliminary, as no strong conclusions can be made from them. They can provide a hypothesis to test in a future study.

      In the revised manuscript, we will clearly indicate that the exploratory correlation analyses are reported to put forth hypotheses for future studies.

      (1.7) P.16 Given the correlation of the aperiodic intercept with age ("Age negatively correlated with the aperiodic intercept across CC and SC individuals, that is, a flattening of the intercept was observed with age"), age needs to be controlled for in the correlation between neurochemistry and the aperiodic intercept. Glx has also been shown to negatively correlate with age.

      The correlation between chronological age and aperiodic intercept was observed across groups, but the correlation between Glx and the intercept of the aperiodic EEG activity was seen only in the CC group, even though the SC group was matched for age. Thus, such a correlation was very unlikely to be predominantly driven by an effect of chronological age.

      In the revised manuscript, we will add the following linear regressions, which include age as a covariate, for the relationship between aperiodic intercept and Glx concentration in the CC group.

      a. A linear regression was conducted within the CC group to predict the aperiodic intercept during visual stimulation from age and visual cortex Glx concentration. The model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.82, F(2,7) = 16.1, p = 0.0024. Note that the coefficient for age was not significant, β = 0.007, t(7) = 0.82, p = 0.439. The regression coefficients and their respective statistics are presented in Author response table 1.

      Author response table 1.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Visual Stimulation) in the CC group

      b. A linear regression was conducted within the CC group to predict the aperiodic intercept during eye opening at rest from age and visual cortex Glx concentration. The model explained a significant proportion of the variance in the aperiodic intercept, R² = 0.842, F(2,7) = 18.6, p = 0.00159. Note that the coefficient for age was not significant, β = −0.005, t(7) = −0.90, p = 0.400. The regression coefficients and their respective statistics are presented in Author response table 2.

      Author response table 2.

      Regression Analysis Summary for Predicting Aperiodic Intercept (Eyes Open) in the CC group

      c. Given that the Glx coefficient is significant in both models and age does not significantly predict the aperiodic intercept in either, we conclude that Glx predicts the aperiodic intercept independently of age.
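      For transparency, the type of model reported in a and b (an outcome regressed on Glx concentration with age as a covariate) can be sketched with ordinary least squares. The function below is a generic illustration, not the authors' analysis script; the variable names and example values are hypothetical, and converting the returned F and t values to p-values would additionally require the F and t distributions (for example from scipy.stats).

```python
import numpy as np

def ols_with_covariate(y, *predictors):
    """Ordinary least squares y = b0 + b1*x1 + b2*x2 + ...

    Returns coefficients, their t-values, R^2, the model F statistic and
    its degrees of freedom (k, n - k - 1).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors])
    n, k = len(y), X.shape[1] - 1                  # k = number of predictors
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    df_res = n - k - 1
    f_stat = (r2 / k) / ((1.0 - r2) / df_res)
    sigma2 = ss_res / df_res                       # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se, r2, f_stat, (k, df_res)

# Hypothetical example with n = 10, giving the same (2, 7) degrees of freedom
rng = np.random.default_rng(0)
glx = rng.normal(10.0, 1.0, size=10)    # illustrative values, not study data
age = rng.uniform(10.0, 40.0, size=10)
aperiodic_intercept = 0.5 + 0.5 * glx + rng.normal(0.0, 0.05, size=10)
beta, t_vals, r2, f_stat, df = ols_with_covariate(aperiodic_intercept, glx, age)
```

      With n = 10 and two predictors, the model F statistic has (2, 7) degrees of freedom and each coefficient's t statistic has 7, matching the format reported in a and b.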

      (1.8) Multiple exploratory correlations were performed to relate MRS to EEG measures (shown in Supplementary Materials), and only specific ones were shown in the main document. Given the multiple measures from the MRS, the correlations with the EEG measures were exploratory, as stated in the text, p.16, and in Figure 4. Yet the introduction said that there was a prior hypothesis "We further hypothesized that neurotransmitter changes would relate to changes in the slope and intercept of the EEG aperiodic activity in the same subjects." It would be great if the text could be revised for consistency and the analysis described as exploratory.

      In the revised manuscript, we will improve the phrasing. We consider the correlation analyses as exploratory due to our sample size and the absence of prior work. However, we did hypothesize that both MRS and EEG markers would concurrently be altered in CC vs SC individuals.

      (1.9) The analysis for the EEG needs to take more advantage of the available data. As far as I understand, only two electrodes were used, yet far more were available as seen in their previous study (Ossandon et al., 2023). The spatial specificity is not established. The authors could use the frontal cortex electrode (FP1, FP2) signals as a control for spatial specificity in the group effects, or even better, all available electrodes and correct for multiple comparisons. Furthermore, they could use the aperiodic intercept vs Glx in SC to evaluate the specificity of the correlation to CC.

      The aperiodic intercept and slope did not differ between CC and SC individuals for Fp1 and Fp2, suggesting the spatial specificity of the results. In the revised manuscript, we will add this analysis to the supplementary material.

      Author response image 1.

      Aperiodic intercept (top) and slope (bottom) for congenital cataract-reversal (CC, red) and age-matched normally sighted control (SC, blue) individuals. Distributions of these parameters are displayed as violin plots for three conditions; at rest with eyes closed (EC), at rest with eyes open (EO) and during visual stimulation (LU). Aperiodic parameters were calculated across electrodes Fp1 and Fp2. Solid black lines indicate mean values, dotted black lines indicate median values. Coloured lines connect values of individual participants across conditions.

      Further, Glx concentration in the visual cortex did not correlate with the aperiodic intercept in the SC group (Figure 4), suggesting that this relationship was indeed specific to the CC group.

      The data from all electrodes has been analyzed and published in other studies as well (Pant et al., 2023; Ossandón et al., 2023).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript reports non-invasive measures of activity and neurochemical profiles of the visual cortex in congenitally blind patients who recovered vision through the surgical removal of bilateral dense cataracts. The declared aim of the study is to find out how restoring visual function after several months or years of complete blindness impacts the balance between excitation and inhibition in the visual cortex.

      Strengths:

      The findings are undoubtedly useful for the community, as they contribute towards characterising the many ways this special population differs from normally sighted individuals. The combination of MRS and EEG measures is a promising strategy to estimate a fundamental physiological parameter - the balance between excitation and inhibition in the visual cortex, which animal studies show to be heavily dependent upon early visual experience. Thus, the reported results pave the way for further studies, which may use a similar approach to evaluate more patients and control groups.

      Weaknesses:

      (2.1) The main issue is the lack of an appropriate comparison group or condition to delineate the effect of sight recovery (as opposed to the effect of congenital blindness). Few previous studies suggested an increased excitation/Inhibition ratio in the visual cortex of congenitally blind patients; the present study reports a decreased E/I ratio instead. The authors claim that this implies a change of E/I ratio following sight recovery. However, supporting this claim would require showing a shift of E/I after vs. before the sight-recovery surgery, or at least it would require comparing patients who did and did not undergo the sight-recovery surgery (as common in the field).

      Longitudinal studies would indeed be the best way to test the hypothesis that the lower E/I ratio in the CC group observed by the present study is a consequence of sight restoration. However, longitudinal studies involving neuroimaging are a considerable challenge, particularly in research conducted outside of major developed countries and dedicated neuroimaging research facilities. Crucially, however, had CC and SC individuals, as well as permanently congenitally blind vs SC individuals (Coullon et al., 2015; Weaver et al., 2013), not differed on any neurochemical markers, such a longitudinal study might have been trivial. Thus, cross-sectional studies are a necessary initial step to justify and better tailor longitudinal studies.

      (2.2) MR spectroscopy shows a reduced Glx/GABA ratio in patients vs. sighted controls; however, this finding remains rather isolated and is not corroborated by other observations. The difference between patients and controls only emerges for the Glx/GABA ratio; there is no accompanying difference in either the Glx or the GABA concentrations. There is an attempt to relate the MRS data to acuity measurements and electrophysiological indices, but the exploratory correlational analyses do not help to build a coherent picture. A weak correlation between Glx/GABA and visual impairment is reported, but this is specific to the patient group (N=10) and would not hold across groups (the correlation is positive, predicting the lowest Glx/GABA ratio values for the sighted controls, the opposite of what is found). There is also a strong correlation between Glx concentrations and the EEG power at the lowest temporal frequencies. Although this relation is intriguing, it only holds for a very specific combination of parameters (of the many tested): only with eyes open, and only in the patient group.

      We interpret these findings differently, that is, in the context of experiments from non-human animals and the larger MRS literature.

      Homeostatic control of E/I balance assumes that the ratio of excitation (reflected here by Glx) and inhibition (reflected here by GABA+) is regulated. Like prior work (Gao et al., 2024, 2024; Narayan et al., 2022; Perica et al., 2022; Steel et al., 2020; Takado et al., 2022; Takei et al., 2016), we assumed that the ratio of Glx/GABA+ is indicative of E/I balance rather than solely the individual neurotransmitter levels. One of the motivations for assessing the ratio vs the absolute concentration is that as per the underlying E/I balance hypothesis, a change in excitation would cause a concomitant change in inhibition, and vice versa, which has been shown in non-human animal work (Fang et al., 2021; Haider et al., 2006; Tao & Poo, 2005) and modeling research (Vreeswijk & Sompolinsky, 1996; Wu et al., 2022). Importantly, our interpretation of the lower E/I ratio is not just from the Glx/GABA+ ratio, but additionally, based on the steeper EEG aperiodic slope (1-20 Hz).  

      As stated in the discussion section and in response 1.4, we did not expect to see a lower Glx/GABA+ ratio in CC individuals. We discuss possible reasons for the direction of the correlations with visual acuity and with the aperiodic offset during passive visual stimulation, and offer interpretations and (testable) hypotheses.

      We interpret the direction of the Glx/GABA+ correlation with visual acuity to imply that the patients who most strongly compensate for the consequences of congenital blindness (hyperexcitation) in response to visual stimulation are those who recover best. Note that the sighted control group was selected based on their "normal" vision. Thus, clinical visual acuity measures are not expected to vary sufficiently, nor do they have the resolution to show strong correlations with neurophysiological measures. By contrast, the CC group comprised patients with highly variable visual outcomes, and was thus ideal for investigating such correlations.

      This holds for the correlation between Glx and the aperiodic intercept, as well. Previous work has suggested that the intercept of the aperiodic activity is associated with broadband spiking activity in neural circuits (Manning et al., 2009). Thus, an atypical increase of spiking activity during visual stimulation, as indirectly suggested by “old” non-human primate work on visual deprivation (Hyvärinen et al., 1981) might drive a correlation not observed in healthy populations.

      In the revised manuscript, we will indicate more clearly in the discussion that these are possible post-hoc interpretations. We argue that, given the lack of such studies in humans, it is all the more important that extant data be presented completely, even if the direction of the effects is not as expected.

      (2.3) For these reasons, the reported findings do not allow us to draw firm conclusions on the relation between EEG parameters and E/I ratio or on the impact of early (vs. late) visual experience on the excitation/inhibition ratio of the human visual cortex.

      Indeed, the correlations we have tested between the E/I ratio and EEG parameters were exploratory, and have been reported as such. The goal of our study was not to compare the effects of early vs. late visual experience. The goal was to study whether early visual experience is necessary for a typical E/I ratio in visual neural circuits. We provided clear evidence in favor of this hypothesis. Thus, the present results suggest the necessity of investigating the effects of late visual deprivation. In fact, such research is missing in permanent blindness as well.

      Reviewer #3 (Public Review):

      This manuscript examines the impact of congenital visual deprivation on the excitatory/inhibitory (E/I) ratio in the visual cortex using Magnetic Resonance Spectroscopy (MRS) and electroencephalography (EEG) in individuals whose sight was restored. Ten individuals with reversed congenital cataracts were compared to age-matched, normally sighted controls, assessing the cortical E/I balance and its interrelationship to visual acuity. The study reveals that the Glx/GABA ratio in the visual cortex and the intercept and aperiodic signal are significantly altered in those with a history of early visual deprivation, suggesting persistent neurophysiological changes despite visual restoration.

      My expertise is in EEG (particularly in the decomposition of periodic and aperiodic activity) and statistical methods. I have several major concerns in terms of methodological and statistical approaches along with the (over)interpretation of the results. These major concerns are detailed below.

      (3.1) Variability in visual deprivation:

      - The document reports a large variability in the duration of visual deprivation (and probably also in the age at restoration), with significant implications for the sensitive period's impact on visual circuit development. The variability and its potential effects on the outcomes need thorough exploration and discussion.

      We work with a rare, unique patient population, which makes it difficult to systematically assess the effects of different visual histories while maintaining stringent inclusion criteria such as complete patterned visual deprivation at birth. Regardless, we considered the large variance in age at surgery and time since surgery as supportive of our interpretation: group differences were found despite the large variance in duration of visual deprivation. Moreover, the existing variance was used to explore possible associations between behavior and neural measures, as well as neurochemical and EEG measures.

      In the revised manuscript, we will detail the advantages and disadvantages of our CC sample, with respect to duration of congenital visual deprivation.

      (3.2) Sample size:

      - The small sample size is a major concern, as it may not provide sufficient power to detect subtle effects and/or may overestimate significant effects, which then tend not to generalize to new data. This is one of the biggest drivers of the replication crisis in neuroscience.

      We address the small sample size in our discussion, and make clear that it was due to the nature of investigations in special populations. It is worth noting that our EEG results fully align with those of a larger sample of CC individuals (Ossandón et al., 2023), giving us confidence in their validity and reproducibility. Moreover, our MRS results and their correlations with EEG parameters were spatially specific to occipital cortex measures, as predicted.

      The main problem with the correlation analyses between MRS and EEG measures is that the sample size is simply too small to conduct such an analysis. Moreover, the methods section does not make clear that this analysis was only conducted in the patient group (which the reviewer assumed from the plots), nor does it explain why this was done only in the patient group. I would highly recommend removing these correlation analyses.

      We marked the correlation analyses as exploratory; note that we do not base most of our discussion on the results of these analyses. As indicated by Reviewer 1, reporting them allows for deriving more precise hypotheses for future studies. It should be noted that we investigate an extremely rare population, tested outside of major developed economies and dedicated neuroimaging research facilities. In addition to being a rare patient group, these individuals come from poor communities. Therefore, we consider it justified to report these correlations as exploratory, providing direction for future research.

      (3.3) Statistical concerns:

      - The statistical analyses, particularly the correlations drawn from a small sample, may not provide reliable estimates (see https://www.sciencedirect.com/science/article/pii/S0092656613000858, which clearly describes this problem).

      It would undoubtedly be better to have a larger sample size. We nonetheless think it is of value to the research community to publish this dataset, since 10 multimodal data sets from a carefully diagnosed, rare population, representing a human model for the effects of early experience on brain development, are quite a lot.  Sample sizes in prior neuroimaging studies in transient blindness have most often ranged from n = 1 to n = 10. They nevertheless provided valuable direction for future research, and integration of results across multiple studies provides scientific insights.  

      Identifying possible group differences was the goal of our study, with the correlations being an exploratory analysis, which we have clearly indicated in the methods, results and discussion.

      - Statistical analyses for the MRS: The authors should consider additional permutation statistics, which are more suitable for small sample sizes. The current statistical model (a 2x2 ANOVA design) is not ideal for such small sample sizes. Moreover, it is unclear why the condition (EO & EC) was chosen as a predictor and not the brain region (visual & frontal) or the neurochemicals. In addition, the authors did not provide any information on the alpha level or on correction for multiple comparisons (in the methods section). Finally, even if the groups are matched w.r.t. age, the time between surgery and measurement, and the duration of visual deprivation (and sex?), these should be included as covariates, as it has been shown that these are highly related to the measurements of interest (especially for the EEG measurements) and the age range of the current study is large.

      In our ANOVA models, the neurochemicals were the outcome variables, and the conditions were chosen as predictors based on prior work suggesting that Glx/GABA+ might vary with eye closure (Kurcyus et al., 2018). The study was designed based on a hypothesis of group differences localized to the occipital cortex, due to visual deprivation. The frontal cortex voxel was chosen to indicate whether these differences were spatially specific. Therefore, we conducted separate ANOVAs based on this study design.

      In the revised manuscript, we will add permutation analyses for our outcomes, as well as multiple regression models investigating whether the variance in visual history might have driven these results. Note that in the supplementary materials (S6, S7), we have reported the correlations between visual history metrics and MRS/EEG outcomes.
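      As an illustration of the kind of permutation analysis mentioned above, the following is a minimal Python sketch of a two-sided permutation test on a difference in group means (for example, CC vs SC Glx/GABA+ ratios). It is a generic technique, not the analysis reported in the revised manuscript; the function name `permutation_test`, the number of permutations and the random seed are illustrative assumptions.

```python
import numpy as np


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group labels are shuffled n_perm times; the p-value is the proportion of
    shuffles whose absolute mean difference is at least as large as the
    observed one (with a +1 correction so that p is never exactly 0).
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    n_a = len(a)
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_a].mean() - shuffled[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value
```

      With n = 10 per group there are only C(20, 10) = 184,756 distinct relabelings, so exact enumeration would also be feasible; random shuffling is used here only to keep the sketch short.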

      The alpha level used for the ANOVA models, specified in the methods section, was 0.05. The alpha level for the exploratory analyses reported in the main manuscript was 0.008, after Bonferroni correction for six comparisons (0.05/6), as also specified in the methods. Note that the corrected p-values are reported multiplied by 6, because most readers assume an alpha level of 0.05 (see the response regarding large p-values).
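      The correction described above can be sketched as follows. This minimal Python example assumes that the six comparisons are the exploratory MRS-by-aperiodic correlations (three MRS measures by two aperiodic parameters, as listed in response 3.6) and that adjusted values are capped at 1; the function name `bonferroni_adjust` is hypothetical.

```python
import numpy as np

ALPHA = 0.05
N_COMPARISONS = 6  # assumption: 3 MRS measures x 2 aperiodic parameters

# Equivalent per-test threshold when raw p-values are compared directly.
alpha_per_test = ALPHA / N_COMPARISONS  # 0.0083..., reported as 0.008


def bonferroni_adjust(p_values, n_comparisons=N_COMPARISONS):
    """Multiply raw p-values by the number of comparisons, capping at 1.

    An adjusted p-value below ALPHA is equivalent to a raw p-value below
    ALPHA / n_comparisons.
    """
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * n_comparisons, 1.0)
```

      For example, raw p-values of 0.001, 0.01 and 0.2 become 0.006, 0.06 and 1.0; only the first remains below 0.05.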

      We used a control group matched for age and sex. Moreover, the controls were recruited and tested in the same institutes, using the same setup. We feel that we followed the gold standards for recruiting a healthy control group for a patient group.

      - EEG statistical analyses: The same critique as for the MRS statistical analyses applies to the EEG analysis. In addition: was the 2x3 ANOVA conducted for EO and EC independently? This seems to be inconsistent with the approach in the MRS analyses, in which the authors chose EO & EC as predictors in their 2x2 ANOVA.

      The 2x3 ANOVA was not conducted independently for the eyes-open and eyes-closed conditions. The ANOVA conducted on the EEG metrics was 2x3 because it had group (CC, SC) and condition (eyes open (EO), eyes closed (EC) and visual stimulation (LU)) as predictors.

      - Figure 4: The authors report a p-value of >0.999 with a correlation coefficient of -0.42 with a sample size of 10 subjects. This can't be correct (it should be around: p = 0.22). All statistical analyses should be checked.

      As specified in the methods and figure legend, the reported p values in Figure 4 have been corrected using the Bonferroni correction, and therefore multiplied by the number of comparisons, leading to the seemingly large values.
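      To make the arithmetic concrete, the Python sketch below converts a Pearson correlation into a two-sided p-value via the t distribution (df = n - 2) and then applies the six-fold Bonferroni adjustment. With r = -0.42 and n = 10 the uncorrected p-value is roughly 0.23, in line with the reviewer's estimate; multiplied by 6 it exceeds 1 and would be displayed as > 0.999 once capped at 1. The capping step is an assumption about how the figure values were produced; the helper name `pearson_p` is hypothetical and SciPy is assumed to be available.

```python
import math

from scipy import stats


def pearson_p(r, n):
    """Two-sided p-value for a Pearson correlation r from n paired observations."""
    df = n - 2
    t_stat = r * math.sqrt(df / (1.0 - r ** 2))
    return 2.0 * stats.t.sf(abs(t_stat), df)


p_raw = pearson_p(-0.42, 10)   # roughly 0.23 (uncorrected)
p_bonf = min(p_raw * 6, 1.0)   # 6 * 0.23 > 1, so capped at 1.0
```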

      Additionally, to check all statistical analyses, we ran the manuscript through statcheck, an independent statistical consistency checker (Nuijten & Polanin, 2020) (https://michelenuijten.shinyapps.io/statcheck-web/), and will upload the consistency report with the revised supplementary material.

      - Figure 2c. Eyes-closed condition: The highest score of the Glx/GABA ratio seems to be ~3.6. In subplot 2a, there seem to be 3 subjects that show a Glx/GABA ratio score > 3.6. How can this be explained? There is also a discrepancy for the eyes-closed condition.

      The three subjects that show the Glx/GABA+ ratio > 3.6 in subplot 2a are in the SC group, whereas the correlations plotted in figure 2c are only for the CC group, where the highest score is indeed ~3.6.

      (3.4) Interpretation of aperiodic signal:

      - Several recent papers demonstrated that the aperiodic signal measured in EEG or ECoG is related to various important aspects such as age, skull thickness, electrode impedance, as well as cognition. Thus, currently, very little is known about the underlying effects which influence the aperiodic intercept and slope. The entire interpretation of the aperiodic slope as a proxy for E/I is based on a computational model and simulation (as described in the Gao et al. paper).

      Apart from the modeling work by Gao et al., multiple studies that we have cited used ECoG, EEG and MEG and showed concomitant changes in aperiodic activity with pharmacological manipulation of the E/I ratio (Colombo et al., 2019; Molina et al., 2020; Muthukumaraswamy & Liley, 2018). Further, several prior studies have interpreted changes in the aperiodic slope as reflective of changes in the E/I ratio, including studies of developmental groups (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Schaworonkow & Voytek, 2021) as well as patient groups (Molina et al., 2020; Ostlund et al., 2021).

      In the revised manuscript, we will cite those studies not already included in the introduction.

      - Especially the aperiodic intercept is a very sensitive measure to many influences (e.g. skull thickness, electrode impedance...). As crucial results (correlation aperiodic intercept and MRS measures) are facing this problem, this needs to be reevaluated. It is safer to make statements on the aperiodic slope than intercept. In theory, some of the potentially confounding measures are available to the authors (e.g. skull thickness can be computed from T1w images; electrode impedances are usually acquired alongside the EEG data) and could be therefore controlled.

      All electrophysiological measures indeed depend on parameters such as skull thickness and electrode impedance. As in the extant literature using neurophysiological measures to compare brain function between patient and control groups, we used a control group matched in age and sex, recruited in the same region, tested with the same devices, and analyzed with the same analysis pipeline. For example, impedance was kept below 10 kOhm for all subjects. There is no evidence suggesting that congenital cataracts are associated with changes in skull thickness that would cause the observed pattern of group results. Moreover, we cannot think of how any of the exploratory correlations between neurophysiological measures and MRS measures could be accounted for by a difference, e.g., in skull thickness.

      - The authors wrote: "Higher frequencies (such as 20-40 Hz) have been predominantly associated with local circuit activity and feedforward signaling (Bastos et al., 2018; Van Kerkoerle et al., 2014); the increased 20-40 Hz slope may therefore signal increased spontaneous spiking activity in local networks. We speculate that the steeper slope of the aperiodic activity for the lower frequency range (1-20 Hz) in CC individuals reflects the concomitant increase in inhibition." The authors confuse the interpretation of periodic and aperiodic signals. This section refers to the interpretation of the periodic signal (higher frequencies). This interpretation cannot simply be translated to the aperiodic signal (slope).

      Prior work has not always separated the aperiodic and periodic components, making it unclear what might have driven these effects in our data. The interpretation of the higher frequency range was intended to contrast with the interpretation of the lower frequency range, in order to speculate as to why the two aperiodic fits might go in differing directions. We will clarify our interpretation in the revised manuscript. Note that Ossandón et al. reported highly similar results (group differences for CC individuals and for permanently congenitally blind humans) for the aperiodic activity between 20-40 Hz and for oscillatory activity in the gamma range. We will allude to these findings in the revised manuscript.

      - The authors further wrote: "We used the slope of the aperiodic (1/f) component of the EEG spectrum as an estimate of E/I ratio (Gao et al., 2017; Medel et al., 2020; Muthukumaraswamy & Liley, 2018)." This is a highly speculative interpretation with very little empirical evidence. These papers were conducted with ECoG data (mostly in animals) and mostly under anesthesia. Thus, these studies allow only an indirect interpretation of what actually influences the 1/f slope in EEG measurements.

      Note that Muthukumaraswamy & Liley (2018) used different types of pharmacological manipulations and analyzed periodic and aperiodic MEG activity. In addition, Medel et al. (2020; now published as Medel et al., 2023) analyzed EEG activity in addition to monkey ECoG data after propofol administration. The interpretation of our results is in line with a number of recent EEG studies in developing (Hill et al., 2022; Schaworonkow & Voytek, 2021) and special populations. As mentioned above, several prior studies have used the slope of the 1/f component/aperiodic activity as an indirect measure of the E/I ratio (Favaro et al., 2023; Hill et al., 2022; McSweeney et al., 2023; Molina et al., 2020; Ostlund et al., 2021; Schaworonkow & Voytek, 2021), including studies using scalp-recorded EEG. We will make clearer in the introduction of the revised manuscript that this metric is indirect.

      While a full understanding of aperiodic activity is still lacking, some convergent ideas have emerged. We think that our results contribute to this enterprise, since our study is, to the best of our knowledge, the first to assess both MRS-measured neurotransmitter levels and EEG aperiodic activity.

      (3.5) Problems with EEG preprocessing and analysis:

      - It seems that the authors did not identify bad channels or address the line noise issue (which remains a problem even if a low-pass filter below the line-noise frequency was applied).

      As pointed out in the methods and Figure 1, we only analyzed data from two channels, O1 and O2, neither of which were rejected for any participant. Channel rejection was performed for the larger dataset, published elsewhere (Ossandón et al., 2023; Pant et al., 2023).

      In both published works, we did not consider frequency ranges above 40 Hz, to avoid any possible contamination with line noise. Here, we focused on activity between 0 and 20 Hz, a range free of line-noise contamination. The band-pass filter (FIR, 1-45 Hz) ensured that any spill-over effects of line noise would be restricted to frequencies just below the upper cutoff frequency.
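      The following minimal Python sketch illustrates why a 1-45 Hz FIR band-pass leaves little mains energy in the spectrum. It assumes 50 Hz line noise, a hypothetical sampling rate of 500 Hz and a Hamming-windowed design from SciPy's `firwin`; the filter length is an illustrative choice, not the EEGLAB filter actually used.

```python
import numpy as np
from scipy import signal

FS = 500.0       # hypothetical sampling rate (Hz)
NUMTAPS = 1651   # illustrative odd filter length (~3.3 s at 500 Hz)

# Hamming-windowed FIR band-pass with half-amplitude edges at 1 and 45 Hz.
taps = signal.firwin(NUMTAPS, [1.0, 45.0], pass_zero=False, fs=FS)

# Magnitude response at an alpha-range frequency and at assumed 50 Hz mains.
_, response = signal.freqz(taps, worN=[10.0, 50.0], fs=FS)
gain_10hz, gain_50hz = np.abs(response)
```

      With these settings the 10 Hz gain is close to 1, while the 50 Hz component is attenuated to below 1% of its amplitude; residual leakage is confined to the transition band just below the 45 Hz edge.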

      Additionally, a prior version of the analysis used the cleanline.m function to remove line noise before filtering, and the group differences remained stable. We will report this analysis in the supplementary material of the revised manuscript. Further, both groups were measured in the same lab, making line noise a highly unlikely account of the observed group effects. Finally, any of the exploratory MRS-EEG correlations would be hard to explain if the EEG parameters were contaminated with line noise.

      - What was the percentage of segments that needed to be rejected due to the 120μV criteria? This should be reported specifically for EO & EC and controls and patients.

      The mean percentage of 1-s segments rejected for each resting-state condition is given below. The mean percentage of 6.25-s segments rejected in each group for the visual stimulation condition is also included, and will be added to the revised manuscript:

      Author response table 3.
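      The rejection rule summarized above can be sketched as follows. This minimal Python illustration assumes an absolute-amplitude criterion (any sample on any channel exceeding ±120 μV marks the segment as bad) applied to non-overlapping segments; whether the original pipeline used absolute or peak-to-peak amplitude is not stated here, and the function name is hypothetical.

```python
import numpy as np


def reject_segments(data_uv, fs, seg_len_s=1.0, threshold_uv=120.0):
    """Cut continuous data (channels x samples, in μV) into non-overlapping
    segments and drop those in which any sample exceeds +/- threshold_uv.

    Returns the kept segments (segments x channels x samples) and the
    percentage of segments rejected. Trailing samples that do not fill a
    whole segment are discarded.
    """
    seg_len = int(round(seg_len_s * fs))
    n_chan, n_samp = data_uv.shape
    n_segs = n_samp // seg_len
    segs = data_uv[:, : n_segs * seg_len].reshape(n_chan, n_segs, seg_len)
    segs = segs.transpose(1, 0, 2)  # -> segments x channels x samples
    bad = np.abs(segs).max(axis=(1, 2)) > threshold_uv
    pct_rejected = 100.0 * bad.mean()
    return segs[~bad], pct_rejected
```

      For the visual stimulation condition, `seg_len_s=6.25` would be passed instead of the 1 s used for the resting-state conditions.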

      - The authors downsampled the data to 60 Hz "to match the stimulation rate". What is the intention of this? The subsequent spectral analyses are confounded by this choice (see the Nyquist theorem).

      These data were collected as part of a study designed to evoke alpha activity with visual white noise, which varied in luminance with equal power at all frequencies from 1-60 Hz, restricted by the refresh rate of the monitor on which stimuli were presented (Pant et al., 2023). This paradigm and method were developed by VanRullen and colleagues (Schwenk et al., 2020; Vanrullen & MacDonald, 2012); the analysis requires the same sampling rate for the presented frequencies and the EEG data. The downsampling function used here automatically applies an anti-aliasing filter (EEGLAB 2019).
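      As a generic illustration of why anti-aliasing matters for the Nyquist concern (at 60 Hz the Nyquist frequency is 30 Hz), the Python sketch below downsamples a synthetic signal with SciPy's `resample_poly`, which applies a low-pass FIR filter as part of the rate change. The 1000 Hz original rate and the test frequencies are hypothetical, and EEGLAB's resampling routine differs in implementation detail.

```python
import numpy as np
from scipy import signal

FS_IN = 1000  # hypothetical original sampling rate (Hz)
FS_OUT = 60   # target rate matching the 60 Hz stimulation (Nyquist = 30 Hz)

t = np.arange(10 * FS_IN) / FS_IN  # 10 s of data
# 10 Hz component (should be kept) plus a 45 Hz component above the new Nyquist.
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 45 * t)

# 60/1000 = 3/50: upsample by 3, low-pass filter, downsample by 50.
y = signal.resample_poly(x, up=3, down=50)

# Single-sided amplitude spectrum of the downsampled signal.
freqs = np.fft.rfftfreq(len(y), d=1 / FS_OUT)
amp = 2 * np.abs(np.fft.rfft(y)) / len(y)
amp_10 = amp[np.argmin(np.abs(freqs - 10))]
# Without the anti-aliasing filter, 45 Hz would alias to 60 - 45 = 15 Hz.
amp_15 = amp[np.argmin(np.abs(freqs - 15))]
```

      The 10 Hz component survives with nearly unit amplitude, whereas almost nothing appears at the 15 Hz alias, which is the behavior an anti-aliasing step is meant to guarantee.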

      - "Subsequently, baseline removal was conducted by subtracting the mean activity across the length of an epoch from every data point." The actual baseline time segment should be specified.

      The time segment was the length of the epoch, that is, 1 second for the resting state conditions and 6.25 seconds for the visual stimulation conditions. This will be explicitly stated in the revised manuscript.
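      The baseline step described above amounts to demeaning each epoch and channel over the full epoch. A minimal NumPy sketch, assuming epochs are stored as an array of shape (epochs, channels, samples); the function name is hypothetical:

```python
import numpy as np


def remove_epoch_mean(epochs):
    """Subtract, for every epoch and channel, the mean over all samples in
    that epoch (1 s at rest, 6.25 s during visual stimulation).

    epochs: array of shape (n_epochs, n_channels, n_samples).
    """
    epochs = np.asarray(epochs, dtype=float)
    return epochs - epochs.mean(axis=-1, keepdims=True)
```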

      - "We excluded the alpha range (8-14 Hz) for this fit to avoid biasing the results due to documented differences in alpha activity between CC and SC individuals (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023)." This does not really make sense, as the FOOOF algorithm first fits the 1/f slope, for which the alpha activity is not relevant.

      We did not use the FOOOF algorithm/toolbox in this manuscript. As stated in the methods, we used a 1/f fit to the 1-20 Hz spectrum in the log-log space, and subtracted this fit from the original spectrum to obtain the corrected spectrum. Given the pronounced difference in alpha power between groups (Bottari et al., 2016; Ossandón et al., 2023; Pant et al., 2023), we were concerned it might drive differences in the exponent values.  Our analysis pipeline had been adapted from previous publications of our group and other labs (Ossandón et al., 2023; Voytek et al., 2015; Waschke et al., 2017).
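      The procedure described above (a straight-line fit to the 1-20 Hz spectrum in log-log space with the 8-14 Hz alpha range excluded, followed by subtraction of the fit) can be sketched in a few lines of NumPy. This is a minimal illustration under those stated parameters, not the group's published pipeline; it also returns R² of the fit in log-log space as one possible fit-quality metric. The function name and the synthetic test spectrum are hypothetical.

```python
import numpy as np


def fit_aperiodic(freqs, power, fit_range=(1.0, 20.0), exclude=(8.0, 14.0)):
    """Fit log10(power) = intercept + slope * log10(freq) over fit_range,
    ignoring frequencies inside `exclude` (here the alpha band).

    Returns the slope (negative for 1/f-like spectra; steeper = more
    negative), the intercept, R^2 of the fit on the included points, and the
    aperiodic-corrected log spectrum over the whole fit range.
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    in_range = (freqs >= fit_range[0]) & (freqs <= fit_range[1])
    in_alpha = (freqs >= exclude[0]) & (freqs <= exclude[1])
    use = in_range & ~in_alpha

    log_f = np.log10(freqs[use])
    log_p = np.log10(power[use])
    slope, intercept = np.polyfit(log_f, log_p, deg=1)

    residual = log_p - (intercept + slope * log_f)
    r_squared = 1.0 - np.sum(residual ** 2) / np.sum((log_p - log_p.mean()) ** 2)

    # Subtract the aperiodic fit from the full 1-20 Hz log spectrum
    # (alpha band included) to obtain the corrected spectrum.
    log_f_all = np.log10(freqs[in_range])
    corrected = np.log10(power[in_range]) - (intercept + slope * log_f_all)
    return slope, intercept, r_squared, corrected
```

      Excluding the alpha band means a large alpha peak cannot pull the fitted line upward around 10 Hz, which is the bias the authors describe wanting to avoid.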

      We have conducted the analysis with and without the exclusion of the alpha range, as well as using the FOOOF toolbox in both the 1-20 Hz and 20-40 Hz ranges (Ossandón et al., 2023). The findings of a steeper slope in the 1-20 Hz range as well as lower alpha power in CC vs SC individuals remained stable. In Ossandón et al., the comparison between the piecewise fits and FOOOF fits led the authors to use the former, as it outperformed the FOOOF algorithm for their data.

      - The model fits of the 1/f fitting for EO, EC, and both participant groups should be reported.

      In Figure 3 of the manuscript, we depicted the mean spectra and 1/f fits for each group. We will add the fit quality metrics and show individual subjects’ fits in the revised manuscript.

      (3.6) Validity of GABA measurements and results:

      - According to a newer study by the authors of the Gannet toolbox (https://analyticalsciencejournals.onlinelibrary.wiley.com/doi/abs/10.1002/nbm.5076), the reliability and reproducibility of the gamma-aminobutyric acid (GABA) measurement can vary significantly depending on acquisition and modeling parameters. Did the authors address these challenges?

      We took care of data quality while acquiring MRS data by ensuring appropriate voxel placement and linewidth prior to scanning. Acquisition as well as modeling parameters were constant for both groups, so they cannot have driven group differences.

      The linked article compares the reproducibility of GABA measurement using Osprey, which was released in 2020 and uses linear combination modeling to fit the peak as opposed to Gannet’s simple peak fitting (Hupfeld et al., 2024). The study finds better test-retest reliability for Osprey compared to Gannet’s method.

      As the present work was conceptualized in 2018, we used Gannet 3.0, which was the state-of-the-art edited spectral analysis toolbox at the time and is still widely used. In the revised manuscript, we will include a supplementary section reanalyzing the main findings with Osprey.

      - Furthermore, the authors wrote: "We confirmed the within-subject stability of metabolite quantification by testing a subset of the sighted controls (n=6) 2-4 weeks apart." Looking at supplementary Figure 5 (which would be better plotted as ICC or Bland-Altman plots), the within-subject stability compared to between-subject variability does not seem to be great. Furthermore, I don't think such a small sample size qualifies for a rigorous assessment of stability.

      Indeed, we did not intend to provide a rigorous assessment of within-subject stability. Rather, we aimed to confirm that data quality/concentration ratios did not systematically differ between the same subjects tested longitudinally; driven, for example, by scanner heating or time of day. As with the phantom testing, we attempted to give readers an idea of the quality of the data, as they were collected from a primarily clinical rather than a research site.

      In the revised manuscript, we will remove the statement regarding stability and add the Bland-Altman plot.
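      For reference, the quantities shown in a Bland-Altman plot are simple to compute: for each participant, the mean of the two sessions (x-axis) and their difference (y-axis), together with the mean difference (bias) and the 95% limits of agreement (bias ± 1.96 SD of the differences). A minimal Python sketch, assuming one value per participant per session; the function name is hypothetical and plotting is omitted.

```python
import numpy as np


def bland_altman(session_1, session_2):
    """Return per-participant means and differences, the bias (mean
    difference) and the 95% limits of agreement (bias +/- 1.96 SD)."""
    a = np.asarray(session_1, dtype=float)
    b = np.asarray(session_2, dtype=float)
    means = (a + b) / 2.0
    diffs = a - b
    bias = diffs.mean()
    sd = diffs.std(ddof=1)  # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits
```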

      - "Why might an enhanced inhibitory drive, as indicated by the lower Glx/GABA ratio" Is this interpretation really warranted, as the results of the group differences in the Glx/GABA ratio seem to be rather driven by a decreased Glx concentration in CC rather than an increased GABA (see Figure 2).

      We used the Glx/GABA+ ratio as a measure, rather than individual Glx or GABA+ concentration, which did not significantly differ between groups. As detailed in Response 2.2, we think this metric aligns better with an underlying E/I balance hypothesis and has been used in many previous studies (Gao et al., 2024; Liu et al., 2015; Narayan et al., 2022; Perica et al., 2022).

      Our interpretation of an enhanced inhibitory drive additionally comes from the combination of aperiodic EEG (1-20 Hz) and MRS measures, which, when considered together, are consistent with a decreased E/I ratio.

      In the revised manuscript, we will rephrase this sentence accordingly. 

      - Glx concentration predicted the aperiodic intercept in CC individuals' visual cortices during ambient and flickering visual stimulation. Why specifically investigate the Glx concentration, when the paper is about E/I ratio?

      As stated in the methods, we exploratorily assessed the relationship between all MRS parameters (Glx, GABA+ and Glx/GABA+ ratio) with the aperiodic parameters (slope, offset), and corrected for multiple comparisons accordingly. We think this is a worthwhile analysis considering the rarity of the dataset/population (see 1.2, 1.6, 2.1 and reviewer 1’s comments about future hypotheses). We only report the Glx – aperiodic intercept correlation in the main manuscript as it survived correction for multiple comparisons.

      (3.7) Interpretation of the correlation between MRS measurements and EEG aperiodic signal:

      - The authors wrote: "The intercept of the aperiodic activity was highly correlated with the Glx concentration during rest with eyes open and during flickering stimulation (also see Supplementary Material S11). Based on the assumption that the aperiodic intercept reflects broadband firing (Manning et al., 2009; Winawer et al., 2013), this suggests that the Glx concentration might be related to broadband firing in CC individuals during active and passive visual stimulation." These results should not be interpreted (or only with great caution) for several reasons (see also the problem with influences on the aperiodic intercept and the small sample size). This is a result of the exploratory analyses correlating every EEG parameter with every MRS parameter. This requires well-powered replication before any interpretation can be provided. Furthermore, and importantly: why should this be specific to CC patients, and not found in the SC control group?

      We indicate clearly in all parts of the manuscript that these correlations are presented as exploratory. Further, we interpret the Glx-aperiodic offset correlation, and none of the others, as it survived the Bonferroni correction for multiple comparisons. We offer a hypothesis in the discussion section as to why such a correlation might exist in the CC but not the SC group (see response 2.2), and do not speculate further.

      (3.8) Language and presentation:

      - The manuscript requires language improvements and correction of numerous typos. Over-simplifications and unclear statements are present, which could mislead or confuse readers (see also interpretation of aperiodic signal).

      In the revision, we will check that speculations are clearly marked and typos are removed.

      - The authors state that "Together, the present results provide strong evidence for experience-dependent development of the E/I ratio in the human visual cortex, with consequences for behavior." The results of the study do not provide any strong evidence, because of the small sample size and exploratory analyses approach and not accounting for possible confounding factors.

      We disagree with this statement and point to the convergent evidence from both MRS and neurophysiological measures. The latter correspond to results observed in a larger sample of CC individuals (Ossandón et al., 2023).

      - "Our results imply a change in neurotransmitter concentrations as a consequence of *restoring* vision following congenital blindness." This is a speculative statement to infer a causal relationship on cross-sectional data.

      As mentioned under 2.1, we conducted a cross-sectional study, which might justify future longitudinal work. To advance the field, we put forward new testable hypotheses at the end of the manuscript.

      In the revised manuscript we will add “might imply” to better indicate the hypothetical character of this idea.

      - In the limitations section, the authors wrote: "The sample size of the present study is relatively high for the rare population, but undoubtedly, overall, rather small." This sentence should be rewritten, as the study is plainly underpowered. The further justification reads: "We nevertheless think that our results are valid. Our findings are neurochemically (Glx and GABA+ concentration) and anatomically (visual cortex) specific. The MRS parameters varied with parameters of the aperiodic EEG activity and visual acuity. The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38) (Ossandón et al., 2023), and effects of chronological age were as expected from the literature." These statements do not validate or justify a small sample. Furthermore, the current data set is a subset of an earlier paper published by the same authors: "The EEG data sets reported here were part of data published earlier (Ossandón et al., 2023; Pant et al., 2023)." Thus, the statement "The group differences for the EEG assessments corresponded to those of a larger sample of CC individuals (n=38)" is a circular argument and should be avoided.

      Our intention was not to justify having a small sample, but to justify why we think the results might be valid as they align with/replicate existing literature.

      In the revised manuscript, we will add a figure showing that the EEG results of the 10 subjects considered here correspond to those of the 28 other subjects of Ossandón et al. (2023). We will adapt the text accordingly, clearly stating that the pattern of EEG results of the ten subjects reported here replicates that of the 28 additional subjects of Ossandón et al. (2023).

      References

      Barnes, S. J., Sammons, R. P., Jacobsen, R. I., Mackie, J., Keller, G. B., & Keck, T. (2015). Subnetwork-specific homeostatic plasticity in mouse visual cortex in vivo. Neuron, 86(5), 1290–1303. https://doi.org/10.1016/J.NEURON.2015.05.010

      Bernabeu, A., Alfaro, A., García, M., & Fernández, E. (2009). Proton magnetic resonance spectroscopy (1H-MRS) reveals the presence of elevated myo-inositol in the occipital cortex of blind subjects. NeuroImage, 47(4), 1172–1176. https://doi.org/10.1016/j.neuroimage.2009.04.080

      Bottari, D., Troje, N. F., Ley, P., Hense, M., Kekunnaya, R., & Röder, B. (2016). Sight restoration after congenital blindness does not reinstate alpha oscillatory activity in humans. Scientific Reports. https://doi.org/10.1038/srep24683

      Colombo, M. A., Napolitani, M., Boly, M., Gosseries, O., Casarotto, S., Rosanova, M., Brichant, J. F., Boveroux, P., Rex, S., Laureys, S., Massimini, M., Chieregato, A., & Sarasso, S. (2019). The spectral exponent of the resting EEG indexes the presence of consciousness during unresponsiveness induced by propofol, xenon, and ketamine. NeuroImage, 189, 631–644. https://doi.org/10.1016/j.neuroimage.2019.01.024

      Consideration of Sample Size in Neuroscience Studies. (2020). Journal of Neuroscience, 40(21), 4076–4077. https://doi.org/10.1523/JNEUROSCI.0866-20.2020

      Coullon, G. S. L., Emir, U. E., Fine, I., Watkins, K. E., & Bridge, H. (2015). Neurochemical changes in the pericalcarine cortex in congenital blindness attributable to bilateral anophthalmia. Journal of Neurophysiology. https://doi.org/10.1152/jn.00567.2015

      Fang, Q., Li, Y. T., Peng, B., Li, Z., Zhang, L. I., & Tao, H. W. (2021). Balanced enhancements of synaptic excitation and inhibition underlie developmental maturation of receptive fields in the mouse visual cortex. Journal of Neuroscience, 41(49), 10065–10079. https://doi.org/10.1523/JNEUROSCI.0442-21.2021

      Favaro, J., Colombo, M. A., Mikulan, E., Sartori, S., Nosadini, M., Pelizza, M. F., Rosanova, M., Sarasso, S., Massimini, M., & Toldo, I. (2023). The maturation of aperiodic EEG activity across development reveals a progressive differentiation of wakefulness from sleep. NeuroImage, 277. https://doi.org/10.1016/J.NEUROIMAGE.2023.120264

      Gao, Y., Liu, Y., Zhao, S., Liu, Y., Zhang, C., Hui, S., Mikkelsen, M., Edden, R. A. E., Meng, X., Yu, B., & Xiao, L. (2024). MRS study on the correlation between frontal GABA+/Glx ratio and abnormal cognitive function in medication-naive patients with narcolepsy. Sleep Medicine, 119, 1–8. https://doi.org/10.1016/j.sleep.2024.04.004

      Haider, B., Duque, A., Hasenstaub, A. R., & McCormick, D. A. (2006). Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. https://doi.org/10.1523/JNEUROSCI.5297-05.2006

      Hill, A. T., Clark, G. M., Bigelow, F. J., Lum, J. A. G., & Enticott, P. G. (2022). Periodic and aperiodic neural activity displays age-dependent changes across early-to-middle childhood. Developmental Cognitive Neuroscience, 54, 101076. https://doi.org/10.1016/J.DCN.2022.101076

      Hupfeld, K. E., Zöllner, H. J., Hui, S. C. N., Song, Y., Murali-Manohar, S., Yedavalli, V., Oeltzschner, G., Prisciandaro, J. J., & Edden, R. A. E. (2024). Impact of acquisition and modeling parameters on the test–retest reproducibility of edited GABA+. NMR in Biomedicine, 37(4), e5076. https://doi.org/10.1002/nbm.5076

      Hyvärinen, J., Carlson, S., & Hyvärinen, L. (1981). Early visual deprivation alters modality of neuronal responses in area 19 of monkey cortex. Neuroscience Letters, 26(3), 239–243. https://doi.org/10.1016/0304-3940(81)90139-7

      Juchem, C., & Graaf, R. A. de. (2017). B0 magnetic field homogeneity and shimming for in vivo magnetic resonance spectroscopy. Analytical Biochemistry, 529, 17–29. https://doi.org/10.1016/j.ab.2016.06.003

      Keck, T., Hübener, M., & Bonhoeffer, T. (2017). Interactions between synaptic homeostatic mechanisms: An attempt to reconcile BCM theory, synaptic scaling, and changing excitation/inhibition balance. Current Opinion in Neurobiology, 43, 87–93. https://doi.org/10.1016/J.CONB.2017.02.003

      Kurcyus, K., Annac, E., Hanning, N. M., Harris, A. D., Oeltzschner, G., Edden, R., & Riedl, V. (2018). Opposite Dynamics of GABA and Glutamate Levels in the Occipital Cortex during Visual Processing. Journal of Neuroscience, 38(46), 9967–9976. https://doi.org/10.1523/JNEUROSCI.1214-18.2018

      Liu, B., Wang, G., Gao, D., Gao, F., Zhao, B., Qiao, M., Yang, H., Yu, Y., Ren, F., Yang, P., Chen, W., & Rae, C. D. (2015). Alterations of GABA and glutamate-glutamine levels in premenstrual dysphoric disorder: A 3T proton magnetic resonance spectroscopy study. Psychiatry Research - Neuroimaging, 231(1), 64–70. https://doi.org/10.1016/J.PSCYCHRESNS.2014.10.020

      Lunghi, C., Berchicci, M., Morrone, M. C., & Russo, F. D. (2015). Short‐term monocular deprivation alters early components of visual evoked potentials. The Journal of Physiology, 593(19), 4361. https://doi.org/10.1113/JP270950

      Maier, S., Düppers, A. L., Runge, K., Dacko, M., Lange, T., Fangmeier, T., Riedel, A., Ebert, D., Endres, D., Domschke, K., Perlov, E., Nickel, K., & Tebartz van Elst, L. (2022). Increased prefrontal GABA concentrations in adults with autism spectrum disorders. Autism Research, 15(7), 1222–1236. https://doi.org/10.1002/aur.2740

      Manning, J. R., Jacobs, J., Fried, I., & Kahana, M. J. (2009). Broadband shifts in local field potential power spectra are correlated with single-neuron spiking in humans. The Journal of Neuroscience : The Official Journal of the Society for Neuroscience, 29(43), 13613–13620. https://doi.org/10.1523/JNEUROSCI.2041-09.2009

      McSweeney, M., Morales, S., Valadez, E. A., Buzzell, G. A., Yoder, L., Fifer, W. P., Pini, N., Shuffrey, L. C., Elliott, A. J., Isler, J. R., & Fox, N. A. (2023). Age-related trends in aperiodic EEG activity and alpha oscillations during early- to middle-childhood. NeuroImage, 269, 119925. https://doi.org/10.1016/j.neuroimage.2023.119925

      Medel, V., Irani, M., Crossley, N., Ossandón, T., & Boncompte, G. (2023). Complexity and 1/f slope jointly reflect brain states. Scientific Reports, 13(1), 21700. https://doi.org/10.1038/s41598-023-47316-0

      Medel, V., Irani, M., Ossandón, T., & Boncompte, G. (2020). Complexity and 1/f slope jointly reflect cortical states across different E/I balances. bioRxiv, 2020.09.15.298497. https://doi.org/10.1101/2020.09.15.298497

      Molina, J. L., Voytek, B., Thomas, M. L., Joshi, Y. B., Bhakta, S. G., Talledo, J. A., Swerdlow, N. R., & Light, G. A. (2020). Memantine Effects on Electroencephalographic Measures of Putative Excitatory/Inhibitory Balance in Schizophrenia. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 5(6), 562–568. https://doi.org/10.1016/j.bpsc.2020.02.004

      Mukerji, A., Byrne, K. N., Yang, E., Levi, D. M., & Silver, M. A. (2022). Visual cortical γ−aminobutyric acid and perceptual suppression in amblyopia. Frontiers in Human Neuroscience, 16. https://doi.org/10.3389/fnhum.2022.949395

      Muthukumaraswamy, S. D., & Liley, D. T. (2018). 1/f electrophysiological spectra in resting and drug-induced states can be explained by the dynamics of multiple oscillatory relaxation processes. NeuroImage, 179, 582–595. https://doi.org/10.1016/j.neuroimage.2018.06.068

      Narayan, G. A., Hill, K. R., Wengler, K., He, X., Wang, J., Yang, J., Parsey, R. V., & DeLorenzo, C. (2022). Does the change in glutamate to GABA ratio correlate with change in depression severity? A randomized, double-blind clinical trial. Molecular Psychiatry, 27(9), 3833–3841. https://doi.org/10.1038/s41380-022-01730-4

      Nuijten, M. B., & Polanin, J. R. (2020). “statcheck”: Automatically detect statistical reporting inconsistencies to increase reproducibility of meta-analyses. Research Synthesis Methods, 11(5), 574–579. https://doi.org/10.1002/jrsm.1408

      Ossandón, J. P., Stange, L., Gudi-Mindermann, H., Rimmele, J. M., Sourav, S., Bottari, D., Kekunnaya, R., & Röder, B. (2023). The development of oscillatory and aperiodic resting state activity is linked to a sensitive period in humans. NeuroImage, 275, 120171. https://doi.org/10.1016/J.NEUROIMAGE.2023.120171

      Ostlund, B. D., Alperin, B. R., Drew, T., & Karalunas, S. L. (2021). Behavioral and cognitive correlates of the aperiodic (1/f-like) exponent of the EEG power spectrum in adolescents with and without ADHD. Developmental Cognitive Neuroscience, 48, 100931. https://doi.org/10.1016/j.dcn.2021.100931

      Pant, R., Ossandón, J., Stange, L., Shareef, I., Kekunnaya, R., & Röder, B. (2023). Stimulus-evoked and resting-state alpha oscillations show a linked dependence on patterned visual experience for development. NeuroImage: Clinical, 103375. https://doi.org/10.1016/J.NICL.2023.103375

      Perica, M. I., Calabro, F. J., Larsen, B., Foran, W., Yushmanov, V. E., Hetherington, H., Tervo-Clemmens, B., Moon, C.-H., & Luna, B. (2022). Development of frontal GABA and glutamate supports excitation/inhibition balance from adolescence into adulthood. Progress in Neurobiology, 219, 102370. https://doi.org/10.1016/j.pneurobio.2022.102370

      Pitchaimuthu, K., Wu, Q. Z., Carter, O., Nguyen, B. N., Ahn, S., Egan, G. F., & McKendrick, A. M. (2017). Occipital GABA levels in older adults and their relationship to visual perceptual suppression. Scientific Reports, 7(1). https://doi.org/10.1038/S41598-017-14577-5

      Rideaux, R., Ehrhardt, S. E., Wards, Y., Filmer, H. L., Jin, J., Deelchand, D. K., Marjańska, M., Mattingley, J. B., & Dux, P. E. (2022). On the relationship between GABA+ and glutamate across the brain. NeuroImage, 257, 119273. https://doi.org/10.1016/J.NEUROIMAGE.2022.119273

      Schaworonkow, N., & Voytek, B. (2021). Longitudinal changes in aperiodic and periodic activity in electrophysiological recordings in the first seven months of life. Developmental Cognitive Neuroscience, 47. https://doi.org/10.1016/j.dcn.2020.100895

      Schwenk, J. C. B., VanRullen, R., & Bremmer, F. (2020). Dynamics of Visual Perceptual Echoes Following Short-Term Visual Deprivation. Cerebral Cortex Communications, 1(1). https://doi.org/10.1093/TEXCOM/TGAA012

      Sengpiel, F., Jirmann, K.-U., Vorobyov, V., & Eysel, U. T. (2006). Strabismic Suppression Is Mediated by Inhibitory Interactions in the Primary Visual Cortex. Cerebral Cortex, 16(12), 1750–1758. https://doi.org/10.1093/cercor/bhj110

      Steel, A., Mikkelsen, M., Edden, R. A. E., & Robertson, C. E. (2020). Regional balance between glutamate+glutamine and GABA+ in the resting human brain. NeuroImage, 220. https://doi.org/10.1016/J.NEUROIMAGE.2020.117112

      Takado, Y., Takuwa, H., Sampei, K., Urushihata, T., Takahashi, M., Shimojo, M., Uchida, S., Nitta, N., Shibata, S., Nagashima, K., Ochi, Y., Ono, M., Maeda, J., Tomita, Y., Sahara, N., Near, J., Aoki, I., Shibata, K., & Higuchi, M. (2022). MRS-measured glutamate versus GABA reflects excitatory versus inhibitory neural activities in awake mice. Journal of Cerebral Blood Flow & Metabolism, 42(1), 197. https://doi.org/10.1177/0271678X211045449

      Takei, Y., Fujihara, K., Tagawa, M., Hironaga, N., Near, J., Kasagi, M., Takahashi, Y., Motegi, T., Suzuki, Y., Aoyama, Y., Sakurai, N., Yamaguchi, M., Tobimatsu, S., Ujita, K., Tsushima, Y., Narita, K., & Fukuda, M. (2016). The inhibition/excitation ratio related to task-induced oscillatory modulations during a working memory task: A multtimodal-imaging study using MEG and MRS. NeuroImage, 128, 302–315. https://doi.org/10.1016/J.NEUROIMAGE.2015.12.057

      Tao, H. W., & Poo, M. M. (2005). Activity-dependent matching of excitatory and inhibitory inputs during refinement of visual receptive fields. Neuron, 45(6), 829–836. https://doi.org/10.1016/J.NEURON.2005.01.046

      Vanrullen, R., & MacDonald, J. S. P. (2012). Perceptual echoes at 10 Hz in the human brain. Current Biology. https://doi.org/10.1016/j.cub.2012.03.050

      Voytek, B., Kramer, M. A., Case, J., Lepage, K. Q., Tempesta, Z. R., Knight, R. T., & Gazzaley, A. (2015). Age-related changes in 1/f neural electrophysiological noise. Journal of Neuroscience, 35(38). https://doi.org/10.1523/JNEUROSCI.2332-14.2015

      Vreeswijk, C. V., & Sompolinsky, H. (1996). Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293), 1724–1726. https://doi.org/10.1126/SCIENCE.274.5293.1724

      Waschke, L., Wöstmann, M., & Obleser, J. (2017). States and traits of neural irregularity in the age-varying human brain. Scientific Reports, 7(1), 1–12. https://doi.org/10.1038/s41598-017-17766-4

      Weaver, K. E., Richards, T. L., Saenz, M., Petropoulos, H., & Fine, I. (2013). Neurochemical changes within human early blind occipital cortex. Neuroscience. https://doi.org/10.1016/j.neuroscience.2013.08.004

      Wu, Y. K., Miehl, C., & Gjorgjieva, J. (2022). Regulation of circuit organization and function through inhibitory synaptic plasticity. Trends in Neurosciences, 45(12), 884–898. https://doi.org/10.1016/J.TINS.2022.10.006

  2. www.researchsquare.com
    1. Author response:

      We thank the editor and reviewers for the time invested in our manuscript and their valuable and insightful critiques. However, we believe that the results justify the conclusions in the manuscript; therefore, we have decided not to revise it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I have one major concern regarding this draft of the manuscript:

      (1) In the manuscript (lines 130-31) it is stated that "About 55% (8/15) of mice with unilateral AAV-hM3Dq centered in the PMv showed an increase in LH release above 0.5ng/ml within 10-20 min following the CNO injection". However, data at time zero are not shown for 4 of the 8 "LH peak" animals. The missing data at time zero seem problematic for the analysis of the CNO-stimulated cohort. As mentioned in the manuscript, the area under the curve was calculated between -10 and 20 min post-injection. Because diestrus animals have spontaneous LH pulses, an LH pulse could well be initiated in the 10 minutes prior to drug delivery, as seen in the AAV-mCherry group in 1D, and similarly in 2C. Given the current form of analysis, a spontaneous LH pulse initiated up to 10 minutes prior to drug delivery could conceivably be counted as an experimentally induced "LH peak". Can you address this concern?

      We understand the reviewer’s concern about the spontaneous LH pulses. This is the reason we have been very strict in our analysis and have taken multiple approaches to analyze these data. In our hM3Dq group, 55% of the animals responded to CNO with an increase in LH, while none responded in the negative control group. Also, in the clozapine group, where no time 0 points were missing, 100% of the animals with hM3Dq showed an LH increase after the injection, while only 28% (2/7) showed an increase in the negative control group. Even by these conservative criteria, the DREADD approach doubled the chances of an LH increase. Note that the spontaneous LH peaks observed in negative controls or during baseline show a very sharp increase followed by a decrease at the next time point, whereas the 4 CNO-hM3Dq “PMv hits” that lacked a time 0 sample and showed an LH increase displayed a sustained rise after 10 min or prolonged high LH levels (above 1 ng/ml) even 30 min after the injection. Ultimately, cFOS levels in the PMv of the CNO-hM3Dq animals with an LH increase are significantly higher than in any other group, and the number of cFOS neurons is highly correlated with LH levels. Another important aspect that should not be dismissed is that in this experimental design we used unilateral injections in animals in a fed state; therefore, the role of leptin in raising LH levels is probably dampened.

      We have added a statement to clarify this issue.

      The following are minor concerns:

      a) Figure 4 a-d, it is clear that Vglut2 is absent in the VMH, but it seems more relevant to show this expression pattern in the PMv.

      We chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2 only cells and it illustrates very well the conditional Vglut2 deletion at small and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study, the PMv is more relevant and therefore, we have included images of the PMv showing a control and a LeprCre-Vglut2floxed animal in higher magnification.

      b) Methods section, targeting PMv: please check the injection coordinate: "dura-mater [dorsoventral -0.54]"

      Thank you for noticing this mistake; all coordinates for the injection have now been corrected (-5.4 mm, ±0.5 and -5.4mm).

      Reviewer #2 (Recommendations For The Authors):

      This is a very well-written manuscript by Saenz de Meira and colleagues on a careful study reporting the key role of the glutamate transporter vGlut2 in neurons of the ventral premammillary nucleus (PMv) of the hypothalamus that express the leptin receptor LepRb, in energy homeostasis, puberty, and estrous cyclicity. Using Cre-dependent chemogenetic viral tools, the authors first show that selective activation of PMv LepRb neurons induces luteinizing hormone (LH) release. The authors then demonstrate that selective inactivation of vGlut2 in LepRb-expressing cells throughout the body induces obesity and mild alteration of sexual maturation in both sexes, and blunted estrous cyclicity in females. Finally, the authors knock out vGlut2 in PMv neurons in which they reintroduce LepRb expression in an otherwise LepRb-null background using an AAV-Cre approach. This very elegant experiment shows that, while re-expression of LepRb in PMv neurons alone was previously shown to restore puberty onset in LepRb-null mice, deleting vGlut2 in LepRb-expressing PMv neurons blunts this effect.

      My specific comments are as follows. Please note that none of them require additional experiments and that they can be answered by amending the text.

      (1) Please provide information on the serotypes and promoters of the AAVs used in the study to enhance reproducibility.

      Thank you; serotypes and promoters have been added for all AAVs.

      (2) Please reformulate lines 220-221. Indeed, this reviewer does not agree with the fact that balanopreputial separation (BPS) is a sign of puberty completion. BPS is merely a sign of the advancement of sexual maturation, akin to vaginal opening in females. In certain mouse strains, BPS coincides with mini puberty rather than puberty. The definitive sign of puberty completion involves the presence of spermatozoa in the vas deferens (equivalent to the first ovulation/first estrus in females).

      Thank you for this remark; this statement has now been modified.

      (3) The authors convincingly show that the potential contamination of the arcuate nucleus of the hypothalamus (ARH) with the AAV injections targeted to the PMv should not account for the DREADD-mediated activation of LH release. However, do the authors believe that DREADD activation of LepRb-expressing PMv neurons, inducing cFOS expression in these neurons, could also activate ARH kisspeptin neurons (which do not express LepRb) via transsynaptic action? Alternatively, do they posit direct activation of GnRH cell bodies in the preoptic region or GnRH axon/dendrites in the ARH/median eminence region?

      Thank you for this comment. We don’t have enough evidence from this DREADDs experiment to make a strong prediction on the downstream pathways. However, as discussed, from the DREADDs khrGFP females, we observed very few kisspeptin cells expressing cFOS, reducing the evidence for a PMv to ARH kisspeptin action in this case. With the evidence from our LepR-Cre;Vglut2flox animals that showed no alterations in kiss1 gene expression but a strong decrease in GnRH release, we hypothesize that this acute activation of LH is mediated by direct inputs from PMv to GnRH neurons, while acknowledging the possible existence of alternative pathways. These arguments have been added to the discussion. 

      (4) This reviewer finds it intriguing that glutamatergic signaling is required for LepRb re-expression in the PMv to restore fertility. Given that the authors and others have shown that PMv neurons heavily express NOS1, the activity of which is known to heavily rely on glutamatergic NMDAR activation, the authors may want to contextualize their results in light of the recent study showing that NOS1 is found to be a new causative gene in people with congenital hypogonadotropic hypogonadism.

      Thank you for the advice; we have added a paragraph to the discussion on the possible involvement of nNOS in PMv neurons.

      (5) Does the absence of vGlut2 have any impact on the obesity phenotype in mice where LepRb is selectively re-expressed in the PMv?

      We have followed the weight of these animals after the AAV injections. However, due to the difficulty of generating double-homozygous mice (LepRnull homozygotes are infertile) and of producing adequate stereotaxic injections with minimal contamination of adjacent nuclei, the groups could not all be run together; thus, we refrained from performing a comparative analysis of energy balance. Body weight in LepRnull mice with reactivation of LepR in PMv neurons has been analyzed before (Donato et al., 2011, using the Flp/Frt model, and Mahany et al., 2018, using the Cre/loxP system), and neither study observed a difference in body weight. Below is the progression of body weight in mice with reactivation of LepR and deletion of Vglut2 in PMv neurons. We have added a comment in this regard.

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      The authors examined the effects of glutamate release from PMv LepR neurons in the regulation of puberty and reproduction in female mice. Multiple genetic mouse models were utilized to either manipulate PMv LepR neuron activities, or to delete glutamate vesicle transporters from LepR neurons. The authors have been quite rigorous in validating these models and exploring potential contaminations. Most of the data presented are solid and convincing, and support the conclusion. This reviewer has the following suggestions for the authors to further improve this work and the manuscript.

      (1) The DREADD study had some issues. For example, "2 out of 7 control mice with no AAV showed an increase in LH...", indicating that LH increases may just happen randomly. More importantly, 45% of PMv-hit mice did not show an LH response to CNO, making it hard to interpret the positive LH responses of the other 55% of PMv-hit mice undergoing the same treatment. Overall, there is simply too much variability in these DREADD data to draw a clean and convincing conclusion. This reviewer suggests repeating these experiments or removing the DREADD data altogether. After all, the rest of the results are much more convincing and stand alone in supporting the role of glutamate release from these PMv LepR neurons.

      We appreciate the reviewer’s concern. Indeed, LH shows spontaneous pulsatility, which is one of the biggest challenges in our field. We have answered this concern for Reviewer 1 above and modified the text accordingly. We decided to keep the data in the publication because we believe they are important evidence supporting our observations: this is the only experiment that examines the role of the PMv in a freely moving, ad libitum fed mouse model that is not deficient in leptin signaling or glutamatergic neurotransmission. Altogether, this paper strongly supports a role for glutamate signaling in leptin’s action on reproductive function. Evidence for this role has been lacking or contentious until now.

      (2) The mCherry signals in Figure 3 are of low quality and do not look like cell bodies.

      We have now increased the contrast and brightness equally in all higher-magnification images of mCherry neurons (Fig 3F, G, I and J) to improve their visibility. The lower-magnification images are high-quality images of areas with a high density of mCherry-positive neurons. In thick sections (30 µm) imaged at low magnification, focus is compromised across Z-axis levels. We feel that images 3E and 3H are important to define the location of cells in the arcuate nucleus. Colocalization and mCherry expression are clear in the high-magnification images.

      (3) The validation of Vglut2 deletion in LepR neurons (Fig. 4A-D) is very nice and convincing, but the images are from the VMH region. Why not show the PMv region?

      As mentioned to Reviewer 1, we chose the VMH because it has a very dense collection of either LeprCre;VGlut2 or Vglut2-only cells, and it illustrates the Vglut2 deletion very well at low and high magnifications. In the PMv, however, the distribution of these cells is sparse. The reviewer is correct that for the current study the PMv is more relevant; therefore, we have included higher-magnification images of the PMv from a control and a LeprCre-Vglut2floxed animal.

      (4) Figures 4-5 used LepR-Cre as controls, while Figure 6 used Vglut2flox as controls. Why? Also, how did the authors set up the breedings to generate "littermates" in each of these studies?

      We used LepR-Cre mice as controls for our experiments because Cre homozygosity is needed for proper Cre expression, and we had a LepR-Cre homozygous colony from the DREADD experiment. Also, these mice had previously been thoroughly evaluated, and no metabolic and/or reproductive disruption was noticed (please see lines 213-214 of the original submission). However, our LepR-Cre colony had to be drastically reduced during COVID and suffered from unexpected Δ recombination, leading to loss of Vglut2 homozygotes. To overcome these issues, we used Vglut2-floxed controls for the gene expression and GnRH immunoreactivity experiments. These mice had previously been used as controls for metabolic experiments with the LepCre-Vglut2fl genotype (Xu et al., 2013, Mol Metab), showing no deficiencies in the metabolic phenotype.

      As described in the methods section (lines 464-466 of the original preprint), to inactivate glutamatergic transmission in leptin-responsive cells, LepRb-Cre mice were crossed with mice carrying loxP-modified Vglut2 alleles. Our experimental mice were homozygous for the LepRb-Cre allele (LepRbcre/cre) and homozygous for the Vglut2-loxP allele (Vglut2fl/fl). Our controls consisted of mice homozygous for the Cre allele (LepRbcre/cre;Vglut2+/+, named LepRb-Cre) or homozygous for the Vglut2-loxP allele (LepRb+/+;Vglut2fl/fl, named Vglut2flox). Both experimental (LepRbcre/cre;Vglut2fl/fl, named LepRbΔVglut2) and control mice were derived from the same litters, with parents homozygous for one of the genes and heterozygous for the other (LepRbcre/cre;Vglut2fl/+ or LepRbcre/+;Vglut2fl/fl). Mice were genotyped at weaning (21 days) and again at the end of the experiments.

      (5) The labeling of Figures 5E-F is missing, making it hard to read.

      We have confirmed that Figures 5E and F are mentioned in the figure legends and in the results text. To improve the readability of the figure, we have added Y-axis titles to Figures 5C, D, E and F, which were previously shown only in Figures 5A and B.

      (6) The last experiment was very nice confirming the role of glutamate release from PMv LepR neurons. However, the key phenotypes (puberty development, pregnancy) were not graphed and only stated in the text.

      Thank you for your comment. Since the key result is that none of the LeprLoxTb;Vglut2flox animals showed vaginal opening or pregnancy, we don’t feel the need to graph this. All details of the reproductive and metabolic phenotyping of the Lepr-loxTB mice with re-expression of LepR in the PMv were described in Mahany et al., 2018.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this useful study, Wang and colleagues investigate the potential probiotic effects of Bacillus velezensis to prevent colitis in a mouse model. They provide solid evidence that B. velezensis limits the growth of Salmonella typhimurium in lab culture and in mice, together with beneficial effects on the microbiota. The work will be of interest to infectious disease researchers and those studying the microbiome.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues present an investigation of the pig-origin bacterium Bacillus velezensis HBXN2020, covering its genome sequence, in vivo safety, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      An extensive study on the probiotic properties of the Bacillus velezensis strain HBXN2020.

      Response: Thank you very much for reading and commenting on our manuscript.

      Weaknesses:

      - The main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Response: Thank you for your comments and suggestions on our manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback.   

      - Most of the results and analysis parts are separated without a link or any story-telling to deliver a concise message.

      Response: Thank you for your comments and suggestions on our manuscript. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestions, we have revised the entire manuscript.

      The updated content is presented in the revised manuscript.

      - For the Salmonella Typhimurium-induced mouse model of colitis, it is not clear how oral infection of C57BL/6 mice would lead to colitis. Streptomycin pretreatment is normally used (https://link.springer.com/protocol/10.1007/978-1-0716-1971-1_17).

      Response: Thank you very much for reading and commenting on our manuscript. The S. Typhimurium ATCC14028 (STm) strain used in this study is highly virulent. Our preliminary experiment indicated that mice infected with 10^7 CFU of STm exhibited notable symptoms without streptomycin pretreatment. Hence, mice in this study were not pretreated with streptomycin. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have the potential to serve as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals, but data for human use are scarce, particularly with respect to Salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test its ability to control colitis in mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate of B. velezensis, HBXN2020, and describe its genome (roughly 4 Mb, 46% GC content, etc.).

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and its survival under acid and temperature stress. HBXN2020 was tested for susceptibility to various antibiotics and for activity against various pathogenic bacteria. In the latter case, the authors set out to determine whether HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data indicating that this is indeed the case are presented.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors administered the strain to mice and monitored weight, together with cytokine profiles. Treated mice displayed no significant weight loss, and expression of inflammatory cytokines remained unchanged. Blood cell profiles of treated mice were consistent with those of untreated mice. No significant differences in tissues, including the colon, were observed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit the growth of Salmonella Typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction of STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the pathological features of colitis in the colon, together with a reduction in inflammatory cytokines.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory) effects of HBXN2020, the authors set out to investigate its effects on the microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to those seen in healthy mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting them with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Response: Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Response: Thanks for the comments and the positive reception of the manuscript.

      (3) The mouse experiments are very convincing.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      Response: Thank you for pointing this out. We apologize that the manuscript lacks an investigation of the underlying mechanism. In future work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions; we hope our response has addressed your concerns.

      (2) The mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

      Response: We greatly appreciate your valuable comments. The objective of this study is to investigate the potential of B. velezensis spores to mitigate bacterial-induced colitis. The animal experimental design followed the methods described in previous studies, with slight modifications (10.1038/s41467-019-13727-9, 10.1126/scitranslmed.abf4692). We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation. Overall, the manuscript is of potential interest to readers.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      There are few studies about the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Response: Thanks for your suggestion. This study serves as an exploratory investigation prior to the application of Bacillus velezensis. Its main purpose is to explore the potential of Bacillus velezensis for practical application. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract:

      It is quite wordy, without a clear emphasis on the major point of the study. It is not obvious how the host, probiotic, and microbiota interact and why this works well, which is the key part.

      Response: Thank you for your valuable suggestion. Your comments have helped improve the quality of the manuscript. We have modified this in the revised manuscript as suggested.

      The updated content is presented in lines 30-32, 34-39, and 41-46 of the Abstract in the revised manuscript.

      Please remove "novel"; many previous works have already documented the probiotic Bacillus velezensis. It is also NOT a novel species.

      Response: Thank you for your suggestion. We have corrected it as suggested. Please see line 26 of the Abstract in the revised manuscript.

      Lines 44-46: the way this conclusion is stated is inappropriate; it should be revised to reflect exactly what the results support.

      Response: Thank you for your valuable suggestion. Your comments have helped improve the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 44-46 of the Abstract in the revised manuscript.

      Introduction:

      Lines 71-71, Lines 75-77, Line 92 "the homeostasis of", please remove.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 96 of the Introduction in the revised manuscript.

      Are the Salmonella loads the key indicator for this model?

      Response: We greatly appreciate your valuable comments. In this study, we aimed to evaluate whether B. velezensis can alleviate S. Typhimurium-induced colitis in mice. It has been reported that S. Typhimurium enters the intestine, colonizes and proliferates in the intestinal epithelium, and then breaks through the intestinal barrier and spreads throughout the body via the bloodstream, leading to systemic infection. Therefore, the Salmonella load in the intestine and other tissues and organs is one of the key indicators of Salmonella infection. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The introduction should really focus on the knowledge gap in general and in a specific field, which is not available in the current version.

      Response: Thank you for your valuable suggestion. Your comments have helped improve the depth of the manuscript. We have revised it as suggested.

      The updated content is presented in lines 53-57, 61-64, 69-75, 85-88, and 97-100 of the Introduction in the revised manuscript.

      Results:

      The "Genomic Characteristics" section on B. velezensis HBXN2020 stands apart from the rest of the study; it is not linked to the safety and probiotic work.

      Response: Thank you for your suggestion. Based on your suggestion, we have revised the "Genomic Characteristics" part of the Results section. Please see lines 104-110 and Supplementary Table 2 in the revised manuscript and supplemental material.

      Are the AMR and virulence genes located on the chromosome? Is there any gene cluster encoding products linked to probiotic efficacy in vitro and in vivo?

      Response: Thanks for your suggestion. Your comments have helped improve the quality and depth of the manuscript. The HBXN2020 genome contains fragments of AMR and virulence genes; however, the antibiotic susceptibility and safety tests showed that HBXN2020 did not exhibit resistance or toxicity. Furthermore, the HBXN2020 genome contains 13 different secondary metabolite biosynthetic gene clusters, such as those for surfactin (genomic position: 323,509), macrolactin H (genomic position: 1,384,185), bacillaene (genomic position: 1,691,549), fengycin (genomic position: 1,865,856), difficidin (genomic position: 2,270,091), bacillibactin (genomic position: 3,000,977), and bacilysin (genomic position: 3,589,078) (Table S2). These secondary metabolites have been shown to inhibit fungi (10.3390/foods11020140), Gram-positive pathogens (10.1371/journal.pone.0251514), and Gram-negative pathogens (10.1007/s00253-017-8095-x) to varying degrees. We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 108-110 of the Results in the revised manuscript and in Supplementary Table 2 of the revised supplemental material.

      Finally, the raw data (Illumina, PacBio) should also be provided.

      Response: Thanks for pointing this out. Following your suggestion, we have submitted the HBXN2020 genome data to the GenBank database under accession number CP119399.1. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated content is presented in lines 770-773 of the Data Availability section in the revised manuscript.

      Lines 100-108: please replace this part with a more meaningful analysis that could be supported by the subsequent experimental assays.

      Response: We greatly appreciate your valuable comments. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestion, we have removed some minor results and added more meaningful findings. We appreciate your review and feedback and have marked the updated content in the revised manuscript. Please see lines 104-110 and Supplementary Table 2 in the revised manuscript and supplemental material.

      Lines 119-126 are not important. Did you further determine which components are responsible for the bacteriostatic activity?

      Response: Thanks for pointing this out. Following your suggestion, we have removed some minor results, along with unnecessary words and sentences. In future research, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 122-124 of the Results in the revised manuscript.

      "Biosafety"? Is there a standard way to conduct this investigation? Please clarify.

      Response: Thank you for pointing out this problem in the manuscript. In this experiment, the biosafety assessment of B. velezensis HBXN2020 followed the method described by Zhou et al., with slight modifications (10.1038/s41467-022-31171-0). We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated content is presented in lines 651-652 of the revised manuscript.

      Why are spores used, not whole bacteria? Please clarify.

      Response: Thanks for pointing this out. We apologize for not explaining why B. velezensis HBXN2020 spores were used in the manuscript. In this study, mice were treated with B. velezensis by oral gavage, and gastric acid would drastically reduce the activity of vegetative B. velezensis cells, whereas spores tolerate strongly acidic environments well. In addition, there is precedent for using spores in previous studies (10.1126/scitranslmed.abf4692). Thank you for your comments and feedback; we hope that our response adequately addresses your concerns.

      Lines 196 and 287: repeated assays were conducted, but the logical link between them is missing.

      Response: We greatly appreciate your valuable comments. We apologize for the poor organization and coherence of our Results section. Following your suggestion, we have improved the structure of the manuscript by removing unnecessary words and revising sentences. We apologize once again and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 195-198, 246-248, 256-257, and 285-287 of the Results in the revised manuscript.

      Discussion:

      Please shorten it; it is wordy and lacks focus.

      Response: We greatly appreciate your valuable comments. Your comments have helped improve the quality and depth of the manuscript. Following your suggestion, we have shortened the Discussion by removing unnecessary words and revising sentences. We apologize once again and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 353-355, 358-360, 366-371, 381-385, 395-401, 417-419, 430-438, 459-466, 478-481, and 484-485 of the Discussion in the revised manuscript.

      Conclusion:

      Please clarify and rework it.

      Response: Thanks for your suggestion. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestion, we have rewritten the conclusion.

      The updated content is presented in lines 492-496 of the Conclusion in the revised manuscript.

      Materials and Methods:

      Much more detailed information should be provided.

      Response: Thank you for your suggestion. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestion, we have added more detail to the experimental methods. We appreciate your review and feedback and have marked the updated content in the revised manuscript. Please see lines 513-515, 530-533, and Supplementary Table 5 in the revised manuscript and supplemental material.

      All previous bacterial sampling and a list of results should be provided as a supplemental document.

      Response: Thank you for your valuable suggestion. Your comments have helped improve the quality and depth of the manuscript. In this study, we conducted preliminary antibacterial activity testing of 362 Bacillus isolates against pathogenic bacteria, including S. Typhimurium ATCC14028, E. coli ATCC35150, and S. aureus ATCC43300 and ATCC29213. Four Bacillus strains (B. subtilis H1, B. velezensis HBXN2020, B. amyloliquefaciens 6-1, and B. licheniformis BSK14) showed antagonistic activity against these pathogenic bacteria, while the rest had no significant activity. We therefore chose these four strains to further evaluate their antibacterial activity against Gram-negative and Gram-positive pathogens (Supplementary Table 5). Based on these results, B. velezensis HBXN2020 had the best antibacterial activity, so we chose it for subsequent experiments.

      The updated content is presented in Supplementary Table 5 of the supplemental material.

      Minor points:

      All bacterial genera and species should be italicized.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 26 of the Abstract, lines 67 and 69 of the Introduction, and line 111 of the Results in the revised manuscript.

      Line 39: remove the repeated "importantly".

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 39 of the Abstract in the revised manuscript.

      Lines 55-56, please rewrite.

      Response: Thanks for your suggestion. We have now rephrased the sentence.  

      The updated content is presented in lines 56-57 of the Introduction in the revised manuscript.

      The relevant references should be updated, in the right format.

      Response: Thanks for your suggestion. Based on your suggestion, we have revised the references according to the eLife reference format.

      The updated content is presented in the References section of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) In Figure 2, the authors argue that the increased survival of Bacillus spores at high temperatures and low pH makes the strain useful as a probiotic because it would survive in the gut. However, gut temperature is not significantly higher than that of the rest of the body (certainly not 95 degrees). One assumes the pH argument applies to surviving stomach acid so that spores can travel to the gut. These conclusions should be clarified or revised. Survival in bile salts, gastric fluid, etc. makes more sense.

      Response: Thank you for your suggestion. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestion, we have revised these conclusions and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      The updated content is presented in lines 129-132 of the Results in the revised manuscript.

      (2) The overall differences in the microbiota on the stacked bar graphs are difficult to determine. In many cases, it looks like the HBXN2020 does not have a significant effect. The subsequent scattergrams are more convincing. Perhaps the authors can think of a better way to compare composite populations. If not, I suggest moving these stacked graphs to the supplementary information.

      Response: We greatly appreciate your valuable comments. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestion, we have moved the stacked graphs to the supplemental material. In addition, we replaced the bar graphs with heatmaps, in which differences in microbial community composition among experimental groups are shown by color intensity. We appreciate your review and feedback and have marked the updated figures in the revised manuscript. Please see Figures 7 and 10 in the revised manuscript and supplemental material.

      Minor editorial:

      (1) Line 55 - "....antibiotic therapy is...".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated content is presented in lines 56-57 of the Introduction in the revised manuscript.

      (2) Line 60 - replace "emergent search" - poor syntax.

      Response: Thank you for your suggestion. Your comments have helped improve the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 61-62 of the Introduction in the revised manuscript.

      (3) Line 63 - "...play an important...".

      Response: Thanks for pointing this out. We have now rephrased the sentence.

      The updated content is presented in lines 63-64 of the Introduction in the revised manuscript.

      (4) Figure 1C is not very useful; it simply reinforces the data from 1A and 1B and can be moved to the supplementary information.

      Response: Thank you for your valuable suggestion. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestion, we have moved Figure 1C to the supplemental material. We appreciate your review and feedback and have marked the updated figures in the revised manuscript. Please see the figures in the revised manuscript and supplemental material.

      (5) Line 126, "...that the growth of B. velezensis HBXN2020 was relatively stable." What do the authors mean by this? "Stable" implies no increase in biomass, but the growth curve does not indicate this, there was an increase in biomass after which, the culture appeared to reach a stationary phase. This should be clarified.

      Response: Thanks for pointing this out. Your comments have helped improve the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 122-124 of the Results in the revised manuscript.

      (6) In Figure 5 - all the graphs in panel A can be amalgamated into one figure using different colours/symbols.

      Response: Thank you for your suggestion. Your comments have helped improve the quality and depth of the manuscript. Based on your suggestion, we have merged all the graphs in panel A of Figure 5 into a single graph.

      The updated content is presented in Figure 5 of the revised manuscript.

      (7) The overall cohesiveness of the manuscript could be improved.

      Response: Thank you for your valuable comments. Your comments have helped improve the quality and depth of the manuscript. We have revised the entire manuscript based on your suggestions. The updated content is presented in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The following issues require clarification to further improve the quality of the manuscript.

      (1) L.55: Replace "antibiotic therapies" with "antibiotic therapy".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated content is presented in lines 56-57 of the Introduction in the revised manuscript.

      (2) "Bacillus" should be modified to italics in the manuscript (see e.g., L. 26, 65, 68, 109).

      Response: Thank you for your suggestion. Your comments have helped improve the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 26 of the Abstract, lines 67 and 69 of the Introduction, and line 111 of the Results in the revised manuscript.

      (3) At first mention in the manuscript, bacterial names should be given in full (see e.g., L. 158, 159, 160).

      Response: Thank you for pointing out this problem in the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 153-156 of the Results in the revised manuscript.

      (4) L.166 and 167: "we evaluated its biological safety in a mouse model" suggest modifying to "we evaluated the biological safety of HBXN2020 in a mouse model".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      The updated content is presented in lines 163-164 of the Results in the revised manuscript.

      (5) L.229: Replace "suggest" with "suggested".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      The updated content is presented in line 226 of the Results in the revised manuscript.

      (6) L.367: The tense of "can" should be consistent with "demonstrated".

      Response: Thanks for pointing this out. We have corrected this as suggested.

      (7) L.368 and L. 369: Replace "Gram positive and Gram negative" with "Gram-positive and Gram-negative".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      (8) L.372: Replace "and" with "as well as".

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 365 of the Discussion in the revised manuscript.

      (9) Please provide the NCBI accession number for the 16S rRNA sequencing raw data.

      Response: Thank you for your suggestion. We have added it in the revised manuscript.

      The updated content is presented in lines 770-773 of the Data Availability section in the revised manuscript.

      (10) L. 1020 and L. 1073: It's recommended to reduce the word count in the annotations of Figures 5 and 8.

      Response: Thank you for your valuable suggestion. We have corrected it as suggested.

      The updated content is presented in the legends of Figures 5 and 8 in the Figure Legends section of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Duan et al. analyzed brain imaging data from the UK Biobank and found patterns in age-related changes in brain structure. They identified two patterns and found associations that can be differentiated by this categorization.

      Strengths:

      This discovery harbors a substantial impact on aging and brain structure and function.

      Weaknesses:

      (1) Therefore, the study requires more validation. Most importantly, the data underlying the stratification into two groups are not obvious and lack detail. Can the groups also be stratified by different methods, e.g., PCA?

      Response: Thanks for the comment. In this study, principal component analysis (PCA) was applied to the individualized deviations of anatomical regions of interest (ROIs) for dimensionality reduction, and the first 15 principal components, which explained approximately 70% of the total variation, were used to identify longitudinal brain aging patterns. The two patterns can be separated by both linear and non-linear dimensionality reduction methods: PCA and locally linear embedding (LLE) [1]. The baseline grey matter volume (GMV) of 40 ROIs was linearly adjusted for sex, assessment center, handedness, ethnicity, intracranial volume (ICV), and a second-degree polynomial in age, to be consistent with the whole-brain GMV trajectory model. There was a clear boundary between the two patterns in the projected coordinate space, indicating distinct structural differences in brain aging between them (Author response image 1).

      Author response image 1.

      Stratification of the identified brain aging patterns using linear and non-linear dimensionality reduction methods. (a) The principal component space of PC1 and PC2, and (b) two-dimensional projected locally linear embedding space derived from brain volumetric measures. Points have been colored and shaped according to grouping labels of the brain aging patterns.

      (2) Are there any external data that can be used for validation?

      Response: Thanks for the comment. We were given access to data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), which aims to determine the relationships between clinical, cognitive, imaging, genetic, and biochemical biomarkers across the entire spectrum of Alzheimer’s disease. ADNI recruits participants aged between 55 and 90 years at 57 sites in the United States and Canada, who undergo a series of initial tests that are repeated at intervals over subsequent years.

      Unfortunately, there are not enough suitable data, especially clinical, cognitive, and genetic data, to support unbiased validation of the heterogeneity in structural brain aging patterns. Only 890 (31.83%) of the 2796 subjects included in ADNI were cognitively normal. Of these, 656 were included in the analyses after quality control of structural MRI and exclusion of subjects with missing covariates; their mean age at the screening visit was 70.8 years (SD = 6.48 years), and 60.21% were female. Thus, ADNI and UK Biobank differ substantially in population composition, with ADNI enrolling older subjects because of its focus on defining the progression of Alzheimer’s disease.

      Moreover, among the 656 subjects with structural imaging data, the data needed to validate the clinical, cognitive, and genetic manifestations of the brain aging patterns were missing to varying degrees. For example, blood biochemistry tests and telomere length data were missing at baseline for approximately 58% and 82% of subjects, respectively, and genotype data were not assayed for more than 70 percent of the subjects. For cognitive function tests, only the Mini-Mental State Examination results were complete, while other tests such as the Trail Making Test and Digit Span Backward were available for less than 10 percent of subjects.

      (3) Other previous discoveries or claims supporting the results of the study should be explored to support the conclusion.

      Response: Thanks for the suggestion. As noted in lines 274-277 of the manuscript, participants with brain aging pattern 2 (lower baseline total GMV and a more rapid GMV decrease) were characterized by accelerated biological aging and cognitive decline. Previous research on brainAGE2,3 (the difference between the age predicted by a machine learning model of brain imaging data and chronological age), a biomarker of accelerated brain aging, showed that people with a higher brainAGE have accelerated biological aging and early signs of cognitive decline, which is consistent with our findings in this study (lines 302-306).

      Further, our genome-wide association analysis identified significant genetic loci contributing to accelerated brain aging, some of which have also been reported in previous GWAS of image-derived phenotypes4, such as regional and tissue volume, cortical area and white matter tract measurements, and in a study of specific brain aging modes identified using a data-driven decomposition approach5 (lines 207-213).

      In addition, we demonstrated “last in, first out” mirroring patterns between structural brain aging and brain development, and found that these mirroring patterns are predominantly localized to the lateral/medial temporal cortex and the cingulate cortex (lines 231-234). Large differences between late adolescent development and aging in the patterns of change in the medial temporal cortex were previously reported in studies of brain development and aging6 (lines 315-317).

      (4) Sex was merely used as a covariate. Were there sex differences during brain aging? What was the sex ratio difference in groups 1 and 2?

      Response: Thanks for the comment. Sex differences in brain aging can be observed by examining sex-stratified whole-brain GMV trajectories. We fitted growth curves and estimated the rate of change of total grey matter volume (TGMV) separately for males and females using generalized additive mixed models (GAMM), which included 40,921 observations from 17,055 males and 19,958 females (Author response image 2). Among healthy UK Biobank participants aged 44-82 years, males had a higher total GMV and a faster rate of GMV decrease over time, while females had a lower total GMV and a slower rate of GMV decrease. A similar conclusion was reported for normative brain-volume trajectories across the human lifespan7. Supplementary Table 5 shows baseline and demographic characteristics for all participants and for participants stratified by brain aging pattern. There were slightly more females than males among all participants and within brain aging pattern 1 (53.4%) and pattern 2 (54.4%), and a χ^2 test showed no significant difference in sex ratio between the two patterns (P = 0.06).

      Author response image 2.

      Total gray matter volume (TGMV) (a) and the estimated rate of change (b) for females (red) and males (blue). Rates of volumetric change for total gray matter and each ROI were estimated using GAMM, which incorporates both cross-sectional between-subject variation and longitudinal within-subject variation, from 22,067 observations for 19,958 females and 18,854 observations for 17,055 males. Covariates include assessment center, handedness, ethnicity, and ICV. Shaded areas around the fitted line denote 95% CI.

      (5) Although statistically significant, Figure 3 shows minimal differences. LTL and phenoAge are displayed in adjusted values but what are the actual values that differ between patterns 1 and 2?

      Response: Thanks for the comment. We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots. Associations between biological aging biomarkers and brain aging patterns are listed in Supplementary Table 6. Compared with participants in brain aging pattern 1, participants in pattern 2, who had a more rapid GMV decrease, had shorter leucocyte telomere length (P = 0.009, Cohen’s d = -0.028) and higher PhenoAge (P = 0.019, Cohen’s d = 0.027) without covariate adjustment. Specifically, participants in pattern 1 had a mean Z-standardized LTL of 0.083 (SD 0.98) and a mean PhenoAge of 41.35 years (SD 8.17 years), whereas those in pattern 2 had a mean Z-standardized LTL of 0.055 (SD 0.97) and a mean PhenoAge of 41.58 years (SD 8.32 years).
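      To put these effect sizes in context, Cohen’s d can be recomputed from the group means and SDs reported above. The sketch below is illustrative: because the group sizes are not restated here, it weights the two variances equally (an assumption), and the inputs are rounded summary values, so small differences from the reported d values are expected. By the usual convention, |d| = 0.2 is a “small” effect, so values below 0.03 mean the two distributions overlap almost completely even though the difference reaches statistical significance in a very large sample.

```python
import math


def cohens_d(mean1, sd1, mean2, sd2, n1=None, n2=None):
    """Standardized mean difference (group 2 minus group 1) divided by the pooled SD.

    If n1 and n2 are not given, the two variances are weighted equally.
    """
    if n1 is None or n2 is None:
        pooled_var = (sd1**2 + sd2**2) / 2
    else:
        pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean2 - mean1) / math.sqrt(pooled_var)


# Pattern 1 vs pattern 2 summary statistics quoted in the response above.
d_ltl = cohens_d(0.083, 0.98, 0.055, 0.97)    # Z-standardized LTL
d_pheno = cohens_d(41.35, 8.17, 41.58, 8.32)  # PhenoAge (years)
print(f"LTL d = {d_ltl:.3f}, PhenoAge d = {d_pheno:.3f}")
```

      With equal weighting this gives d of about -0.029 for LTL and 0.028 for PhenoAge, in line with the reported -0.028 and 0.027.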

      (6) It is not intuitive to link gene expression results shown in Figure 8 and brain structure and functional differences between patterns 1 and 2. Any overlap of genes identified from analyses shown in Figure 6 (GWAS) and 8 (gene expression)?

      Response: Thanks for the comment, and we apologize for the confusion. As described in the Results section “Gene expression profiles were associated with delayed brain development and accelerated brain aging”, 17 of the 45 genes mapped to GWAS-significant SNPs were found in the Allen Human Brain Atlas (AHBA) dataset. Expression of LGR4 (r_Spearman = 0.56, P_permutation = 2.5 × 10^-4) was significantly associated with delayed brain development, and expression of ESR1 (r_Spearman = 0.53, P_permutation = 1.5 × 10^-4) and FAM3C (r_Spearman = -0.37, P_permutation = 0.004) was significantly associated with accelerated brain aging. BDNF-AS was positively associated with both delayed brain development and accelerated brain aging after the spatial permutation test. The full associations between the expression profiles of mapped genes and the estimated APC during brain development and brain aging are presented in Supplementary Tables 12 and 13, respectively.

      Furthermore, we screened genes based on their contributions to, and effect directions on, the first PLS component in brain development and brain aging. Several genes mapped to GWAS-significant SNPs were among those selected for inclusion in the functional enrichment analysis (Author response table 1): LGR4 (PLSw1(LGR4) = 3.70, P_FDR = 0.002) was associated with delayed development, and ESR1 (PLSw1(ESR1) = 3.91, P_FDR = 6.12 × 10^-4) and FAM3C (PLSw1(FAM3C) = -3.68, P_FDR = 0.001) were associated with accelerated aging.

      Author response table 1.

      Contributions and effect directions for the first PLS component in brain development and brain aging of genes mapped to GWAS-significant SNPs. Bold P values indicate significance after FDR correction (P < 0.005, the threshold for inclusion in the functional enrichment analysis).
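      For readers less familiar with permutation-based P values, the logic behind the Spearman correlations reported above can be sketched as follows. This is a simplified illustration, not the analysis used in the study: it shuffles regional values at random, whereas a spatial permutation (“spin”) test rotates the cortical map to preserve spatial autocorrelation; plain shuffling ignores that autocorrelation and can therefore be anti-conservative. The function names, the 34-region example and the synthetic data are hypothetical.

```python
import numpy as np


def rank(x):
    """Ranks 0..n-1 (assumes no ties, as for continuous regional measures)."""
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman_r(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank(x), rank(y))[0, 1])


def permutation_p(x, y, n_perm=10000, seed=0):
    """Two-sided P value from randomly shuffling y across regions.

    NOTE: this is a plain (non-spatial) permutation. A spin test would instead
    rotate the cortical map so that spatial autocorrelation is preserved.
    """
    rng = np.random.default_rng(seed)
    observed = spearman_r(x, y)
    null = np.array([spearman_r(x, rng.permutation(y)) for _ in range(n_perm)])
    # Add-one correction so the P value is never exactly zero.
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, float(p)


# Hypothetical example: 34 cortical regions, expression loosely related to change rate.
rng = np.random.default_rng(1)
expression = rng.normal(size=34)
change_rate = 0.6 * expression + rng.normal(scale=0.8, size=34)
r, p = permutation_p(expression, change_rate, n_perm=5000)
print(f"Spearman r = {r:.2f}, permutation P = {p:.4f}")
```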

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to understand the heterogeneity of brain aging by analyzing brain imaging data. Based on the concept of structural brain aging, they divided participants into two groups based on the volume and rate of decrease of gray matter volume (GMV). The group with rapid brain aging showed accelerated biological aging and cognitive decline and was found to be vulnerable to certain neuropsychiatric disorders. Furthermore, the authors claimed the existence of a "last in, first out" mirroring pattern between brain aging and brain development, which they argued is more pronounced in the group with rapid brain aging. Lastly, the authors identified genetic differences between the two groups and speculated that the cause of rapid brain aging may lie in genetic differences.

      Strengths:

      The authors supported their claims by analyzing a large amount of data using various statistical techniques. There seems to be no doubt about the quality and quantity of the data. Additionally, they demonstrated their strength in integrating diverse data through various analysis techniques to reach their conclusions.

      Weaknesses:

      There appears to be a lack of connection between the analysis results and their claims. Readers lacking sufficient background knowledge of the brain may find it difficult to understand the paper. It would be beneficial to modify the figures and writing to make the authors' claims clearer to readers. Furthermore, the paper gives an overall impression of being less polished in terms of abbreviations, figure numbering, etc. These aspects should be revised to make the paper easier for readers to understand.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Gray matter volume (GMV) is defined later in the manuscript and may confuse readers.

      Response: Thanks for the comment. We have now defined GMV upon its first appearance in the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) In conducting GWAS, the authors used total GMV at the age of 60 as a phenotype (line 195). It would be beneficial to provide additional explanation as to why only the data from individuals aged 60 were utilized, especially considering the ample availability of GMV data.

      Response: Thanks for the comment, and we apologize for the confusion. As described in the Methods section “Genome Wide Association Study to identify SNPs associated with brain aging patterns”, we performed a genome-wide association study (GWAS) on individual deviations of total GMV from the population average at 60 years using PLINK 2.0. Therefore, data from all individuals were used in the GWAS, not only from those aged 60 years. To do this, each participant’s deviation of total GMV from the population average at age 60 was calculated using a mixed effect regression model, as described in the Methods section “Identification of longitudinal brain aging patterns”.
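      Why every participant contributes, whatever their age at scanning, can be illustrated with a simplified two-stage calculation: fit a population trend across all observations with age centred at 60, then fit each participant’s own line to their residuals; the intercept of that line is the participant’s estimated deviation at age 60. This is a hedged sketch only. The study used a mixed effect regression model, which estimates population and individual terms jointly and shrinks noisy individual estimates; the linear age term, the function name `deviation_at_age` and the toy data below are assumptions made for illustration.

```python
import numpy as np


def deviation_at_age(ages, gmv, subject_ids, ref_age=60.0):
    """Two-stage approximation of each subject's GMV deviation at ref_age.

    Stage 1: population linear trend over all scans (age centred at ref_age).
    Stage 2: per-subject line through that subject's residuals; its intercept
    is the deviation from the population line at ref_age. Subjects with a
    single scan fall back to their lone residual (intercept-only).
    """
    ages = np.asarray(ages, dtype=float)
    gmv = np.asarray(gmv, dtype=float)
    subject_ids = np.asarray(subject_ids)
    X = np.column_stack([np.ones_like(ages), ages - ref_age])
    beta, *_ = np.linalg.lstsq(X, gmv, rcond=None)
    resid = gmv - X @ beta

    deviations = {}
    for sid in np.unique(subject_ids):
        mask = subject_ids == sid
        if mask.sum() >= 2:
            coef, *_ = np.linalg.lstsq(X[mask], resid[mask], rcond=None)
            deviations[sid.item()] = float(coef[0])
        else:
            deviations[sid.item()] = float(resid[mask][0])
    return deviations


# Toy data: three subjects scanned at 55 and 65, none of them at 60.
ages = [55, 65, 55, 65, 55, 65]
subjects = ["a", "a", "b", "b", "c", "c"]
true_dev = {"a": -10.0, "b": 0.0, "c": 10.0}
gmv = [600 + true_dev[s] - 3.0 * (age - 60) for s, age in zip(subjects, ages)]
print(deviation_at_age(ages, gmv, subjects))
```

      In this toy example the recovered deviations are approximately -10, 0 and 10 even though no participant was scanned at age 60.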

      (2) Whole-brain gene expression data was linked to GMV (Line 237). Gray matter is known to account for about 40% of the total brain. Thus, interpreting whole-brain data in connection with GMV might introduce significant errors. Could this potential source of error be addressed?

      Response: Thanks for the comment. In our study, the Allen Human Brain Atlas (AHBA) dataset was processed using the abagen toolbox version 0.1.3 (https://doi.org/10.5281/zenodo.5129257) with the Desikan-Killiany atlas8, resulting in a matrix (83 regions × 15,633 genes) of transcriptional values covering cortical and subcortical structures in both hemispheres and the brainstem. Only data from the 34 cerebral cortical regions, rather than the whole brain, were included in the partial least squares (PLS) regression analysis of the association between regional rates of change in gray matter volume and gene expression profiles. We have clarified in the revised manuscript that we used AHBA microarray expression data from cortical regions of interest (ROIs).

      (3) The paper lacks biological interpretation of the important genetic factors (SNPs and genes) for brain aging discovered in this study, as well as the results of gene ontology analysis. Many readers would be curious about the biological significance of these genetic differences and what kind of outcomes they may produce.

      Response: Thanks for the suggestion. As described in our manuscript, six independent single nucleotide polymorphisms (SNPs) were identified at the genome-wide significance level (P < 5 × 10^-8) (Fig. 6). Two of them (rs10835187 and rs779233904) were also associated with multiple brain imaging phenotypes in previous studies, such as regional and tissue volume, cortical area and white matter tract measurements. Compared with the GWAS using global gray matter volume as the phenotype, our GWAS revealed an additional signal on chromosome 7 (rs7776725), which maps to an intron of FAM3C, a gene encoding a secreted protein implicated in pancreatic cancer and Alzheimer’s disease. This signal was further supported by another study, which associated it with a specific brain aging mode identified using a data-driven decomposition approach. In addition, the significant locus rs10835187 (P = 1.11 × 10^-13) is an intergenic variant between LGR4-AS1 and LIN7C and has been reported to be associated with bone density, brain volume and total cortical area. LIN7C encodes the Lin-7C protein, which is involved in the localization and stabilization of ion channels in polarized cells such as neurons and epithelial cells, and a previous study revealed associations of both allelic and haplotypic variations in LIN7C with ADHD. Furthermore, ESR1 was found to be involved in I-kappaB kinase/NF-kappaB signaling in the functional enrichment analysis associated with accelerated brain aging (Figure 8 and Supplementary Figure 5); activation of NF-κB signaling contributes to a variety of human pathologies, including neurodegenerative, inflammatory, autoimmune and cancerous diseases9.

      In summary, enrichment analyses using the GO biological process and KEGG pathway databases indicate that synaptic transmission is an important process shared by brain development and brain aging, whereas cellular processes such as autophagy, as well as pathways related to the progression of neurodegenerative diseases, are important in the mechanisms of brain aging.

      (4) As mentioned in the public review, it would be helpful if figures were revised to more clearly represent the claims.

      (4.1) For Figure 1, it would be beneficial to explain how the authors analyzed the differences between the mentioned cross-section and longitudinal trajectory, which they identified as a strength of the study.

      Response: We have added to the Figure 1 caption in the revised manuscript the advantages of using longitudinal data, rather than only cross-sectional data, to model brain aging trajectories:

      “Fig. 1 Overview of the study workflow. a, Population cohorts (UK Biobank and IMAGEN) and data sources (brain imaging, biological aging biomarkers, cognitive functions, genomic data) involved in this study. b, Brain aging patterns were identified using longitudinal trajectories of whole-brain GMV, which capture long-term and individualized variations better than cross-sectional data alone, and associations between brain aging patterns and other measurements (biological aging, cognitive functions and PRS of major neuropsychiatric disorders) were investigated. c, Mirroring patterns between brain aging and brain development were investigated using z-transformed brain volumetric change maps and gene expression analysis.”

      (4.2) In Figure 3, it's challenging to distinguish differences between patterns 1 and 2 in LTL and PhenoAge. (e.g. It's unclear whether Pattern 1 is higher or lower). Clarifying this visually would be useful.

      Response: We have modified the visualization of Figure 3 in the revised manuscript by adjusting the axes for the leucocyte telomere length (LTL) and PhenoAge variables and removing the whiskers from the boxplots.

      Author response image 3.

      Distributions of biological aging biomarkers (leucocyte telomere length (LTL) and PhenoAge) among participants with brain aging patterns 1 and 2.

      (4.3) Figure 7 explains the mirroring pattern, but it's hard to discern significant differences from the figures alone (especially in Figures 7b and 7c). Using an alternative method (graph, etc.) to clearly represent this would be appreciated.

      Response: We have added arrows pointing to the brain regions with significant differences in each subfigure.

      Author response image 4.

      The “last in, first out” mirroring patterns between brain development and brain aging.

      (5) Abbreviations should be explained when they are first introduced in the paper. For example, GMV continues to be used without explanation, and in line 203, it is written out as 'gray matter volume'. ADHD and ASD first appear at line 172, but the explanation is found in lines 177-178. Additionally, there are terms without explanations in the manuscript. For instance, BMI is not explained in the main manuscript but is defined in the Supplementary Information (Table S6).

      Response: We have corrected the misplaced and missing abbreviation definitions in the revised manuscript and Supplementary Information.

      (6) Figure numbers should follow the order of appearance in the paper. The first Supplementary Fig. in the manuscript is Supplementary Figure 3. It should be Supplementary Figure 1.

      Response: We have relabeled the figures with the order of appearance in the paper in the revised manuscript and Supplementary Information.

      Reference:

      (1) Roweis, S. T. & Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323–2326 (2000).

      (2) Christman, S. et al. Accelerated brain aging predicts impaired cognitive performance and greater disability in geriatric but not midlife adult depression. Translational Psychiatry 10, 317 (2020).

      (3) Elliott, M. L. et al. Brain-age in midlife is associated with accelerated biological aging and cognitive decline in a longitudinal birth cohort. Molecular Psychiatry 26, 3829–3838 (2021).

      (4) Smith, S. M. et al. An expanded set of genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature Neuroscience 24, 737–745 (2021).

      (5) Smith, S. M. et al. Brain aging comprises many modes of structural and functional change with distinct genetic and biophysical associations. eLife 9, e52677 (2020).

      (6) Tamnes, C. K. et al. Brain development and aging: overlapping and unique patterns of change. NeuroImage 68, 63–74 (2013).

      (7) Bethlehem, R. A. et al. Brain charts for the human lifespan. Nature 604, 525–533 (2022).

      (8) Desikan, R. S. et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. NeuroImage 31, 968–980 (2006).

      (9) Singh, S. & Singh, T. G. Role of nuclear factor kappa B (NF-κB) signalling in neurodegenerative diseases: an mechanistic approach. Current Neuropharmacology 18, 918–935 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful comments, which have helped to improve the manuscript. We provide specific examples and a point-by-point response to all comments below. Based on the Reviewers’ comments, we revised our manuscript, adding a considerable amount of new data (found in Fig. 1A,B, 4E-G, 7C,D, 8C,E, S1B,C, S2C-G, S4C, and Video 1). In the main manuscript text, blue font indicates added or revised text. An additional author (Lauren N. Juga) has been added for the newly generated data in the revised manuscript.

      Reviewer #1: 

      Sekulovski et al present an interesting and timely manuscript describing the temporal transition from epiblast to amnion. The manuscript builds on their previous work describing this process using stem cell models. 

      They suggest a multi-step process initiated by BMP induction of GATA3, followed by expression of TFAP2A, followed by ISL1/HAND1 in parallel with loss of pluripotency markers. This transition was reproduced through IF analysis of CS6/7 NHP embryos.

      There are significant similarities in the expression of trophectoderm and the amnion. There are also ample studies showing trophoblast induction following BMP stimulation of primed pluripotent stem cells. The authors should ensure that the amnion is indeed only amnion and not trophectoderm (or quantify any contribution to trophectoderm). As an extension, does the amnion character remain after the 48h BMP4 treatment, or is a trophectoderm-like state adopted as suggested by Ohgushi et al 2022?

      Thank you for this insightful comment. As pointed out, Ohgushi et al. showed that, in their culture method, amnion is first induced, and extended culturing leads to the formation of trophectoderm-like cells (Ohgushi et al., 2022).

      Importantly, we would like to note that our culture system differs substantially from that of Ohgushi et al. in several respects. First, our system uses a 3D culture method, whereas Ohgushi et al. employ 2D hPSC monolayers. Second, the two systems are chemically quite distinct. In our Glass-3D+BMP protocol, cells are cultured in mTeSR medium (which contains FGF2 and TGFβ1) for two days, by which time they form 3D pluripotent cysts. BMP4 is then added to the culture medium for 24 hours, followed by another 24 hours without BMP4. In stark contrast, Ohgushi et al. employ A83-01, an Activin/Nodal signaling inhibitor, and PD173074, an FGF signaling inhibitor (a protocol they call AP). This treatment leads to spontaneous activation of BMP signaling, but it also clearly inhibits the Activin/Nodal and FGF signaling pathways, which remain active in our system. As a result of these distinct chemical and geometrical culture conditions, their system produces amnion and trophectoderm, while our system produces exclusively amnion.

      Further analysis of our gene expression data provides additional support for our contention that our system produces amnion. Although the gene expression profiles of amnion and trophectoderm are quite similar, specific markers of trophectoderm have been identified, including GCM1, PSG1, PSG4 and CGB (Blakeley et al., 2015; Meistermann et al., 2021; Ohgushi et al., 2022; Okae et al., 2018; Petropoulos et al., 2016; Yabe et al., 2016). Importantly, while all of these markers are abundantly expressed in the Ohgushi et al. system, bulk RNA sequencing analysis of our Glass-3D+BMP hPSC-amnion cells reveals that none of them are detectable. Indeed, SDC1, a marker that Ohgushi et al. claim distinguishes trophoblast from amnion, actually decreases (more than 8-fold) as pluripotent cysts transition to amnion in Glass-3D+BMP. Finally, Ohgushi et al. report that ISL1, a key marker of the specified amnion population, initially increases in their system but is reduced to a basal level over time. In contrast, in Glass-3D+BMP hPSC-amnion, ISL1 expression increases continuously with time, and ISL1 protein is expressed uniformly throughout the amnion cysts. This uniform expression is also seen in CS6/7 cynomolgus macaque amnion. Together, these results support our conclusion that the Glass-3D+BMP system leads to the formation of amniotic cells, not trophectoderm cells.

      The functional data does not support a direct function of GATA3 prior to TFAP2A and the authors suggest compensatory mechanisms from other GATAs. If so, which GATAs are expressed in this system, with and without GATA3 targeting? Would it not be equally likely that the other early genes could be the key drivers of amnion initiation, such as ID2? 

      We appreciate this helpful comment. We agree that our data do not provide sufficient evidence for a role of GATA3 in early amniogenesis. We also agree that other early genes could be key drivers, and we apologize for including speculation that focused only on GATA2. GATA2 was selected because, apart from GATA3, it is the only abundantly expressed GATA factor. This speculation about a potentially redundant role for GATA2 has now been removed from the manuscript (Line#355 of the original manuscript).

      The targeting of TFAP2A displays a very interesting phenotype which suggests that amnion and streak share an initial trajectory but where TFAP2A is necessary to adopt amnion fate. It would again be important to ensure that this alternative fate is indeed in streak and not misannotated alternative lineages, including trophoblast. 

      Is TBXT induced in this setting as well as in the wt situation during amnion induction? This should be displayed as in Figure 3D and would be nice to be complemented by NHP IF analysis.

      We will address these two closely related comments together.

      TFAP2A-KO cysts contain ISL1+ squamous cells as well as SOX2+ pluripotent cells, suggesting that, while initial focal amniogenesis occurs, the subsequent fate-spreading event does not. Interestingly, our new data show that TFAP2A-KO cysts contain cells with high TBXT expression (Fig. 8E, Line#373-374). This result suggests that, in the absence of TFAP2A, once amnion lineage progression is halted, a more primitive streak-like (TBXThigh) lineage emerges. It is important to note that TBXT expression is not seen in the trophectoderm population of the cynomolgus macaque peri-gastrula (Sasaki et al., 2016; Yang et al., 2021).

      As suggested, we now include a TBXT expression time course during hPSC-amnion formation in Fig. S2D of the revised manuscript. These data show weak TBXT expression (transcripts) starting at the 24-hr timepoint. However, a clear TBXT protein signal could not be detected using IF (Fig. S2C), likely because TBXT expression is very low (Line#264-265). Although the increase relative to the 12-hr timepoint is statistically significant, TBXT expression is only 31 FPKM ± 0.8 (standard deviation) at 24 hr and 48 FPKM ± 6 at 48 hr. These values are low compared with, for example, TFAP2A, which displays 572 FPKM ± 23 at 12 hr and 1169 FPKM ± 27 at 24 hr, timepoints at which TFAP2A is readily detected using IF. While weak nuclear TFAP2A is seen using IF at 6 hr (187 FPKM ± 7), no clear TFAP2A signal is detected at 3 hr (74 FPKM ± 7). Another example is ISL1, which displays 758 FPKM ± 55 at 24 hr and 1505 FPKM ± 26 at 48 hr, when ISL1 can be detected using IF. Importantly, we were not able to detect ISL1 protein expression using IF at 12 hr, when its expression level is 12 FPKM ± 18. Lastly, we now show that, in the cynomolgus macaque peri-gastrula, while pSMAD1/5+ primitive streak-derived disseminating cells show abundant TBXT expression, no clear TBXT expression is seen in the amnion territory (Fig. S2G, Line#291-293).

      Together, these results show that although a TBXTlow state clearly emerges during hPSC-amnion development, TBXT levels remain low throughout amnion differentiation in wild-type hPSC cultured in Glass-3D+BMP. However, in the absence of TFAP2A, a TBXThigh state is seen, suggesting that TFAP2A is critical for suppressing this TBXThigh state in fate-spreading cells, perhaps by preventing BMP-responding cells from acquiring embryonic lineages (e.g., mesodermal and/or primordial germ cells).

      The authors should address why they get different results from Castillo-Venzor et al 2023 DOI: 10.26508/lsa.202201706  

      Thank you very much for this helpful suggestion; we now include a section addressing this in the Discussion (Line#410-432). In short, we propose several possibilities. First, the culture conditions are highly distinct. Castillo-Venzor et al. (2023) use initial “pre-mesoderm” conditioning with Activin and CHIR, followed by treatment of floating embryoid bodies with a growth factor cocktail (BMP, SCF, EGF and LIF). In contrast, our system (Glass-3D+BMP) employs BMP stimulation of pluripotent cysts. Thus, we suspect that, under the primordial germ cell-like cell (PGCLC) differentiation condition, cells are conditioned toward the pre-mesodermal lineage. Moreover, we propose that amnion fate spreading may not occur in the PGCLC system, perhaps because of differences in geometry (aggregates versus cysts) or in lineage commitment programs. That is, while initial amniogenesis is seen in the PGCLC system, most cells may already be committed to the PGC-like or mesodermal lineages by the time amnion fate spreading could occur. Alternatively, because several cell types (PGC-like, mesodermal and amniotic) co-exist in the culture of Castillo-Venzor et al., PGC-like and/or mesodermal cells may compensate for the loss of TFAP2A.

      Reviewer #2: 

      In this study, Sekulovski and colleagues report refinements to an in vitro model of human amnion formation. Working with 3D cultures and BMP4 to induce differentiation, the authors chart the time course of amnion induction in human pluripotent stem cells in their system using immunofluorescence and RNA-seq. They carry out validation through comparison of their data to existing embryo datasets, and through immunostaining of post-implantation marmoset embryos. Functional experiments show that the transcription factor TFAP2C drives the amnion differentiation program once it has been initiated. 

      There is currently great interest in the development of in vitro models of human embryonic development. While it is known that the amnion plays an important structural supporting role for the embryo, its other functions, such as morphogen production and differentiation potential, are not fully understood. Since a number of aspects of amnion development are specific to primates, models of amniogenesis will be valuable for the study of human development. Advantages of this model include its efficiency and the purity of the cell populations produced, a significant degree of synchrony in the differentiation process, benchmarking with single-cell data and immunocytochemistry from primate embryos, and identification of key markers of specific phases of differentiation. Weaknesses are the absence of other embryonic tissues in the model, and overinterpretation of certain findings, in particular relating bulk RNA-seq results to scRNA-seq data from published analyses of primate embryos and results from limited (though high quality) embryo immunostainings.  

      We are happy that Reviewer #2 agrees that our Glass-3D+BMP model is important for investigating additional roles of amniogenesis, as well as roles of amnion as a signaling hub, due to the purity of the amniotic cell population, and a high degree of synchrony of differentiation.

      We respectfully disagree that the absence of other embryonic tissues in the model is a weakness: rather, we believe it is a strength, because this single-lineage amnion model allows us to directly (and independently) investigate the mechanisms underlying amnion lineage progression. For example, as noted above in our response to Reviewer #1, our hPSC-amnion model allowed us to see a very specific and interesting phenotype in the absence of TFAP2A (reduced amnion formation and emergence of an alternative lineage), whereas Castillo-Venzor et al. previously concluded that amniogenesis is not affected by loss of TFAP2A. We noted that the culture method used by Castillo-Venzor et al. contains several cell types (amniotic, mesodermal and PGC-like), and that amniogenesis may be intact in that model because of compensation by these other cell types. That is, while cell-cell interactions can indeed be studied in culture systems with several cell types, the presence of multiple cell types and their additional signaling inputs can also confound some aspects of mechanistic investigation. We now include a paragraph in the Discussion of the revised manuscript (Line#410-432) that details these ideas and suggests that, because of its cell purity, our Glass-3D+BMP model enables robust mechanistic examination specifically during amnion formation.

      We address Reviewer #2’s point about bulk vs. single-cell transcriptomic similarity analysis in our response to the Reviewer’s specific point #4 below. We do, however, want to note here that we performed the same analysis using a 14-day-old cynomolgus macaque peri-gastrula single-cell RNA sequencing dataset generated by Yang et al. (Yang et al., 2021) and obtained a lineage trajectory (Fig. 4F, Line#265-268) similar to that seen with the Tyser et al. dataset (Tyser et al., 2021) (Fig. 4C).

      Importantly, while cynomolgus macaque early embryo samples are limited, we now include additional staining (Fig. S2G). 

      Reviewer #2 (Recommendations For The Authors): 

      Provide more confirmation of key findings in more than one stem cell line. 

      We now confirm key findings in the H7 human embryonic stem cell line (Fig. S1C).

      Provide stronger evidence e.g. scRNA-seq to support the existence of intermediate cells or tone down the conclusions.  

      We agree that this is a very important point. In our recent study (Sekulovski et al., 2023), we performed single cell RNA sequencing of Gel-3D, another hPSC-amnion model, and comprehensively described the transcriptome associated with the “intermediate” cell types, as well as CLDN10 as a marker of these cells. Moreover, we now include additional data showing the molecular characteristics of the TBXTlow intermediate cells during amniogenesis in hPSC-amnion (Fig. S2C, S2D) and in the d14 cynomolgus macaque peri-gastrula (Fig. 4G, replotted from the single cell RNA-seq data of Yang et al., 2021; Line#264-268).

      Provide more data on the expression of DLX5 in the model. 

      We now provide a DLX5 staining time course in Fig. 7C. Similar to ISL1, prominent DLX5 staining is seen in the focal cells at 24-hr post-BMP. Interestingly, at 48-hr, some cells show high DLX5 levels while others show low levels; this heterogeneity will be of interest for future investigations.

      (1) L159 - the authors should repeat more of the key results in at least one other hPSC line, to ensure reproducibility of the method. Figure S1 contains minimal information (one timepoint, three genes, one biological replicate) on a single different hPSC line. 

      We now include additional validation analysis using the H7 human ESC line (Fig. S1).

      (2) Figure 1- it is a little difficult to appreciate cyst formation from images taken at one level in the stack, can the authors perhaps show a 3D rendering or video to display morphogenesis better? 

      We now provide all optical sections of cysts shown in Movie 1.

      (3) Figure 1-did the authors carry out podocalyxin staining? This is a standard marker for lumenogenesis.  

      We now provide PODXL staining (Fig. 1A,1B).

      (4) L248 onwards and Figure 4-I am a little skeptical concerning conclusions drawn from an overlay of bulk RNA-seq onto scRNA-seq UMAP plots. I think the authors need to provide some strong justification for this approach. I would be particularly careful about concluding that cells depicted in Fig 4D represent an intermediate close to primitive streak and even more careful about claiming any lineage relationship between T-positive "primitive streak like intermediates" and the trajectory of cells in the model. UMAP is a dimension-reduction technique for the visualization of clusters in high-dimensional data. It is not a lineage-tracing methodology. It would have been preferable for the authors to present their own scRNA-seq data from the model.  

      We are sorry that it was not clear that our approach to assessing similarity between bulk and single cell RNA-seq data is largely based on a published method, projectLSI (Granja et al., 2019). Please refer to our Methods section for details of the implementation and of how we modified it for better visualization (Line#667-676 of the original manuscript, now Line#718-730). The performance of projectLSI was extensively evaluated in the original article. Furthermore, as the Reviewer points out, UMAP is indeed a dimension reduction method that has been widely used in single cell RNA-seq research. In addition to visualizing clusters, trajectory analysis such as RNA-velocity (which is used in this study) is another successful and widely adopted application of UMAP for gauging fate progression. Therefore, we believe that UMAP can be used effectively for lineage prediction, and that our bulk-to-single cell transcriptomic similarity analysis leveraging projectLSI is well justified at both conceptual and technical levels.

      As illustrated in Fig. 5A, we performed RNA-velocity analysis of the Tyser et al. dataset, and the result clearly predicts a differentiation trajectory from Epiblast to a part of the TBXTlow population shown in Fig. 4D, and then to Ectoderm/Amnion cells. Consistent with this bioinformatic result, we now show that some cells show weak TBXT expression (at the transcript level) at the 24-hr post-BMP timepoint in control hPSC-amnion (Fig. S2D, Line#264-265). Importantly, our conclusion is drawn from a trajectory based on our time course (0, 0.5, 1, 3, 6, 12, 24, and 48 hours post-BMP treatment), which shows a clear transition from epiblast cells to TBXTlow cells and finally to the ectoderm/amnion population. Moreover, using the transcriptomic similarity analysis, we found that loss of TFAP2A leads to the emergence of more primitive streak-like transcriptional characteristics (Fig. 8D). Indeed, using IF, we now show that several fate-spreading cells in the TFAP2A-KO cysts are TBXThigh (Fig. 8E, Line#373-374). Thus, the new data provide additional evidence that this bulk/single cell transcriptomic similarity analysis was implemented successfully.

      Together, our bioinformatic and localization analyses show that the Glass-3D+BMP system recapitulates the trajectory found in our RNA-velocity analysis of the Tyser et al. dataset, further supporting the validity of this differentiation trajectory. To avoid confusion, however, we no longer use the phrase “primitive streak-like” for the TBXTlow cells because, although they show some TBXT expression, they are likely intermediate, fate-transitioning cells. Indeed, a recent study by Ton et al. (Ton et al., 2023) showed that the Tyser et al. Primitive Streak cells consist of a mix of several lineage-progressing cells (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC). Therefore, these cells are now described as being in a “TBXTlow” state, whereas TBXThigh cells are described as being in a primitive streak-like state.

      (5) L276 Tyser data do come from a primate model; the authors mean NHP.  

      We now specifically state that the validation is performed in a non-human primate model (Line#280).

      (6) Figure 5-though the immunostaining of the CS6/7 monkey embryos is excellent, the authors should not overinterpret these images. What is shown is not a time course, and one can only infer that a particular pattern of gene expression exists in a spatial sense from these images. In the model (Figure 2), the epiblast markers gradually fade and overlap for a time with emergent amnion markers, but in Figure 5 the transition between epiblast and amnion in the embryo seems pretty sharp, at least in terms of gene expression. There may be a few cells in D that show overlap of SOX2 and TFAP2A, but if the authors want to claim that a transition zone exists, they need to produce stronger evidence. Figure 7 is more convincing but see the next point. 

      Thank you for this insightful comment. We now address the nature of the transitioning boundary cell population extensively in our other recent study (Sekulovski et al., 2023).

      (7) Figure 7 further confuses the issue. A zone at either end of the epiblast is clearly positive for Sox2 and the two amnion markers, clearer than in Figure 5, but why does the marker DLX5 overlap with SOX2 in the embryo (7d) but not the model (7C)? Arguments regarding intermediate cell populations would be greatly strengthened by scRNA-seq data on the model system. 

      In our original manuscript, our DLX5 staining was performed at 48-hr post-BMP, at which SOX2 expression is absent in all cells. Our new analysis at the 24-hr timepoint now shows that DLX5 is expressed in SOX2+ cells (this is now presented in Fig. 7C).

      As stated in the point #6, our recent study comprehensively describes the transcriptomic and spatial characteristics of the transitioning boundary cell population (Sekulovski et al., 2023).

      (8) L357 TFAP2C KO does not resemble intermediate cysts in Figure 2. In Figure 2, both SOX2 and amnion markers are co-expressed in the same cells. In 8C, SOX2 and ISL1 are mutually exclusive.  

      We agree with this comment and have now removed the statement about this resemblance (Line#359 of the original manuscript).

      (9) Figure 8d-the same caveats noted above regarding the interpretation of superposition of bulk RNA-seq data with scRNA-seq UMAP analysis apply here.  

      Please refer to our explanation in point#4.

      Reviewer #3: 

      In this work, the authors tried to profile time-dependent changes in gene and protein expression during BMP-induced amnion differentiation from hPSCs. The authors depicted a GATA3 - TFAP2A - ISL1/HAND1 order of amniotic gene activation, which provides a more detailed temporal trajectory of amnion differentiation compared to previous works. As a primary goal of this study, the above temporal gene/protein activation order is amply supported by experimental data. However, the mechanistic insights on amniotic fate decision, as well as the transcriptomic analysis comparing amnion-like cells from this work and other works remain limited. While this work allows us to see more details of amnion differentiation and understand how different transcription factors were turned on in a sequence and might be useful for benchmarking the identity of amnion in ex utero cultured human embryos/embryoids, it provides limited insights on how amnion cells might diverge from primitive streak / mesoderm-like cells, despite some transcriptional similarity they shared, during early development.  

      We are happy that Reviewer #3 appreciates that our model can be used effectively to identify a previously unrecognized amniotic gene activation cascade, and that it provides a comprehensive time-course transcriptomic resource.

      As detailed below, we address specific concerns raised by Reviewer #3. We now provide additional mechanistic insights into amnion fate progression, and include additional transcriptomic comparisons with a cynomolgus macaque single cell RNA sequencing dataset.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors generated KO cell lines lacking GATA3 and TFAP2A, respectively. Their results showed some disrupted amnion differentiation only in TFAP2A-KO. Therefore, these data do not provide sufficient evidence to support whether these transcription factors are crucial for amnion fate specification. Perhaps an experiment could be done with overexpression of these markers and testing if they could force hPSC to adopt amnion-like fate.  

      Thank you for this insightful comment. We generated cell lines that allow inducible expression of GATA3 or TFAP2A, and induced transgene expression from d2 (when BMP treatment is normally initiated) until d4. However, this inducible expression did not lead to amniogenesis, and the cysts maintained pluripotency. Because these results are difficult to interpret, they are not included in the revised manuscript.

      As detailed extensively in the manuscript, within each cyst, amniogenesis is initially seen focally, then spreads laterally resulting in fully squamous amnion cysts. This is also seen in our previously published Gel-3D amnion model (extensively described in (Shao et al., 2017)). In the absence of TFAP2A, we showed that the focal amniogenesis is observed, but spreading is not seen, suggesting that TFAP2A controls amnion fate progression. Therefore, while TFAP2A is not critical for the amnion fate specification in the focal cells, our results show that TFAP2A indeed helps to promote amniotic specification of cells neighboring the focal amniotic cells. Moreover, in the revised manuscript, we now show that TFAP2A transgene expression in the TFAP2A-KO background restores formation of fully squamous hPSC-amnion, further establishing the role of TFAP2A in amnion fate progression (Fig. 8C of the revised manuscript, Line#362-364).

      (2) The transcriptomic analysis made by the authors provides some comparison between BMP-induced amnion-like cells in vitro and the amnion-like cells from CS7 human embryo in vivo. However, the data set from the human embryo contains only a limited number of cells, and might not provide a sufficient base for decisive assessment of the true identity of amnion-like cells obtained in vitro. It might help if the authors could integrate their bulk sequencing data with other primate embryo data sets.  

      Thank you for this helpful comment. We have now performed our transcriptional similarity analysis using early (day 14) cynomolgus macaque embryo datasets generated by Yang et al. (Yang et al., 2021), and found that the bulk time-course transcriptome of our hPSC-amnion model overlaps with the cynomolgus macaque amniotic lineage progression (Fig. 4F, Line#265-268). We also now show the expression of key markers (GATA3, TFAP2A, ISL1, TBXT, DLX5) within the Yang et al. dataset (Fig. 4G, S2F).

      (3) Following the point above, the authors used transcriptomic analysis to identify several intermediate states of cells during amnion differentiation and claimed that there is a primitive streak-like intermediate. However, this might be an overstatement. During stem cell culture and differentiation, intermediate states showing a mixture of biomarkers are very common and do not imply that such intermediates have any biological meaning. However, stating that amnion differentiation passes through primitive streak-like intermediates, might imply a certain connection between these two lineages, for which there is a lack of solid support. Instead, a more interesting question might be how amnion and primitive streak differentiation, despite some transcriptomic similarity, diverge from each other during early development. What factors make this difference? The authors might further analyze RNA-seq data to provide some insights.  

      Thank you very much for the insightful comments. 

      We understand Reviewer #3’s concern that the intermediate state we observe may not recapitulate a primitive streak-like state. In our original manuscript, we described these cells as “Primitive Streak-like” because they were annotated as Primitive Streak in the dataset by Tyser et al. Interestingly, a recent study by Ton et al. showed that the Tyser et al. Primitive Streak cells actually consist of a mixture of different cell lineages (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC (Ton et al., 2023)). Therefore, we agree that calling them “Primitive Streak-like” was an overstatement, and, to avoid confusion, we now label the TBXTlow sub-population found in the Tyser et al. Primitive Streak population as the “TBXTlow state” throughout the manuscript.

      Our data indicate that TFAP2A may play a role in controlling the lineage decision between amnion and primitive streak cells that abundantly express TBXT (TBXThigh). In the original manuscript, we included data showing that 48-hr TFAP2A-KO cysts show transcriptomic characteristics similar to some Primitive Streak cells (Fig. 8D). Intriguingly, our new data show that, in the absence of TFAP2A, some TBXThigh cells are indeed seen (Fig. 8E, Line#373-374). These results provide a body of evidence for the role of TFAP2A in promoting the amniotic lineage, perhaps by suppressing the TBXThigh state. This point is now addressed in the Discussion (Line#401-409).

      Additional new data:

      Using Western blot, we now show that GATA3 is absent in the GATA3-KO lines (Fig. S4C). We noticed that this was lacking in the original manuscript.

      We now show that inducible expression of TFAP2A in the TFAP2A-KO cysts leads to control-like cysts (Fig. 8C, Line#362-364).

      Additional changes:

      Typos were fixed in Fig. 5I – “boundary” and “disseminating” were not spelled correctly.

      Line#350 – we originally noted “GATA3 expression precedes TFAP2A expression by approximately 12 hours”. This was incorrect, and is changed to 9 hours in the revised manuscript. We apologize for this mistake.

      REFERENCES

      Blakeley, P., Fogarty, N.M., del Valle, I., Wamaitha, S.E., Hu, T.X., Elder, K., Snell, P., Christie, L., Robson, P., and Niakan, K.K. (2015). Defining the three cell lineages of the human blastocyst by single-cell RNA-seq. Development 142, 3151-3165.

      Castillo-Venzor, A., Penfold, C.A., Morgan, M.D., Tang, W.W., Kobayashi, T., Wong, F.C., Bergmann, S., Slatery, E., Boroviak, T.E., Marioni, J.C., et al. (2023). Origin and segregation of the human germline. Life Sci Alliance 6.

      Granja, J.M., Klemm, S., McGinnis, L.M., Kathiria, A.S., Mezger, A., Corces, M.R., Parks, B., Gars, E., Liedtke, M., Zheng, G.X.Y., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature biotechnology 37, 1458-1465.

      Meistermann, D., Bruneau, A., Loubersac, S., Reignier, A., Firmin, J., Francois-Campion, V., Kilens, S., Lelievre, Y., Lammers, J., Feyeux, M., et al. (2021). Integrated pseudotime analysis of human pre-implantation embryo single-cell transcriptomes reveals the dynamics of lineage specification. Cell stem cell 28, 1625-1640 e1626.

      Ohgushi, M., Taniyama, N., Vandenbon, A., and Eiraku, M. (2022). Delamination of trophoblast-like syncytia from the amniotic ectodermal analogue in human primed embryonic stem cell-based differentiation model. Cell reports 39, 110973.

      Okae, H., Toh, H., Sato, T., Hiura, H., Takahashi, S., Shirane, K., Kabayama, Y., Suyama, M., Sasaki, H., and Arima, T. (2018). Derivation of Human Trophoblast Stem Cells. Cell stem cell 22, 50-63 e56.

      Petropoulos, S., Edsgard, D., Reinius, B., Deng, Q., Panula, S.P., Codeluppi, S., Plaza Reyes, A., Linnarsson, S., Sandberg, R., and Lanner, F. (2016). Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos. Cell 165, 1012-1026.

      Sasaki, K., Nakamura, T., Okamoto, I., Yabuta, Y., Iwatani, C., Tsuchiya, H., Seita, Y., Nakamura, S., Shiraki, N., Takakuwa, T., et al. (2016). The Germ Cell Fate of Cynomolgus Monkeys Is Specified in the Nascent Amnion. Developmental cell 39, 169-185.

      Sekulovski, N., Juga, L.L., Cortez, C.L., Czerwinski, M., Whorton, A.E., Spence, J.R., Schmidt, J.K., Golos, T.G., Gumucio, D.L., Lin, C.-W., et al. (2023). Identification of amnion progenitor-like cells at the amnion-epiblast boundary in the primate peri-gastrula. bioRxiv doi: 10.1101/2023.09.07.556553.

      Shao, Y., Taniguchi, K., Townshend, R.F., Miki, T., Gumucio, D.L., and Fu, J. (2017). A pluripotent stem cell-based model for post-implantation human amniotic sac development. Nature communications 8, 208.

      Ton, M.N., Keitley, D., Theeuwes, B., Guibentif, C., Ahnfelt-Ronne, J., Andreassen, T.K., Calero-Nieto, F.J., Imaz-Rosshandler, I., Pijuan-Sala, B., Nichols, J., et al. (2023). An atlas of rabbit development as a model for single-cell comparative genomics. Nature cell biology 25, 1061-1072.

      Tyser, R.C.V., Mahammadov, E., Nakanoh, S., Vallier, L., Scialdone, A., and Srinivas, S. (2021). Single-cell transcriptomic characterization of a gastrulating human embryo. Nature 600, 285-289.

      Yabe, S., Alexenko, A.P., Amita, M., Yang, Y., Schust, D.J., Sadovsky, Y., Ezashi, T., and Roberts, R.M. (2016). Comparison of syncytiotrophoblast generated from human embryonic stem cells and from term placentas. Proceedings of the National Academy of Sciences of the United States of America 113, E2598-2607.

      Yang, R., Goedel, A., Kang, Y., Si, C., Chu, C., Zheng, Y., Chen, Z., Gruber, P.J., Xiao, Y., Zhou, C., et al. (2021). Amnion signals are essential for mesoderm formation in primates. Nature communications 12, 5126.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival, and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger effects of leaf microbes on Ageratina compared with soil microbes, which were negative for seed germination and seedling survival but positive for seedling growth. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedback.

      Thank you very much for your assessment.

      Strengths:

      (1) The theoretic framework is well-established.

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job of identifying and interpreting such relationships.

      Thank you very much for your assessment.

      Weaknesses:

      (1) In the G0 and G21 inoculation experiments, allelopathic effects from leaf litters had not been accounted for, while these two experiments happened to be the ones where negative feedback was detected.

      We did not directly test for allelopathic effects. However, we also recorded seed germination time (GT), germination rate (GR), and seedling mortality rate (MR) for the treatments in which soil and leaf inocula were added 28 days after sowing (G28 inoculation). This allowed us to assess possible allelopathic effects by comparing sterile samples with the control (no inoculation during the first 28 days). In this version, we added the GT, GR, and MR results for the uninoculated treatment (control) to Figure 1 and described the results as follows: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Lines 102-104). We also discuss this point in the revised version: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that leaf microbes may alleviate the adverse effects of leaf allelopathy on GTs through pathways that remain unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves caused significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” (Lines 289-301).

      (2) The authors did not compare the fungal strains accumulated in dead seedlings to those accumulated in live seedlings to prove that the live seedlings indeed accumulated lower abundances of the strains that were identified to increase seedling mortality.

      Thank you for raising this point. We did not isolate fungi from healthy seedlings for a comparative study. However, our team previously found that the seedling-killing Allophoma strains obtained in this study had the same ITS sequences as the leaf endophyte and leaf spot pathogen Allophoma associated with mature A. adenophora individuals, and that some seedling-killing Alternaria strains also occur in healthy seedlings inoculated with leaf litter. We therefore assume that these seedling-killing fungi (e.g., Allophoma and Alternaria) likely persist in mature A. adenophora individuals through a lifestyle switch from endophytic to pathogenic, and that they can kill seedlings only at very early life stages of A. adenophora.

      We now discuss this point as follows: “In particular, the numerically dominant Allophoma strains obtained in this study had the same ITS genes as the leaf endophyte and leaf spot pathogen Allophoma associated with A. adenophora (Chen et al., 2022; Kai Fang et al., 2021; Yang et al., 2023). Interestingly, a previous report revealed that the dominant genera in healthy seedlings inoculated with leaf litter were Didymella and Alternaria (Kai Fang et al., 2019). We did not isolate fungi from healthy seedlings to determine whether the live seedlings indeed lacked or accumulated a lower abundance of the seedling-killing strains than did the dead seedlings in this study. We could assume that these fungal genera likely exist in mature A. adenophora individuals experiencing a lifestyle switch from endophytic to pathogenic and play an essential role in limiting the population density of A. adenophora monocultures by killing seedlings only at very early stages. Thus, it is worth exploring the dynamic abundance of these strains and host resistance variation during A. adenophora seedling development.” (Lines 432-444).

      (3) The data of seed germination and seedling mortality could have been analyzed in the same manner as that of seedling growth, which makes the whole result section more coherent. I don't understand why the authors had not calculated the response index (RI) for germination/mortality rate and conducted analyses on the correlation between these RIs with microbial compositions.

      Thank you for this suggestion. The response index (RI) was calculated as

      $$\mathrm{RI} = \frac{V_{\mathrm{nonsterile}} - V_{\mathrm{sterile}}}{V_{\mathrm{sterile}}}$$

      where V is the measured variable. Because the mortality rates of some sterile groups were zero, their RIs cannot be calculated. In addition, only leaf microbes affected seed germination time (GT), and neither leaf nor soil microbes affected germination rate (GR) (see Fig. 1a,b). Therefore, we preferred to assess microbial effects by directly comparing nonsterile and sterile treatments (also see Fig. 1d), and we correlated these values, rather than RIs, with microbial composition (see Fig. 3). We now emphasize this point in the Materials and Methods of the revised manuscript: “Because the mortality rates of some sterile groups were zero and their RIs were impossible to calculate, we had to directly compare the seedling mortality caused by nonsterile with by sterile samples and perform the analysis of correlation between the mortality rate and microbial composition.” (Lines 565-568).
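      To make concrete why the RI cannot be computed for groups with zero sterile mortality, here is a minimal Python sketch of the RI formula. The function name `response_index` and the example numbers are hypothetical illustrations, not the authors' code or data; the sketch simply returns `None` where the ratio is undefined.

```python
from typing import Optional


def response_index(nonsterile: float, sterile: float) -> Optional[float]:
    """Response index RI = (nonsterile - sterile) / sterile.

    Returns None when the sterile value is zero, because the ratio is
    undefined (the situation described for some sterile mortality groups).
    """
    if sterile == 0:
        return None
    return (nonsterile - sterile) / sterile


# Hypothetical values for illustration only (not data from the study).
print(response_index(12.0, 10.0))  # 0.2: nonsterile value is 20% higher
print(response_index(0.5, 0.0))    # None: RI undefined when sterile value is 0
```

      Returning `None` rather than raising an error mirrors the choice described in the response: such groups are excluded from RI-based analysis and compared directly instead.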

      (4) The language of the manuscript could be improved to increase clarity.

      We have improved the language in the revised version.

      Reviewer #2 (Public Review):

      Summary: 

      The study provides strong evidence that leaf microbes mediate self-limitation at an early life stage. It highlights the importance of leaf microbes in population establishment and community dynamics. 

      Thank you very much for your assessment.

      The authors conducted three experiments to test their hypothesis, elucidating the effects of leaf and soil microbial communities on the seedling growth of A. adenophora at different stages, screening potential microbial sources associated with seed germination and seedling performance, and identifying the fungus related to seedling mortality. The conclusions are justified by their results. Overall, the paper is well-structured, providing clear and comprehensive information.

      Thank you very much for your assessment.

      Reviewing Editor (Recommendations For The Authors):

      In addition to the assessments from the reviewers, we have the following comments on your paper:

      (1) The experimental design is complicated with regard to the multiple interacting treatments. The statistical analyses show that the interaction terms are important and significant. In this case, it could be more informative to show the detailed results at the sub-level than at the main level in the main text. For example, the main effects of inoculation sources and nutrients shown in Figure 2 are difficult to interpret, because the effects of inoculation sources and nutrients have important dependencies with each other and other factors such as inoculation time as shown in Figure S3. Therefore, Figure S3 is more informative than Figure 2. Please also be cautious that it would be necessary to clarify this context dependence when showing and citing results of the main effect to avoid any possible misunderstanding, such as the case of Figure 2 and S3.

      Thank you for this suggestion. We have deleted Figure 2 and moved Figure S3 into the main text as Figure 2, and the corresponding results have been rewritten as: “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” (Lines 109-115).

      (2) Response index (RI) is already a measure of microbial feedback effect, so that feedback may not be necessary as an explanatory variable in the model with RI as the response variable.

      We apologize that our wording caused confusion. Here the word “feedback” (e.g., foliage or soil feedback) does not refer to the microbial feedback effect; it refers to the inoculation source (leaf or soil). To avoid this ambiguity, we have replaced “feedback” with “inoculation source” in the figures and text.

      (3) Mortality rate is a ratio. It is unclear whether assuming a Gaussian error distribution is fine in your case. It would be important to check the residual distribution and to see whether data transformation (e.g., log) or using other error assumptions (e.g., binomial) is necessary.

      Thank you for this suggestion. As you note, generalized linear models (GLMs) with a Gaussian error distribution (identity link) are not appropriate for evaluating seedling mortality, because mortality rate is a proportion that does not meet the normality assumption. We therefore removed the GLM results for seedling mortality and instead directly compared seedling mortality among microbial treatments, inoculation times, nutrient levels, and inoculation sources using Mann–Whitney U and Kruskal–Wallis tests (see Fig. 1d). All corresponding results have also been rewritten as: “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” (Lines 109-115).
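      For readers less familiar with the rank-based comparison mentioned here, the following is a minimal Python sketch of the statistic behind the two-group Mann–Whitney U test. It counts, over all pairs, how often a value from one group exceeds a value from the other, with ties counted as one half. This is an illustration only: the response does not state which software the authors used, the mortality values below are invented, and the p-value step is omitted. In practice a library routine such as `scipy.stats.mannwhitneyu` (or `scipy.stats.kruskal` for more than two groups, as in the Kruskal–Wallis test) would be used.

```python
from typing import Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x versus sample y.

    Counts pairs (xi, yj) with xi > yj; tied pairs contribute 0.5.
    U for x plus U for y always equals len(x) * len(y).
    """
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


# Hypothetical per-replicate mortality proportions (illustrative, not study data).
nonsterile = [0.9, 1.0, 0.95, 0.97]
sterile = [0.1, 0.2, 0.25, 0.3]
print(mann_whitney_u(nonsterile, sterile), mann_whitney_u(sterile, nonsterile))  # 16.0 0.0
```

      Because the test works on ranks rather than raw values, it makes no normality assumption, which is why it suits proportions such as mortality rates better than a Gaussian GLM.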

      (4) Please be consistent about the wording of different treatment names throughout the texts, tables, and figures. For example, "feedback" should only be used for microbial treatment, but not for inoculation source treatment (e.g., Figure 2). We can say there is an effect of microbial feedback only if we compare sterile vs. non-sterile groups, otherwise, there could be other effects, for example, the allelopathic effect pointed out by Reviewer #1. When writing inoculation, please be specific about whether it is for inoculation time or inoculation source (e.g., within multiple statistical tables in the appendix).

      Thanks for your good suggestion. We have changed “different feedback” to “different inoculation source” to make the distinction clearer.

      (5) Please clarify which inoculation periods they are for Figures 1d-g.

      Thanks for your good suggestion. We have added the inoculation periods to Fig. 1.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:

      Lines 12-15: This sentence is too long and complicated, making it unclear what had been done and what had not in previous studies.

      Thank you. We have rewritten this sentence as: “However, how the phyllosphere and rhizosphere soil microbes distinctively affect seedling mortality and the growth of invasive plants across ontogeny under varying soil nutrient levels remains unclear.”

      Line 19: is it appropriate to use "enrich" here?

      Thanks. We have changed “Microbial inoculation at different growth stages altered the microbial community and functions enriched in seedlings” to “Microbial inoculation at different growth stages altered the microbial community and functions of seedlings”.

      Line 24-25: "litter exhibited phylogenetic signals"? not clear what this means.

      Thanks. A significant phylogenetic signal means that the seedling-killing effects of the fungal strains on A. adenophora are related to the phylogenetic relatedness of these strains. We have therefore changed “fungal strains isolated from dead seedlings inoculated with litter exhibited significant phylogenetic signals to seedling mortality” to “the A. adenophora seedling-killing effects of fungal strains isolated from dead seedlings by non-sterile leaf inoculation exhibited significant phylogenetic signals, by which strains of Allophoma and Alternaria generally caused high seedling mortality.”

      Line 29: using "in turn" in the first sentence seems weird.

      We have deleted “in turn”.

      Lines 32-33: PSFs are usually positive because of?

      We have changed “PSFs have positive effects by escaping soil pathogens and recruiting some beneficial microbes” to “PSFs are usually positive because of escaping soil pathogens and recruiting some beneficial microbes”.

      Line 54: why emphasize "a single soil microbe"?

      Although Geisen et al. (2021) assessed the effect of each of 34 isolates on seed germination and plant growth, Jevon et al. (2020) focused on the effects of the soil microbial community on seedling and adult plant survival. Thus, we changed “a single soil microbe” to “soil microbes”.

      Lines 85-86: "tested their mortality to seedlings"? not clear what this means.

      We apologize for the unclear wording. We have changed “we also isolated the fungi associated with the dead seedlings and tested their mortality to seedlings.” to “we also isolated the fungi associated with the dead seedlings and tested their seedling-killing effects on A. adenophora.”

      Results: no statistics and no references for the statistical tables that could support the results were presented in this section.

      We have deleted the inappropriate generalized linear models (GLMs) with Gaussian error distributions (identity link) for evaluating seedling mortality and have rewritten all corresponding results (see Lines 109–115 and Fig. 1d).

      Lines 100-102: this subtitle reads more like a summary of the following results than a title. All subtitles in the Result section have similar issues (i.e. Lines 148-150, 207-209).

      Thanks. We have subdivided the Results into four sections and changed the subtitles to: “Effects of leaf litter and rhizosphere soil on the mortality and growth of A. adenophora seedlings”, “Correlations of microbial community composition and potential function with seedling mortality at the early stage”, “Enrichment of microbial community and function by A. adenophora seedlings under different treatments”, and “Correlations of the enriched microbial community and function with A. adenophora seedling growth”.

      Lines 148-206: since there are a lot of results concerning the microbial composition, I suggest focusing on those that could directly explain the positive or negative feedback. The one concerning diversity (e.g. Figure 3 and corresponding texts) does not seem necessary.

      Thanks for your suggestion. We have moved Figure 3 to the supplementary material as Figure S2. To focus on the core microbes that could directly explain the positive or negative feedback, we reorganized Figure 3: it first shows the core soil and leaf bacteria and bacterial functions, as well as the core soil and leaf fungi and fungal functions (Fig. 3a–h), and then shows the correlations of the top 30 bacterial and fungal genera from soil and leaves with seedling mortality rate (Fig. 3i–j).

      Line 180: is it not common sense that ectomycorrhiza can only be found in soil?

      Yes, it is. We have deleted this sentence.

      Line 199: "the seedling mortality of these strains"? not clear what this means,

      We have changed “The seedling mortality of these strains” to “The seedling-killing of these strains on A. adenophora”.

      Line 291-292: I don't see how the authors can distinguish between allelopathic and pathogenic effects based on their results.

      We did not directly test for allelopathic effects. However, we also recorded seed germination time (GT), germination rate (GR) and seedling mortality rate (MR) for the treatments in which soil and leaves were inoculated 28 days after sowing (G28 inoculation). Because these treatments received nothing during the first 28 days, they serve as an uninoculated control, and comparing them with the sterile samples allowed us to look for possible allelopathic effects. In this version, we have added the GT, GR and MR results for this control (nothing inoculated) to Figure 1 and described them as: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Lines 102–104). We have also discussed this point in the resubmitted version: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that some pathways through which leaf microbes alleviate the adverse effects of leaf allelopathy on GTs are unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves have significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” (Lines 289–301).

      Lines 383-414: Correlations are not necessarily causations. Sometimes a strong correlation may result from higher-order interaction. The authors should be more cautious about the discussion of microbial function in this section.

      Thanks. We deleted all descriptions of adverse or beneficial effects on the growth of the host plant A. adenophora and instead used the more cautious terms “negative correlation” or “positive correlation” to discuss the functions of the microbes enriched by A. adenophora. Finally, we added the sentence: “It is necessary to isolate these enriched microbes to test the interactions with the early life stage of A. adenophora.” (see Lines 411–413).

      Lines 489-490: I don't really understand why the authors performed a combination treatment. What did they expect from such a combination?

      Thanks. We have explained our rationale as follows: “Leaf inoculation at G28 was performed to simulate natural microbial spread from the leaf litter to the above part of the seedlings by suspending the leaf bag over the transplanted seedlings without direct contact all the time (see Zaret et al. (2021)). This method may result in only microbial species with easy air transmission to infect seedlings. Thus, an additional combination inoculation (named G21+28) was performed on both the 21st (with seedling contact) and 28th days (without seedling contact) to ensure that most leaf microbes had the opportunity to reach the seedlings.” (see Lines 498–505).

      Figure 1: why not use "mortality rate" instead of "death rate"?

      Thanks. We have changed “death rate” to “mortality rate” in all corresponding figures and text.

      Figure 8: This is a very complicated experimental setup. Why did the authors harvest the plants treated with nutrient addition after the 12th day of the experiment and harvest those without nutrient addition after the 16th day? Why the time lag?

      Thanks. We have explained this as follows: “Seedlings were harvested after 8 weeks of growth under high-nutrient conditions because they grew too fast and touched the PTFE cover; however, we harvested those plants grown under low-nutritional conditions after another 4 weeks of growth due to their very small size (see Fig. S6).” (see Methods, Lines 514–517).

    1. Author response:

      Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+; Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

      We thank the reviewer for the insightful comments and helpful suggestions. We will address the majority of the concerns. Specifically, we will evaluate whether loss of Atg14 leads to pyroptosis in other reproductive tract tissues, namely the uterus and ovary. To determine the spatiotemporal expression of ATG14, we will assess ATG14 expression in the oviducts of WT and cKO mice. Further, to understand the impact of Atg14 loss on different regions of the oviduct, we will provide additional images from cKO mice and quantify FOXJ1-positive cells. To address the concerns about cyclicity and steroid hormone levels, we will measure E2 and P4 levels and assess E2-target genes in the uteri of control and cKO mice. We will also include images of ampullary sections from the oviducts of Atg14 cKO and control females.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      We agree with the reviewer. We will take a more cautious approach and modify the statement to say that ATG14-dependent autophagy might be critical for maintaining oviduct homeostasis and keeping inflammation in check to enable embryo transport.

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific to oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. Given that no defect was observed when Atg14 was knocked out in oviduct ciliary cells, it would be good to use a secretory-cell Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct, thus supporting this conclusion.

      To address the action of Atg14 in the oviduct, we will perform ATG14 IHC staining in the oviduct and also evaluate GSDMD expression in the uterus and ovary, where PR-Cre is active. Further, we will cite the literature establishing PR-Cre expression in the oviduct, which is well documented. However, generating a secretory-cell-specific Pax8-Cre mouse model would require a substantial amount of time and effort, and we respectfully argue that this is beyond the scope of the current manuscript.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide the verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      We will perform expression analysis for ATG14 in Foxj1/Atg14 cKO mice to confirm effective ablation in ciliated cells.

      (4) In lines 307-313, the author tested whether ATG14 is required for the decidualization of HESCs. The author stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 are not corresponding to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA group showed no obvious difference even at day 3 and day 6. The author should point out the difference in each panel and explain in the text or figure legend.

      We will correct the labels and include high-magnification images to show the morphological differences in HESCs.

      (5) In lines 332-336, the authors pointed out that the cKO mice oviduct lining shows marked eosinophilic cytoplasmic change, but there's no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes and it is likely a mismatch to Figure 3E. Even in Figure 3E, it is not obvious to notice all the phenotypes described here. The figure legend is overly simple, and there's no explanation of the arrowheads in the panel. More data/images are required to support the claim here and provide a clear indication and explanation in the figure legend.

      Dr. Ramya Masand, Chief Pathologist in our department and a contributing author, critically evaluated the stained sections from Figure 3 and provided the pathological assessment as outlined in lines 332-336. We will consult Dr. Masand and will modify the statements accordingly.

      (6) In lines 317-325, the description of the proportion of embryos recovered from the oviduct and uterus is rather confusing. In addition, the total number of embryos was not provided. I would recommend presenting numerical data showing the average number of embryos from the oviduct and uterus instead of the percentage data in Figures 3A and 5G.

      We will calculate the average number of embryos from the oviduct and uterus and provide numerical data.

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      We will perform the GSDMD staining to determine whether Polyphyllin VI treatment resulted in oviductal pyroptosis activation and lumen disruption.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers better understand your experiments. Alternatively, you can add it in the introduction.

      We will include more literature-based discussion on pyroptosis and its mechanism.

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      We will provide definitions for the acronyms such as FRT, HESCs, and GSDMD.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

      We will correct the statement.

      A few of the additional comments for authors to consider improving the manuscript are listed below.

      (1) Some of the figures are missing scale bars, while others have inconsistent scale bars. It would be better to be consistent.

      (2) On a couple of occasions, the DAPI signal cannot be seen, such as in Figure 2B and Figure 3D.

      (3) Overall, the figure legends can be improved to provide more detailed information to help the reader to interpret the data.

      As suggested, we will include consistent scale bars and high-quality images, and will expand the figure legends.

      (4) In Figure 2D, the Y-axis showed the stimulated/unstimulated uterine weight ratio, why did the author put "Atg14" at the top of the graph? At the same time, the X-axis title is missing in Figure 2D.

      (5) In the left panel of Figure 2G, "ATG14" at the top should be "Atg14" to be consistent.

      (6) In line 559, "(A)" is missing before "Immunofluorescence analysis of GSDMD".

      We will make these necessary changes.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using Pr Cre and Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings, however there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides spatial evidence about a new mechanism linking Atg14 to pyroptosis.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces Pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      We will carry out GSDMD staining in uterine tissues and discuss the findings.

      (2) The manuscript never showed any effect on the autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so, does that contribute to the observation?

      We will assess the expression of autophagy-related markers in response to Atg14 loss and will discuss the findings. 

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      We agree with the reviewer regarding cellular plasticity. We will remove the term “plasticity” and refer only to integrity.

      (4) The mitochondrial phenotype shown in Figure 3 did not appear as severe as it is described in the Results section. The analyses should be more thorough. They should include multiple frames (in the supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test this aspect in uterine cells. The authors should measure Feret's diameter, differences in membrane potential, etc. for a definitive conclusion.

      We will perform additional mitochondrial staining to determine the mitochondrial morphology in both the oviduct and uterus. Depending on the results, we will consider measuring Feret's diameter. However, we respectfully argue that membrane potential studies would be time-consuming and are beyond the scope of the current study.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      As shown in Figure 3E, staining of the oviduct epithelium for KRT8 clearly showed a disorganized oviduct with abnormally fused cells leaving no lumen space. We can provide higher-magnification images in the supplementary figures to highlight this observation.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

      Autophagy has been shown to inhibit pyroptosis either by inhibiting the cleavage of GSDMD or by suppressing various pyroptosis-related factors, including NLRs and STING proteins. We found that the loss of Atg14 results in elevated GSDMD levels, which suggests a potential mechanism through which Atg14 suppresses pyroptosis in the oviduct. Importantly, Atg14 may regulate GSDMD through several intermediary factors, and resolving this intricate nexus will require complex biochemical, cellular and molecular screens, which will be one focus of our future investigations.

  3. May 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses

      (1) The authors face a technical challenge (which they acknowledge): they use two numbers (mean and variance) to characterize synaptic variability, whereas in the brain there are three numbers (number of vesicles, release probability, and quantal size). Turning biological constraints into constraints on the variance, as is done in the paper, seems somewhat arbitrary. This by no means invalidates the results, but it means that future experimental tests of their model will be somewhat nuanced.

      Agreed. There are two points to make here.

      First, the mean and variance are far more experimentally accessible than n, p and q. The EPSP mean and variance are measured directly in paired-patch experiments, whereas obtaining n, p and q requires either far more extensive experimentation or strong assumptions. For instance, the data from Ko et al. (2013) give the EPSP mean and variance, but not (directly) n, p and q. Thus, in some ways, predictions about means and variances are easier to test than predictions about n, p and q.
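      A hedged illustration of why n, p and q are hard to recover from the mean and variance alone: the sketch below assumes the standard binomial quantal model (n release sites, release probability p, fixed quantal size q), under which the EPSP mean is npq and the variance is np(1−p)q². The two parameter sets are hypothetical and were chosen only to show that very different (n, p, q) can produce the same mean and variance.

```python
import math

def epsp_moments(n, p, q):
    """EPSP mean and variance under a binomial quantal model (an assumption here)."""
    mean = n * p * q
    variance = n * p * (1 - p) * q ** 2
    return mean, variance

# Two hypothetical synapses with very different release statistics.
few_reliable_sites = epsp_moments(n=10, p=0.5, q=1.0)
many_unreliable_sites = epsp_moments(n=50, p=1 / 6, q=0.6)

print(few_reliable_sites)
print(many_unreliable_sites)
# The moments agree up to floating-point error, so the mean and variance
# alone cannot identify n, p and q without further assumptions.
print(all(math.isclose(a, b) for a, b in zip(few_reliable_sites, many_unreliable_sites)))
```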

      That said, we agree that in the absence of a comprehensive empirical accounting of the energetic costs at the synapse, there is inevitably some arbitrariness in how we derive our energetic costs. That is why we considered four potential functional forms for the connection between the variance and energetic cost, covering a wide range of sensible forms for this cost. Our results were robust across this wide range of functional forms, indicating that the patterns we describe are not specific to a particular functional form, but arise in many settings where there is an energetic cost for reliable synaptic transmission.

      (2) The prediction that the learning rate should increase with variability relies on an optimization scheme in which the learning rate is scaled by the inverse of the magnitude of the gradients (Eq. 7). This seems like an extra assumption; the energy efficiency framework by itself does not predict that the learning rate should increase with variability. Further work will be needed to disentangle the assumption about the optimization scheme from the energy efficiency framework.

      Agreed. The assumption that learning rates scale with synapse importance is a separate one. However, it is highly plausible: almost all modern state-of-the-art deep learning training runs use such an optimization scheme, because in practice it learns far faster than older schemes. We have added a sentence to the main text (line 221) indicating that this is ultimately an assumption.
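      To make the optimizer assumption concrete, the sketch below assumes an RMSProp-style update (the second-moment normalization also used in Adam), in which each parameter's step is divided by the running root-mean-square of its gradients. It is not the manuscript's Eq. 7, and the gradient values are hypothetical; it only illustrates that under such schemes a parameter with consistently small gradients receives a larger effective learning rate.

```python
import math

def rmsprop_effective_rates(grad_history, base_lr=0.01, decay=0.9, eps=1e-8):
    """Effective per-parameter learning rate after an RMSProp-style running
    average of squared gradients (illustrative assumption, not Eq. 7)."""
    n_params = len(grad_history[0])
    sq_avg = [0.0] * n_params
    for grads in grad_history:
        for i, g in enumerate(grads):
            # Exponential moving average of the squared gradient.
            sq_avg[i] = decay * sq_avg[i] + (1 - decay) * g * g
    # Step size = base_lr / RMS(gradient): small gradients give a large effective rate.
    return [base_lr / (math.sqrt(v) + eps) for v in sq_avg]

# Hypothetical gradients: parameter 0 sees large gradients, parameter 1 small ones.
history = [[1.0, 0.01], [-0.8, 0.02], [1.2, -0.01]]
rates = rmsprop_effective_rates(history)
print(rates)  # parameter 1 gets a much larger effective learning rate
```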

      Major

      (1) The correspondence between the entropy term in the variational inference description and the reliability cost in the energetic description is a bit loose. Indeed, the entropy term scales as −log(σ) while the reliability cost scales as σ^(−ρ). While the authors do make the point that σ^(−ρ) upper bounds −log(σ) (up to some constant), those two cost terms are different. This raises two important questions:

      a. Is this difference important, i.e. are there scenarios for which the two frameworks would have different predictions due to their different cost functions?

      b. Alternatively, is there a way to make the two frameworks identical (e.g. by choosing a proposal distribution Q(w) different from a Gaussian distribution (and tuneable by a free parameter that could be related to ρ) and therefore giving rise to an entropy term consistent with the reliability cost of the energy efficiency framework)?

      To answer b first, there is no natural way to make the two frameworks identical (unless we assume the reliability cost is proportional to log σ_syn, and we don’t think there’s a biophysical mechanism that would give rise to such a cost). Now, to answer a: in Fig. 7 we extensively assessed the differences between the energy-efficient σ_syn and the Bayesian σ_post. In Figs. 7b,c, we find that σ_syn and σ_post are positively correlated in all models. This positive correlation indicates that the qualitative predictions made by the two frameworks (Bayesian inference and energy efficiency) are likely to be very similar. Importantly, though, there are systematic differences highlighted by Figs. 7a,b. Specifically, the energy-efficient σ_syn tends to vary less than the Bayesian σ_post. This appears in Fig. 7b, which shows the relationship between σ_syn (on the y-axis) and σ_post (on the x-axis): this plot has a slope smaller than one for all our models of the biophysical cost. The pattern also appears in the covariance ellipses in Fig. 7a, in that the Bayesian covariance ellipses tend to be long and thin, while the energy-efficient covariance ellipses are rounder. Critically, though, both sets of covariance ellipses show the same pattern: there is more noise along less important directions (as measured by the Hessian).

      We have added a sentence (line 273) noting that the search for a theoretical link is motivated by our observations in Fig. 7 of a strong, but not perfect link between the pattern of variability predicted by Bayesian and energy-efficient synapses.
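      The bound mentioned by the reviewer, and the sub-unity slope in Fig. 7b, can both be illustrated with a short derivation. This is a sketch under simplifying assumptions, not the manuscript's objective: a single synapse, a local curvature h (the relevant Hessian entry) so that noise of standard deviation σ costs about (h/2)σ² in expected loss, a hypothetical cost weight λ > 0, and ρ > 0.

```latex
% Upper bound: log x <= x - 1 for x > 0, with x = \sigma^{-\rho}
-\rho \log\sigma = \log \sigma^{-\rho} \le \sigma^{-\rho} - 1
\;\Longrightarrow\;
-\log\sigma \le \frac{\sigma^{-\rho} - 1}{\rho}
\quad (\rho > 0,\ \text{equality at } \sigma = 1)

% Toy Bayesian-like objective: quadratic loss minus entropy
\tfrac{h}{2}\sigma^2 - \log\sigma:\quad
h\sigma - \frac{1}{\sigma} = 0
\;\Longrightarrow\;
\sigma_{\mathrm{post}} = h^{-1/2}

% Toy energy-efficient objective: quadratic loss plus reliability cost
\tfrac{h}{2}\sigma^2 + \lambda\sigma^{-\rho}:\quad
h\sigma - \lambda\rho\,\sigma^{-\rho-1} = 0
\;\Longrightarrow\;
\sigma_{\mathrm{syn}} = \left(\frac{\lambda\rho}{h}\right)^{1/(\rho+2)}

% Eliminating h using \log h = -2\log\sigma_{\mathrm{post}}
\log\sigma_{\mathrm{syn}}
= \frac{\log(\lambda\rho)}{\rho+2} + \frac{2}{\rho+2}\,\log\sigma_{\mathrm{post}}
```

      In this toy setting both noise levels fall as the curvature h grows, so more noise is allocated along less important directions in both cases, but the log-log slope 2/(ρ+2) is below one for any ρ > 0. That is one way to read the finding that σ_syn varies less than σ_post; the full models in the manuscript contain further terms, so this should be taken as intuition only.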

      (2) Even though I appreciate the effort of the authors to look for experimental evidence, I still find that the experimental support (displayed in Fig. 6) is moderate for three reasons.

      a. First, the experimental and simulation results are not displayed in a consistent way. Indeed, Fig 6a displays the relative weight change |Δw|/w as a function of the normalised variability σ²/|µ| in experiments, whereas the simulation results in Fig 5c display the variance σ² as a function of the learning rate. Also, Fig 6b displays the normalised variability σ²/|µ| as a function of the input rate, whereas Fig 5b displays the variance σ² as a function of the input rate. As a consequence, the comparison between experimental and simulation results is difficult.

      b. Secondly, the actual power-law exponents in the experiments (see Fig 6a resp. 6b) should be compared to the power-law exponents obtained in simulation (see Fig 5c resp. Fig 5b). The difficulty here lies in the fact that the power-law exponents obtained in the simulations depend directly on the (free) parameter ρ. So far the authors have precisely avoided committing to a specific ρ, but rather argued that different biophysical mechanisms lead to different reliability exponents ρ. Therefore, since there are many possible exponents ρ (and consequently many possible power-law exponents in the simulation results in Fig 5), it is likely that one of them will match the experimental data. For the argument to be stronger, one would need to argue which synaptic mechanism is dominant and therefore come up with a single prediction that can be falsified experimentally (see also point 4 below).

      c. Finally, the experimental data presented in Fig 6 are still “clouds of points”. A coefficient of r = 0.52 (in Fig 6a) is moderate evidence, while a coefficient of r = −0.26 (in Fig 6b) is weak evidence.

      The key thing to remember is that our paper is not about whether synapses are “really” Bayesian or energy efficient (or both/neither). Instead, the key point of our paper, as expressed in the title, is to show that the experimental predictions of Bayesian synapses are very similar to those of energy-efficient synapses, and that energy-efficient synapses are therefore very difficult to distinguish experimentally from Bayesian synapses. In that context, the two plots in Fig. 6 are not intended to present evidence in favour of energy-efficient or Bayesian synapses. In fact, Fig. 6 is not meant to be a contribution of the paper at all; it merely illustrates the kinds of experimental results that have been used (Aitchison et al. 2021), or might be used (Schug et al. 2021), to support Bayesian synapses. As such, Fig. 6 serves as a jumping-off point for discussing how very similar results might equally arise from the Bayesian and energy-efficiency viewpoints.

      We have modified our description of Fig. 6 to further emphasise that the panels in Fig. 6 are not our contribution, but are taken directly from Schug et al. 2021 and Aitchison et al. 2021 (we have also modified Fig. 6 to show precisely what was plotted in Schug et al. 2021, again to emphasise this point). Further, we have modified the presentation to emphasise that these plots serve merely as jumping-off points for discussing the kinds of predictions that we might consider for Bayesian and energy efficient synapses.

      This is important because we would argue that the “strength of support” should be assessed for our key claim, made in the title, that “Signatures of Bayesian inference emerge from energy efficient synapses”.

      a) To emphasise that these are previously published results, we have chosen axes to match those used in the original work (Aitchison et al. 2021; Schug et al. 2021).

      b) We agree that a close match between power-law exponents would constitute strong evidence for energy efficiency or Bayesian inference, and might even allow us to distinguish them. We did consider such a comparison, but found it difficult for two reasons. First, while the confidence intervals on the slopes exclude zero, they are quite broad. Second, while the slopes in a one-layer network are consistent and match theory (Appendix 5), the slopes in deeper networks are far more inconsistent. This is likely due to a number of factors, such as details of the optimization algorithm and initialization. Critically, if details of the optimization algorithm matter in simulation, they may also matter in the brain. Therefore, it is not clear to us that a comparison of the actual slopes can be relied upon.
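      To illustrate why broad confidence intervals make a direct comparison of exponents fragile, the sketch below fits a power-law exponent by ordinary least squares on log-transformed data and reports an approximate 95% interval. The synthetic data, the true exponent of 0.5, the noise level and the normal-approximation interval (1.96 × standard error) are all assumptions chosen for illustration; this is not the fitting procedure used in the paper or in the cited studies.

```python
import math
import random

def fit_power_law(x, y):
    """Fit y ~ a * x**b by least squares on (log x, log y).

    Returns the exponent b and an approximate 95% CI computed with the
    normal approximation b +/- 1.96 * SE(b).
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    intercept = my - b * mx
    resid = [v - (intercept + b * u) for u, v in zip(lx, ly)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance in log space
    se_b = math.sqrt(s2 / sxx)
    return b, (b - 1.96 * se_b, b + 1.96 * se_b)

# Hypothetical data: true exponent 0.5 with large multiplicative noise,
# standing in for many unmeasured sources of variation.
rng = random.Random(0)
x = [rng.uniform(0.1, 10.0) for _ in range(40)]
y = [v ** 0.5 * math.exp(rng.gauss(0.0, 1.0)) for v in x]
b, (lo, hi) = fit_power_law(x, y)
print(f"exponent ~ {b:.2f}, 95% CI ~ [{lo:.2f}, {hi:.2f}]")
```

      With noise of this size, the interval is wide enough that several candidate exponents would be statistically compatible with the same data.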

      c) To reiterate, the point of our article is not to make judgements about the strength of evidence in previously published work, but to argue that Bayesian and energy efficient synapses are difficult to distinguish experimentally because they produce similar predictions. That said, it is very difficult to make blanket statements about the strength of evidence for an effect based merely on a correlation coefficient. It is perfectly possible to have a moderate correlation coefficient along with very strong evidence of an effect (e.g. very small p-values) if there is a lot of data. Likewise, it is possible to have a very large correlation coefficient along with weak evidence of an effect (e.g. if we only have three or four datapoints, which happen to lie on a straight line). A small correlation coefficient is instead much more closely related to the effect size, specifically the effect size relative to the “noise”, which usually arises from unmeasured factors of variation. Here, we know there are many, many unmeasured factors of variation, so even if synapses really are Bayesian or energy efficient, the best we can hope for is low correlation coefficients.
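      The distinction between correlation size and strength of evidence can be made concrete with a short sketch. It uses Fisher's z-transform, z = atanh(r)·√(n − 3), as a large-sample approximation to the test of zero correlation; this is our choice of approximation, and the sample sizes below are hypothetical, not those of the datasets in Fig. 6.

```python
import math

def fisher_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: correlation = 0.

    Under H0, z = atanh(r) * sqrt(n - 3) is approximately standard normal,
    so p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    The approximation is crude for very small n.
    """
    if n <= 3:
        raise ValueError("Fisher approximation needs n > 3")
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical sample sizes, chosen only for illustration.
weak_r_many_points = fisher_p_value(r=-0.26, n=500)
strong_r_few_points = fisher_p_value(r=0.9, n=4)
print(f"r = -0.26, n = 500: p ~ {weak_r_many_points:.1e}")
print(f"r =  0.90, n = 4:   p ~ {strong_r_few_points:.2f}")
```

      The small correlation with many points gives a far smaller p-value than the large correlation with four points, which is the asymmetry described above.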

      As mentioned in the public review, a weakness in the paper is the derivation of the constraints on σ_i given the biophysical costs, for two reasons.

      a. First, it seemed a bit arbitrary whether you hold n fixed or p fixed.

      b. Second, at central synapses, n is usually small, possibly even usually 1: REF(Synaptic vesicles transiently dock to refill release sites, Nature Neuroscience 23:1329-1338, 2020); REF(The ubiquitous nature of multivesicular release Trends Neurosci. 38:428-438, 2015). Fixing n would radically change your cost function. Possibly you can get around this because when two neurons are connected there are multiple contacts (and so, effectively, reasonably large n). This seems worth discussing.

      a) Ultimately, we believe that the “real” biological cost function is very complex, and most likely cannot be written down in a simple functional form. Further, we certainly do not have the experimental evidence needed to pin down this cost function precisely, and are unlikely to have it for a considerable time. In that context, we are forced to resort to two strategies: first, using simplifying assumptions (such as holding n or p fixed) to derive a functional form for the cost; second, considering a wide range of functional forms for the cost and ensuring that our argument works for all of them.

      b) We appreciate the suggestion that the number of connections could be used as a surrogate where synapses have only a single release site. As you suggest, we can propose an alternative model for this case in which n represents the number of connections between neurons. We have added this alternative interpretation to our introduction of the quantal model under the heading “Biophysical costs”. For a fixed PSP mean, we could have either many connections with small vesicles or fewer connections with larger vesicles. Similarly, for the actin cost, we would certainly require more actin if the number of connections were increased.
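      The trade-off can be written out under the standard binomial quantal model, which we assume here for illustration (the paper's own cost functions may be parameterized differently): n release sites, or on the alternative reading above n single-site connections, each releasing with probability p a quantum of size q. Holding the mean μ and p fixed, the variance falls as n grows, so many connections with small quanta give a less variable PSP than fewer connections with large quanta.

```latex
\begin{aligned}
\mu &= n p q, \qquad \sigma^2 = n p (1 - p) q^2 = \mu (1 - p) q, \\
q &= \frac{\mu}{n p} \;\Rightarrow\; \sigma^2 = \frac{\mu^2 (1 - p)}{n p}.
\end{aligned}
```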

      Minor

      (1) A few additional references could further strengthen some claims of the paper:

      Davis, Graeme W., and Martin Muller. “Homeostatic Control of Presynaptic Neurotransmitter Release.” Annual Review of Physiology 77, no. 1 (February 10, 2015): 251-70. https://doi.org/10.1146/annurev-physiol-021014-071740. This paper provides elegant experimental support for the claim (line 538, now line 583) that µ is kept constant and q acts as a compensatory variable.

      Jegminat, Jannes, Simone Carlo Surace, and Jean-Pascal Pfister. “Learning as Filtering: Implications for Spike-Based Plasticity." Edited by Blake A Richards. PLOS Computational Biology 18, no. 2 (February 23, 2022): e1009721. https://doi.org/10.1371/journal.pcbi.1009721.

      This paper also showed that a lower uncertainty implies a lower learning rate (see e.g. line 232), but in the context of spiking neurons.

      Figure 1 of the first suggested paper indeed shows that quantal size is a candidate for homeostatic scaling (fixing µ). This review also cites a great deal of further evidence for quantal scaling, including evidence for both presynaptic and postsynaptic scaling of q, which leaves room for speculation on whether vesicle radius or postsynaptic receptor number is the source of a compensatory q. On line 583, we have added a few lines pointing to the suggested review paper.

      The second reference demonstrates Bayesian plasticity in the context of STDP, proposing learning rates tuned to the covariance in spike timing. We have added this as extra support for assuming an optimisation scheme that tunes learning rates to synapse importance and synapse variability (line 232).

      (2) In the numerical simulations, the reliability cost is implemented with a single power-law expression (reliability cost ). However, in principle, all the reliability costs will play in conjunction, i.e. reliability cost . While I do recognise that it may be difficult to estimate the biophysical values of the various c_i, it might still be relevant to comment on this.

      Agreed. Limitations in the literature meant that we could only make a cursory comparison of the relative scale of each cost, using estimates by Attwell (2001) and Engl (2015). On line 135, we have added a paragraph explaining the rationale for considering each cost independently.

      (3) In Eq. 8, σ² doesn’t depend on variability in q, which would add another term; barring algebra mistakes, it’s . It seems worth mentioning why you didn’t include it. Can you argue that it’s a small effect?

      Agreed. Ultimately, we dropped this term because we expected it to be small relative to the variability from vesicle release, and because it would be difficult to quantify. In practice, variability is believed to arise mostly from stochastic vesicle release. The primary evidence for this is histograms of EPSP amplitudes, which show a classic multi-peak structure, with peaks corresponding to one, two, three, etc. quanta. Examples of these plots include:
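      The size of the neglected term can be checked with a small Monte Carlo sketch. It assumes n independent release sites, release probability p, and quantal sizes drawn independently from a Gaussian with mean q and standard deviation sd_q. Under those assumptions the variance is n·p·(1−p)·q² + n·p·sd_q², so the extra term is small whenever sd_q is small relative to q. In that regime amplitude histograms also show separate peaks near 0, q, 2q, and so on. All parameter values are hypothetical.

```python
import random
import statistics

def psp_variance_analytic(n, p, q, sd_q):
    # Release-driven term plus the extra term from quantal-size variability.
    release_term = n * p * (1 - p) * q ** 2
    quantal_term = n * p * sd_q ** 2
    return release_term, quantal_term

def simulate_psp(n, p, q, sd_q, trials, seed=0):
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Each site releases independently; each released quantum has its own size.
        k = sum(1 for _ in range(n) if rng.random() < p)
        samples.append(sum(rng.gauss(q, sd_q) for _ in range(k)))
    return samples

n, p, q, sd_q = 5, 0.4, 1.0, 0.1  # hypothetical values; coefficient of variation of q = 10%
release_term, quantal_term = psp_variance_analytic(n, p, q, sd_q)
samples = simulate_psp(n, p, q, sd_q, trials=50_000)
print(f"release term {release_term:.3f}, quantal term {quantal_term:.3f}")
print(f"simulated variance {statistics.pvariance(samples):.3f}")
```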

      - “The end-plate potential in mammalian muscle”, Boyd and Martin (1956); Fig. 8.

      - “Structure and function of a neocortical synapse”, Holler-Rickauer et al. (2019); Extended Figure 5.

      (3) On pg. 7 (now pg. 8), when the Hessian is introduced, why not say what it is? Or at least the diagonal elements, for which you just sum up the squared activity. That will make it much less mysterious. Or are we relying too much on the linear model given in App 2? If so, you should tell us how the Hessian was calculated in general. Probably in an appendix.

      With the intention of keeping the paper accessible to a wide audience, we decided to avoid a mathematical definition of the Hessian, opting instead for a written definition, i.e. line 192: “H_ii; the second derivatives of the objective with respect to w_i.”, and later a schematic (Fig. 4) showing how the second derivative can be understood as a measure of curvature and synapse importance. Nonetheless, this review point has made us aware that the estimated Hessian values plotted in Fig. 5a were insufficiently explained, so we have added a reference on line 197 to the appendix section where we show how we estimated the diagonal values of the Hessian.

      (4) Fig. 5: assuming we understand things correctly, Hessian ∝ |x|². Why also plot σ² versus |x|? Or are we getting the Hessian wrong?

      The Hessian is proportional to . If you assume that time steps are small and neurons spike, then , and . It is difficult to say what timestep is relevant in practice.
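      The reviewers' description of the diagonal Hessian can be checked numerically in a minimal sketch. It assumes a single linear unit with squared-error loss L(w) = ½ Σ_t (y_t − Σ_i w_i x_{t,i})², which is our simplification and not necessarily the exact objective in Appendix 2. For this loss, H_ii = Σ_t x_{t,i}². If the inputs are binary spike indicators then x² = x, so H_ii is simply the spike count of input i, i.e. proportional to its firing rate over a fixed duration. The spike probabilities and targets below are hypothetical.

```python
import random

def loss(w, xs, ys):
    # Squared-error loss for a single linear unit: 0.5 * sum_t (y_t - w . x_t)^2
    return 0.5 * sum((y - sum(wi * xi for wi, xi in zip(w, x))) ** 2
                     for x, y in zip(xs, ys))

def hessian_diag_fd(w, xs, ys, h=1e-3):
    # Central second difference along each coordinate
    # (exact for a quadratic loss, up to floating-point rounding).
    base = loss(w, xs, ys)
    diag = []
    for i in range(len(w)):
        wp = list(w)
        wp[i] += h
        wm = list(w)
        wm[i] -= h
        diag.append((loss(wp, xs, ys) - 2 * base + loss(wm, xs, ys)) / h ** 2)
    return diag

rng = random.Random(1)
rates = [0.05, 0.2, 0.5]   # hypothetical per-bin spike probabilities
T = 1000                   # number of time bins
xs = [[1.0 if rng.random() < r else 0.0 for r in rates] for _ in range(T)]
ys = [rng.gauss(0.0, 1.0) for _ in range(T)]
w = [0.1, -0.2, 0.3]

closed_form = [sum(x[i] ** 2 for x in xs) for i in range(len(w))]
spike_counts = [sum(x[i] for x in xs) for i in range(len(w))]
numeric = hessian_diag_fd(w, xs, ys)
print("closed form: ", closed_form)
print("spike counts:", spike_counts)
print("finite diff: ", [round(v, 2) for v in numeric])
```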

      (5) To get Fig. 6a, did you start with Fig. Appendix 1-figure 4 from Schug et al, and then use , drop the q, and put 1 − p on the x-axis? Either way, you should provide details about where this came from. It could be in Methods.

      We have modified Fig. 6 to use the same axes as in the original papers.

      (6) Lines 190-3: “The relationship between input firing rate and synaptic variability was first observed by Aitchison et al. (2021) using data from Ko et al. (2013) (Fig. 6a). The relationship between learning rate and synaptic variability was first observed by Schug et al. (2021), using data from Sjostrom et al. (2003) as processed by Costa et al. (2017) (Fig. 6b).” We believe 6a and 6b should be interchanged in that sentence.

      Thank you. We have switched the text appropriately.

      (7) What is posterior variance? This seems kind of important.

      This refers to the “posterior variance” obtained using a Bayesian interpretation of the problem of obtaining good synaptic weights (Aitchison et al. 2021). In our particular setting, we estimate posterior variances by setting up the problem as variational inference: see Appendices 4 and 5, which are now referred to in line 390.

      (8) Lines 244-5: “we derived the relationships between the optimized noise, σ_i and the posterior variable, σ_post as a function of ρ (Fig. 7b;) and as a function of c (Fig. 7c).” You should tell the reader where you derived this, which is Eq. 68c (now 54c). Except you didn’t actually derive it; you just wrote it down. And since we don’t know what posterior variance is, we couldn’t figure it out.

      If H is the Hessian of the log-likelihood, and if the prior is negligible relative to the likelihood, then we get Eq. 69c. We have added a note on this point to the text.
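      One standard way to see this step is as a sketch under assumptions of our own, which the paper's Eq. 69c may state in different notation: a factorized (mean-field) Gaussian posterior, a locally quadratic log-likelihood, and a Gaussian prior of precision λ. The posterior precision of each weight is then the prior precision plus the curvature of the negative log-likelihood. When the prior is negligible, the posterior variance reduces to the reciprocal of the diagonal Hessian entry.

```latex
\sigma^2_{\text{post},i} \approx \frac{1}{\lambda + H_{ii}},
\qquad
H_{ii} = -\frac{\partial^2 \log p(y \mid w, x)}{\partial w_i^2},
\qquad
\lambda \ll H_{ii} \;\Rightarrow\; \sigma^2_{\text{post},i} \approx \frac{1}{H_{ii}} .
```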

      (9) We believe Fig. 7a shows an example pair of synapses. Is this typical? And what about Figs. 7b and c. Also an example pair? Or averages? It would be helpful to make all this clear to the reader.

      Fig. 7a shows an illustrative pair of synapses, chosen to best display the relative patterns of variability under energy efficient and Bayesian synapses. We have noted this point in the legend for Fig. 7. Figs. 7b and c show analytic relationships between energy efficient and Bayesian synapses, so each line shows a whole continuum of synapses (we have deleted the misleading points at the ends of the lines in Figs. 7b and c).

      (10) The y-axis of Fig 6a refers to the synaptic weight as w, while the x-axis refers to the mean synaptic weight as µ. Shouldn’t these be harmonised? It would be particularly nice if both were divided by µ, because then the link to Fig. 5c would be clearer.

      We have changed the y-axis label of Fig. 6a from w to µ. Regarding the normalised variance, we did try this, but our Gaussian posteriors allowed the mean to become small in our simulations, giving a very high normalised variance. To remedy this, we would likely need to assume a log-normal posterior, but this was out of scope for the present work.

      (11) Line 250 (now line 281): “Finally, in the Appendix". Please tell us which Appendix. Also, why not point out here that the bound is tightest at small ρ?

      We have added a reference to the section of the appendix containing the derivation of the biological cost as a bound on the ELBO. We have also referenced the equation that gives the limit of the biological cost as ρ tends to zero.

      (12) When symbols appear that previously appeared more than about two paragraphs ago, please tell us where they came from. For instance, we spent a lot of time hunting for ηi. And below we’ll complain about undefined symbols. Which might mean we just missed them; if you told us where they were, that problem would be eliminated.

      We have added extra references for the symbols in the text following Eq. 69.

      (13) Line 564, typo (we think): should be σ⁻².

      Good spot. This has been fixed.

      (14)  A bit out of order, but we don’t think you ever say explicitly that r is the radius of a vesicle. You do indicate it in Fig. 1, but you should say it in the main text as well.

      We have added a note on this to the legend in Fig. 1.

      (15) Eq. 14: presumably there’s a cost only if the vesicle is outside the synapse? Probably worth saying, since it’s not clear from the mechanism.

      Looking carefully at Pulido and Ryan (2021), it is clear that they are referring to a cost for vesicles inside the presynaptic side of the synapse. (Importantly, vesicles don’t really exist outside the synapse; during the release process, the vesicle membrane becomes part of the cell membrane, and the contents of the vesicle are released into the synaptic cleft.)

      (16) App. 2: why solve for mu, and why compute the trace of the Hessian? Not that it hurts, but things are sort of complicated, and the fewer side points the better.

      Agreed, we have removed the solution for μ, and the trace, and generally rewritten Appendix 2 to clarify definitions, the Hessian etc.

      (17) Eq. 35: we believe you need a minus sign on one side of the equation. And we don’t believe you defined p(d|w). Also, are you assuming g = ∂ log p(d|w)/∂w? This should be stated, along with its implications. And presumably, it’s not really true; people just postulate that p(d|w) ∝ exp(−log loss)?

      We have replaced p(d|w) with p(y, x|w), and we have replaced “overall cost” with log P(y|w, x). Yes, we are also postulating that p(y|w, x) ∝ exp(−log loss), though in our case this does make sense, as it corresponds to a squared loss.

      As regards the minus sign, in the original manuscript we had the second derivative of the cost. There is no minus sign for the cost, as the Hessian of the cost at the mode is positive semi-definite. However, once we write the expression in terms of a log-likelihood, we do need a minus sign (as the Hessian of the log-likelihood at a mode is negative semi-definite).
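      The sign convention can be made explicit in two lines, assuming, as in the response to point (17), that the likelihood is proportional to the exponential of the negative loss, p(y | w, x) ∝ exp(−L(w)), with w* a minimum of L. For a squared loss this corresponds to a Gaussian likelihood with unit noise variance.

```latex
\log p(y \mid w, x) = -L(w) + \text{const}
\;\Rightarrow\;
\nabla^2_w \log p(y \mid w, x) = -\nabla^2_w L(w),
\qquad
\nabla^2_w L(w^*) \succeq 0 \;\Rightarrow\; \nabla^2_w \log p(y \mid w^*, x) \preceq 0 .
```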

      (18) Eq. 47 now Eq. 44: first mention of CBi;i?

      We have added a note describing CB around these equations.

      (19) The “where" doesn’t make sense for Eqs. 49 and 50; those are new definitions.

      We have modified the introduction of these equations to avoid the problematic “where”.

      (20) Eq. 57 and 58 are really one equation. More importantly: where does Eq. 58 come from? Is this the H that was defined previously? Either way, you should make that clear.

      We have removed the problematic additional equation line number, and added a reference to where H comes from.

      (21) In Eq. 59 now Eq. 60 aren’t you taking the trace of a scalar? Seems like you could skip this.

      We have deleted this derivation, as it repeats material from the new Appendix 2.

      (22) Eq. 66 is exactly the same as Eq. 32. Which is a bit disconcerting. Are they different derivations of the same quantity? You should comment on this.

      We have deleted much of the material in Appendix 5 as, we agree, it repeats material from Appendix 2 (which has been rewritten and considerably clarified).

      (23) Eq. 68 (now 54), left column: please derive. We got:

      gai = gradient for weight i on trial

      where the second equality came from Eq. 20. Thus

      Is that correct? If so, it’s a lot to expect of the reader. Either way, a derivation would be helpful.

      We agree it was unnecessary and overly complex, so we have deleted it.

      (24) App 5–Figure 2: presumably the data for panel b came from Fig. 6a, with the learning rate set to Δw/w? And the data for panel c from Fig. 6b? This (or the correct statement, if this is wrong) should be mentioned.

      Yes, the data for panel c came from Fig. 6b. We have deleted the data in panel b, as there are some subtleties in interpretation of the learning rates in these settings.

      (25) line 952 now 946: typo, “and the from".

      Corrected to “and from".

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) Authors need to acknowledge the physical effort in addition to visual information for the spatial coding and may consider the manipulation of physical efforts in the future to support the robustness of constant intrinsic bias in ground-based spatial coding during walking.

      Whether one’s physical effort can affect spatial coding for visual perception is not a settled issue.  Several empirical studies have not been able to obtain evidence to support the claim.  For example, empirical studies by Hutchison & Loomis (2009) and Durgin et al. (2009) did not find wearing a heavy backpack significantly influenced distance perception, in contrast to the findings by Proffitt et al (2003).  We respectfully request not to discuss this issue in our revision since it is not closely related to the focus of the current study.

      (2) Furthermore, it would be more comprehensive and fit into the Neuroscience Section if the authors can add in current understandings of the spatial reference frames in neuroscience in the introduction and discussion, and provide explanations on how the findings of this study supplement the physiological evidence that supports our spatial perception as well.  For instance, world-centered representations of the environment, or cognitive maps, are associated with hippocampal formation while self-centered spatial relationships, or image spaces, are associated with the parietal cortex (see Bottini, R., & Doeller, C. F. (2020). Knowledge Across Reference Frames: Cognitive Maps and Image Spaces. Trends in Cognitive Sciences, 24(8),606-619. https://doi.org/10.1016/j.tics.2020.05.008 for details)

      We have now added this important discussion in the revision on pages 12-13.

      We thank the reviewer for the helpful comments.

      Reviewer 2:

      (1) ….As a result, it is unclear to what extent this "allocentric" intrinsic bias is involved in our everyday spatial perception. To provide more context for the general audience, it would be beneficial for the authors to address this issue in their discussion.

      We have clarified this on pages 3-4. In brief, our hypothesis is that during self-motion, the visual system constructs an allocentric ground surface representation (reference frame) by integrating the allocentric intrinsic bias with the external depth cues on the natural ground surface. Supporting this hypothesis, we recently found that when there is a texture cue on the ground, the representation of the ground surface is influenced by the allocentric intrinsic bias (Zhou et al., unpublished results).

      (2) The current findings on the "allocentric" coding scheme raise some intriguing questions as to why such a mechanism would be developed and how it could be beneficial. The finding that the "allocentric" coding scheme results in less accurate object localization and requires attentional resources seems counterintuitive and raises questions about its usefulness. However, this observation presents an opportunity for the manuscript to discuss the potential evolutionary advantages or trade-offs associated with this coding mechanism.

      The revision has discussed these important issues on page 12.

      (3) The manuscript lacks a thorough description of the data analysis process, particularly regarding the fitting of the intrinsic bias curve (e.g., the blue and gray dashed curve in Figure 3c) and the calculation of the horizontal separation between the curves. It would be beneficial for the authors to provide more detailed information on the specific function and parameters used in the fitting process and the formula used for the separation calculation to ensure the transparency and reproducibility of the study's results.

      The results of the statistical analysis were presented in the supplementary materials. We stated in the original manuscript (page 26) that we fitted the intrinsic bias curve by eye, drawing the curve to follow the data points as closely as possible. This is because we do not yet have a formula for the intrinsic bias. A challenge is that the measured intrinsic bias in the dark can be affected by multiple factors. One factor is individual differences, as the intrinsic bias is shaped by the observer’s past experiences and their eye height relative to the ground surface. However, it is certainly our goal to develop a quantitative model of the intrinsic bias in the future.

      We thank the reviewer for the helpful comments.

      Reviewer 3:

      (1) I am a bit confused by Figure 2b. Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer. In Figure 2, however, the authors assumed that the perceived target was located on the interception between the intrinsic bias curve and the viewing line from the NEW eye position to the target. This suggests that the perceived object depends on the observer's new location, which seems odd with the allocentric coordinate hypothesis.

      We respectfully disagree with the Reviewer’s statement that “Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer.” The statement conflates the definition of allocentric representation with that of exocentric representation. We respectfully maintain that the observer’s body location, as well as observer-object distance, can be represented with the allocentric coordinate system.

      (2) According to Fig 2b, the perceived size should be left-shifted and lifted up in the walking condition compared to that in the stationary condition. However, in Figure 3C and Fig 4, the perceived size was the same height as that in the baseline condition.

      We assume that by “target size” the Reviewer actually meant “target location”. It is correct that Figures 3c and 4 showed that judged distance changed as predicted, while the change in judged height was not significant. One explanation is that the magnitude of the height change was much smaller than that of the distance change and could not be revealed by our blind walking-gesturing method. Please also note that our figures used different scales for vertical height and horizontal distance.

      (3) Is the left-shifted perceived distance possibly reflecting a kind of compensation mechanism?  Participants could not see the target's location but knew they had moved forward.  Therefore, their brain automatically compensates for this self-movement when judging the location of a target.  This would perfectly predict the left-shifted but not upward-shifted data in Fig 3C.  A similar compensation mechanism exists for size constancy in which we tend to compensate for distance in computing object size.

      We assume the Reviewer suggested that the path-integration mechanism first estimates the traveled distance in the dark, and then the brain subtracts the estimated distance from the perceived target distance.  We respectfully maintain that this explanation is unlikely because it does not account for our empirical findings.  We found that walking in the dark did not uniformly affect perceived target distance, as the Reviewer’s explanation would predict.  As shown in figures 3 and 4, walking affected the near targets less than the far targets (i.e., the horizontal distance difference between walking and baseline-stationary conditions was smaller for the near target than far target).

      (4) According to Fig 2a, the target, perceived target, and eye should be aligned in one straight line. This means that connecting the physical targets and the corresponding perceived target results in straight lines that converge at the eye position. This seems, however, unlikely in Figure 3c.

      In the revision, we have added the averaged eye positions to the y-axes of Figures 3 and 4. To reveal the impact of the judged angular declination, we have also added graphs that plot the estimated angular declination as a function of the physical declination of the target. In general, the slopes are close to unity.

      We thank the reviewer for the helpful comments.

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) This study is very well-designed and written. One minor comment is that anisotropy usually refers to the perceptual differences along cardinal (horizontal + vertical) and oblique directions. It might be clearer if the authors changed the "horizontal-vertical anisotropy" to "horizontal/vertical asymmetry”.

      The Reviewer is correct, and we have changed it to horizontal/vertical asymmetry (pages 8 and 11).

      Reviewer 2 (Recommendations For The Authors):

      (1) Providing more details about the "path integration mechanism" when it is first introduced in line 44 would be helpful for readers to better understand the concept.

      The revision has expanded on the path integration mechanism (page 4).

      Adding references for the statement starting with "In fact, previous findings" in lines 218 and would be helpful to provide readers with a basis for comparison between the current study and previous studies that reported an egocentric coding system.

      We have added the references and elaborated on this important issue (pages 10-11).

      (2) There appears to be a discrepancy between the Materials and Methods section, which states that 14 observers participated in Experiments 1-4, and the legends of Figures 3 and 4, which indicates a sample size of "n=8." It would be helpful if the authors could clarify this discrepancy and provide an explanation for the difference in the sample size reported.

      We have clarified the number of observers on page 14.

      (3) While reporting statistical significance is essential in the Results section, there are several instances where the manuscript only mentions a "statistically significant separation" with its p-value without providing the mean and standard deviation of the separation values (e.g., lines 100 and 120). This can make it difficult for readers to fully grasp the quantitative nature of the results.

      The statistical analysis and outcomes were presented in the supplementary information document in our original submission.

      Reviewer 3 (Recommendations For The Authors):

      (1) Figure 1 is not significantly related to the current manuscript.

      We feel that retaining figure 1 in the manuscript would help readers to quickly grasp the background literature without having to refer extensively to our previous publications.

      (2) Add eye position to the results figures.

      We have added eye positions in the figures.

      (3) Fig 4c requires a more detailed explanation. The authors stated that Figures 4a and 4c showed consistent results. However, because 4a and 4c used different horizontal axes, it is difficult to compare them directly.

      We have modified the sentence in the revision (page 8).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) In several instances the paper does not address apparent inconsistencies between the prior literature and the findings. For example, the first main finding is that recalled items have more differentiated lateral temporal cortex representations within lists than not-recalled items. This seems to be the opposite of the prediction from the temporal context models used to motivate the paper: context models would predict that greater contextual similarity within a list should lead to better memory through enhanced temporal clustering in recall. This is what El-Kalliny et al. (2019) found, using a highly similar design (free recall, intracranial recordings from the lateral temporal lobe). The authors never address this contradiction in any depth to reconcile it with the previous literature and with the motivating theoretical model.

      Figure 2 supports the findings from El-Kalliny and colleagues because it shows the relationship of each list item relative to the first item (El-Kalliny et al. 2019). Items encoded adjacent to SP1 show the highest spectral similarity supporting the idea of overlapping context predicted by the Temporal Context Model. However, our figure characterizes how increasing inter-item distance affects spectral similarity. It shows that two items successfully recalled from temporally distant serial positions show reduced spectral similarity. These findings align with the predictions of the temporal context model because two temporally distant items would lack significant contextual overlap and therefore would have more distinct spectral representations.

      El-Kalliny and colleagues do use a similar experimental set-up; however, they define drift differently. They identified patients with a tendency to temporally cluster, and observed that those patients tend to drift less between temporally clustered items; however, they do not specify drift relative to a constant serial position as we do in our analysis. They define drift as the spectral change between two adjacent items, which is a relative measure between any two items rather than a measure in relation to a fixed point like SP1. Finally, our analysis focuses only on gamma activity, while El-Kalliny and colleagues identified drift across a much broader set of frequency bands.
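      The two definitions of drift contrasted above can be made concrete with a toy sketch. Here each item's gamma-band spectral pattern is replaced by a hypothetical 2-D vector that rotates by a fixed angle per serial position, a stand-in for steadily drifting context rather than real data, and cosine similarity is our choice of similarity measure. Similarity to SP1 falls with lag, while adjacent-item similarity stays constant. The two measures can therefore tell different stories about the same recordings.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def similarity_to_first(patterns):
    # Drift measured against a fixed reference (serial position 1).
    return [cosine(patterns[0], p) for p in patterns]

def adjacent_similarity(patterns):
    # Drift measured between consecutive items (a relative measure).
    return [cosine(a, b) for a, b in zip(patterns, patterns[1:])]

theta = math.radians(10)  # hypothetical drift per serial position
patterns = [(math.cos(k * theta), math.sin(k * theta)) for k in range(8)]
print([round(s, 3) for s in similarity_to_first(patterns)])
print([round(s, 3) for s in adjacent_similarity(patterns)])
```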

      (2) The way that the authors conduct the analysis of medial parietal neural similarity at boundaries leads to results that cannot be conclusively interpreted. The authors report enhanced similarity across lists for the first item in each list, which they interpret as reflecting a qualitatively distinct boundary signal. However, this finding can readily be explained by contextual drift if one assumes that whatever happens at the start of each list is similar or identical across lists (for example, a get ready prompt or reminder of instructions). The authors do not include analyses to rule this out, which undermines one of the main findings. 

      Extensions of the temporal context model (Lohnas et al. 2015) predict context at the beginning of a list will be most similar to the end of the prior list. The theory assumes a single-context state, consisting of a recency-weighted average of prior items, that is updated, even across different encoding periods.
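      The prediction attributed here to extended TCM can be illustrated with a minimal sketch of the standard context update, c_i = ρ·c_{i−1} + β·f_i. The sketch makes several simplifying assumptions that are not the parameterization of Lohnas et al. (2015). The item inputs f_i are orthonormal, so that ρ = √(1 − β²) keeps ‖c‖ = 1. A single drift rate β = 0.5 is used throughout. There is no separate boundary signal, and context simply carries over from the end of one list to the start of the next.

```python
import math

def run_context(n_items, beta):
    """Context vectors for n_items presented in one continuous sequence.

    Index 0 is a pre-experimental context state; item i uses one-hot input e_i.
    c_i = rho * c_{i-1} + beta * e_i with rho = sqrt(1 - beta^2), which keeps
    ||c_i|| = 1 because e_i is orthogonal to c_{i-1}.
    """
    rho = math.sqrt(1.0 - beta ** 2)
    c = [1.0] + [0.0] * n_items
    history = []
    for i in range(1, n_items + 1):
        c = [rho * v for v in c]   # recency-weighted decay of prior context
        c[i] += beta               # add the current item's input
        history.append(c)
    return history

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

list_length = 6
contexts = run_context(2 * list_length, beta=0.5)  # two lists, no reset between them
first_list1 = contexts[0]
last_list1 = contexts[list_length - 1]
first_list2 = contexts[list_length]
print("first of list 2 vs last of list 1: ", round(dot(first_list2, last_list1), 3))
print("first of list 2 vs first of list 1:", round(dot(first_list2, first_list1), 3))
```

      Under these assumptions, the first item of list 2 has context similarity ρ with the last item of list 1 but only ρ⁶ with the first item of list 1.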

      However, our results show that a boundary item's representation is most similar to the prior list's first item rather than its last item. Our results conflict with this extension of TCM because the shared similarity of boundary items suggests that the context state for the first item in a list is not a recency-weighted average of the items presented immediately before it. The same boundary-sensitive signal is not present in other regions, namely the hippocampus and lateral temporal cortex, which do not show similarity between items at the beginning of each list.

      Our main conclusion from these data is that medial parietal lobe activity appears to be specifically sensitive to task boundaries, defined by the first event or the ‘get ready’ prompt, whereas activity in other regions is not.

      (3) Although several previous studies have linked hippocampal fMRI and electrophysiological activity at event boundaries with memory performance, the authors do not find similar relationships between hippocampal activity, event boundaries, and memory. There are potential explanations for why this might be the case, including the distinction between item vs. associative memory, which has been a prominent feature of previous work examining this question. However, the authors do not address these potential explanations (or others) to explain their findings' divergence from prior work; this makes it difficult to interpret and to draw conclusions from the data about the hippocampus' mechanistic role in forming event memories.

      We added and revised the following text in the Discussion to address the hippocampal activity shown in our results and its lack of sensitivity to boundaries.

      “Spectral activity in the medial parietal lobe aligned closely with boundaries. Drift between item pairs seemed to reset at each boundary, leading to renewed similarity after each boundary. This observation aligns with previous work suggesting that boundaries reset temporal context. In the temporal cortex, our findings extend prior studies suggesting that the temporal lobe may play a role in associating adjacently presented items (Yaffe et al. 2014, El-Kalliny et al. 2019). We found that items encoded in distant serial positions, but within the same list, drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them. However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with substantial work (Ben-Yakov et al. 2018, Ezzyat et al. 2014; Griffiths et al. 2020) showing hippocampal sensitivity to event boundaries. One interpretation is that boundary representations in the hippocampus are sparse and carried by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). Although such sparse representations may not be detectable in gamma activity, drift in these regions may instead represent a more abstract set of contextual features accumulated from multiple brain regions.”

      (4) There is a similar absence of interpretation with respect to the previous literature for the data showing enhanced boundary-related similarity in the medial parietal cortex. The authors’ interpretation seems to be that they have identified a boundary-specific signal that reflects a large and abrupt change in context; however, another plausible interpretation is that enhanced similarity in the medial parietal cortex is related to a representation of a schema for the task structure that has been acquired across repeated instances.

      We agree that our results could indicate that the MPL forms a generalized situational model, or schema, of the task. Unfortunately, our behavioral task does not allow us to distinguish between this idea and a pure boundary representation. However, given that boundaries are one component in defining situational models, we chose to interpret our results conservatively as a form of boundary representation.

      (5) The authors do not directly compare their model to other models that could explain how variability in neural activity predicts memory. One example is the neural fatigue hypothesis, which the authors mention, however there are no analyses or data to suggest that their data is better fit by a boundary/contextual drift mechanism as opposed to neural fatigue. 

      The neural fatigue study by Lohnas and colleagues found that HFA was greater for recalled items but did not describe a serial-position-specific trend (Lohnas et al. 2020). In our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items with recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (6) P2. Line 65 cites Polyn et al (2009b) as an example where ‘random’ boundary insertions improve subsequent memory. However, the boundaries in that study always occurred at the same serial position and were therefore completely predictable and not random.

      The citation was removed from the corresponding sentence.

      (7) P2. Line 74 cites Pu et al. (2022) as an example of medial temporal lobe ‘regional activity’ showing sensitivity to event boundaries; however, this paper reported behavioral and computational modeling results and did not include measurement of neural activity. 

      The citation was removed from the corresponding sentence.

      (8) P.3 Line 117, Hsieh et al (2014) and Hsieh and Ranganath (2015) are cited as evidence that ‘spectral’ relatedness decreases as a function of distance, but neither of these studies examined ‘spectral’ activity (fMRI univariate and multivariate). The manuscript would benefit from a careful review and updating of how the prior literature is cited, which will increase the impact of the findings for readers.

      The text has been updated to reflect this distinction by modifying the statement to:  “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (9) Several previous studies have found hippocampal activity at event boundaries correlates with memory performance (Ben-Yakov et al 2011, 2018; Baldassano et al 2017), yet here the authors do not find evidence for hippocampal activity at event boundaries related to memory. Does this difference reflect something important about how the hippocampus vs. medial parietal cortex vs. lateral temporal cortex contribute to memory formation? Currently, there is not much discussion about how to interpret the differences between brain regions. Previous work has suggested that hippocampal pattern similarity at event boundaries specifically supports associative memory across events (Ezzyat & Davachi, 2014; Griffiths & Fuentemilla, 2020; Heusser et al., 2016), which may help explain their findings. In any case the authors could increase the impact of their paper by further situating their findings within the previous literature. 

      We do not suggest that there is no boundary-related activity in the hippocampus. To clarify our interpretation of regional differences, as in our response to the reviewer's earlier point, the following text has been added to the Discussion.

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with substantial work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) showing hippocampal sensitivity to event boundaries. One interpretation is that boundary representations in the hippocampus are sparse and carried by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al. 2020). Although such sparse representations may not be detectable in gamma activity, drift in these regions may instead represent a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al. 2017).”

      (10) The authors mention neural fatigue as an alternative theory to explain the primacy effect (Serruya et al., 2014), however there are no analyses or data to suggest that their data is better fit by a boundary mechanism as opposed to neural fatigue. Previous studies have shown that gamma activity in the hippocampus changes with serial position and with encoding history (Serruya et al 2014; Lohnas et al 2020). Here, the authors could compare the reported pattern similarity results to control analyses that replicate this prior work, which would strengthen their argument that there is unique information at boundaries that is distinct from a neural fatigue signal. 

      Serruya and colleagues describe decreasing HFA with increasing serial position in the MTL, lateral temporal cortex, and prefrontal cortex (Serruya et al. 2014). Yet we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The absence of a boundary effect in regions where HFA is selectively increased for primacy items suggests that a global neural fatigue model does not account for our results.

      Notably, Serruya and colleagues do not characterize HFA trends in the MPL. Moreover, their findings do not rule out the possibility that a boundary effect drives the HFA. We demonstrate boundary-relevant HFA only in the MPL and not in other regions. In addition, we show a correlation between SP1 recall and boundary representation strength, as well as a conserved similarity among multiple boundary-adjacent items.

      Next, the neural fatigue study by Lohnas and colleagues found that HFA was greater for recalled items but did not describe a serial-position-specific trend (Lohnas et al. 2020). In our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items with recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (11) For the analyses that examine cross-list similarity (e.g. the medial parietal analysis in Figure 3), how did the authors choose the number of lists over which similarity was calculated? Was the selection of this free parameter cross-validated to ensure that it is not overfitting the data? Given that there were 25 lists per session, using the three succeeding lists seems arbitrary. Why not use every list across the whole session? 

      Given the volume of data, number of patients, and computational time available at our facility, we extended the analysis as far as we could to characterize the observed trend.

      (12) P4. Line 155 says that Figure 3C shows example subject data, but it looks like it is actually Figure 3D. 

      The text was updated to reference the correct figure.

      (13) The t-tests on P.4 Line 159 have two sets of degrees of freedom but should only have one. 

      The t-tests shown in Figure 3B compare, by region, the mean parameter estimate of the boundary proximity predictor across all item pairs. The statistical test in this case was an unpaired t-test between parameter estimates for patients with electrodes in each region. The numbers within parentheses represent the sample sizes, that is, the number of subjects contributing electrodes to each region.

      Reviewer 2:

      (1) Because this is not a traditional event boundary study, the data are not ideally positioned to demonstrate boundary-specific effects. In a typical study investigating event boundary effects, a series of stimuli are presented and within that series occurs an event boundary – for instance, a change in background color. The power of this design is that all aspects between stimuli are strictly controlled – in particular, the timing – meaning that the only difference between boundary-bridging items is the boundary itself. The current study was not designed in this manner; thus, it is not possible to fully control for effects of time or for the fact that multiple boundaries occur between study lists (study to distractor, distractor to recall, recall to study). Each list in a free recall study can be considered its own “mini” experiment such that the same mechanisms should theoretically be recruited across any/all lists. There are multiple possible processes engaged at the start of a free recall study list which may not be specific to event boundaries per se. For example, and as cited by the authors, neural fatigue/attentional decline (and concurrent gamma power decline) may account for serial position effects. Thus, SP1 on all lists will be similar by virtue of the fact that attention/gamma decrease across serial position, which may or may not be a boundary-specific effect. In an extreme example, the analyses currently reported could be performed on an independent dataset with the same design (e.g. 12 word delayed free recall) and such analyses could potentially reveal high similarity between SP1-list1 in the current study and SP1-list1 in the second dataset, effects which could not be specifically attributed to boundaries.

      The neural fatigue study by Lohnas and colleagues found that HFA was greater for recalled items but did not describe a serial-position-specific trend (Lohnas et al. 2020). In our study, we stringently controlled for recall success in each of our analyses. Our main finding of boundary similarity compares recalled boundary items with recalled items in each of the other serial positions. We also show the similarity of non-recalled items in all serial positions to demonstrate the lack of boundary representation in the first list items, when neural fatigue is presumably least present.

      In addition, their study demonstrated neural fatigue in the hippocampus. They did not find evidence of fatigue in the DLPFC, suggesting region-specific mechanisms of neural fatigue. Our results are focused on the medial parietal lobe, and we were not able to find a fatigue model of the region for further comparison. While our results do not rule out the possibility of neural fatigue driving a drifting or boundary signal, we focus on the relevance of the signal to memory performance.

      (2) Comparisons of recalled "pairs" do not account for the lag between those items during study or recall, which based on retrieved context theory and prior findings (e.g. Manning et al., 2011), should modulate similarity between item representations. Although the GLM will capture a linear trend, it will not reveal serial position specific effects. It appears that the betas reported for the SP12 analyses are driven by the fact that similarity with SP12 generally increases across serial position, rather than a specific effect of "high similarity to SP12 in adjacent lists" (Page 5, excluding perhaps the comparison with list x+1). It is also unclear how the SP12 similarity analyses support the statement that "end-list items are represented more distinctly, or less similarly, to all succeeding items" (Page 5). It is not clear how the authors account for the fact that the same participants do not contribute equally to all ROIs or if the effects are consistent if only participants who have electrodes in all ROIs are included.

      In our study, all pairs are defined by the lag between a reference item and a target item. Figure 3 shows the similarity between each serial position and SP1, Figure 4 shows similarity as a function of lag relative to SP2 and SP3, and Figure 5 shows similarity as a function of lag relative to SP12. Each statistical model accounts for lag by ordering the data by increasing inter-item distance. Further, our definition of lag is more rigorous than that used by Manning and colleagues. Our similarity results for Figures 3-5 characterize the change in similarity relative to a constant reference point, such as SP1, rather than a relative reference point, such as +1 lag, which pools the similarity of pairs such as SP1 to SP2 with that of pairs such as SP4 to SP5, which may be recalled via different memory mechanisms.
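      To make the distinction concrete, the sketch below enumerates serial-position pairs under both schemes for a 12-item list. It is illustrative only: the function names are hypothetical, it works on serial positions rather than on neural data, and it ignores the across-list structure used in Figures 3-5.

```python
LIST_LENGTH = 12  # items per study list in the task described here


def fixed_reference_pairs(ref_sp, list_length=LIST_LENGTH):
    """All (reference, target) pairs anchored to one serial position, e.g. SP1.

    Every pair shares the same reference item, so each target position
    contributes its own bin of the similarity curve.
    """
    return [(ref_sp, k) for k in range(1, list_length + 1) if k != ref_sp]


def relative_lag_pairs(lag, list_length=LIST_LENGTH):
    """All (i, i + lag) pairs pooled into a single relative-lag bin.

    Pairs from anywhere in the list fall into the same bin, so a +1 bin
    mixes SP1->SP2 with SP4->SP5, SP11->SP12, and so on.
    """
    return [(i, i + lag) for i in range(1, list_length + 1)
            if 1 <= i + lag <= list_length]


print(relative_lag_pairs(1)[:4])     # [(1, 2), (2, 3), (3, 4), (4, 5)]
print(fixed_reference_pairs(1)[:2])  # [(1, 2), (1, 3)]
```

      In the relative-lag scheme a single similarity value summarizes eleven different pairs, whereas the fixed-reference scheme keeps each target position separate.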

      In Figure 5, we agree that your characterization, ‘similarity with SP12 generally increases across serial position’, is a more accurate description of the trend. We have updated the text accordingly, changing the interpretation to “later serial positions in adjacent lists shared a gradually increasing similarity to SP12.”

      Next, we clarify the statement "end-list items are represented more distinctly, or less similarly, to all succeeding items". When SP12 is recalled, recalled items from subsequent lists exhibit significantly lower similarity to it (see Figure 5D, pink). Consequently, the spectral representation of successfully recalled end-list items appears more distinct from later items in similar serial positions. This contrasts with our observations in Figures 3 and 4, where successfully recalled start-list items show greater similarity to later items in similar serial positions.

      (3) The authors use the term "perceptual" boundary which is confusing. First, "perceptual boundary" seems to be a specific subset of the broader term "event boundary," and it is unclear why/how the current study is investigating "perceptual" boundaries specifically. Second and relatedly, the current study does not have a sole "perceptual" boundary (as discussed in point 1 above), it is really a combination of perceptual and conceptual since the task is changing (from recalling the words in the previous list to studying the words in the current list OR studying the words in the current list to solving math problems in the current list) in addition to changes in stimulus presentation. 

      We agree with the statement that ‘perceptual’ as a modifier to the boundaries described here does not add significant information. Therefore, we have removed all reference to perceptual boundaries.

      (4) Although the results show that item-item similarity in the gamma band decreases across serial position, it is unclear how the present findings further describe "how gamma activity facilitates contextual associations" (Page 5). As mentioned in point 1 above, such effects could be driven by attentional declines across serial position -- and a concurrent decline in gamma power -- which may be unrelated to, and actually potentially impair, the formation of contextual associations, given evidence from the literature that increased gamma power facilitates binding processes.

      We agree that our study does not elucidate a mechanistic relationship between gamma power and contextual associations. The referenced sentence has been changed to: “how gamma activity is associated with context”.

      Please see our response to point 1 above. In addition, studies demonstrating decreasing gamma power with increasing serial position focus primarily on the MTL, lateral temporal cortex, and prefrontal cortex (Serruya et al. 2014). Yet we do not observe a strong boundary effect in those regions (see Supp Fig 3 a,b). The absence of a boundary effect in regions where HFA is selectively increased for primacy items suggests that a global attentional decline or neural fatigue model does not account for our results.

      Notably, HFA trends in the MPL have not been well characterized. Further, a decline in gamma power does not rule out the possibility that a boundary effect drives the HFA. We demonstrate boundary-relevant HFA only in the MPL and not in other regions. In addition, we show a correlation between SP1 recall and boundary representation strength, as well as a conserved similarity among multiple boundary-adjacent items.

      (5) Some of the logic and interpretations are inconsistent with the literature. For example, the authors state that "The temporal context model (TCM) suggests that gradual drift in item similarity provides context information to support recovery of individual items" however, this does not seem like an accurate characterization of TCM. According to TCM, context is a recency-weighted average of previous experience. Context "drifts" insofar as information is added to/removed from context. Context drift thus influences item similarity -- it is not that item similarity itself drifts, but that any change in item-item similarity is due to context drift. 

      The current findings do not appear at odds with the conceptualization of drift and context in the current version of the context maintenance and retrieval model. Furthermore, the context representation is posited to include information beyond basic item representations. Two items, regardless of their temporal distance, can be associated with similar contexts if related information is included in both context representations, as predicted and shown for multiple forms of relatedness including semantic relatedness (Manning & Kahana, 2012) and task relatedness (Polyn et al., 2012).

      We revised the sentence and the surrounding paragraph to describe the temporal context model more accurately and to emphasize how our findings align with the version of CMR described by the reviewer. The revised text is below:

      “Next, we asked how gamma spectral activity reflects contextual association between items. In the medial parietal lobe, we observed recurring similarity between items that were distant in time but adjacent to boundaries. This pattern suggests that spectral activity may carry information about an item's relationship to a boundary. These observations align with the Context Maintenance and Retrieval (CMR) model, which extends the predictions of TCM to encompass broader relationships among items. Our results identify boundaries as an important aspect of context and specify the spectral and regional properties of these boundary-related contextual features.”

      (6) Lohnas et al. (2020) Neural fatigue influences memory encoding in the human hippocampus, Neuropsychologia, should be cited when discussing neural fatigue

      Thank you for your suggestion. The citation has been added to the text.

      (7) A within-list, not an across list, similarity analysis should be used to test the interpretation that end-of-list items are more distinct than other list items.

      We believe this recommendation refers to the following line in our text: “These findings suggest end-list items are represented more distinctly, or less similarly, to all succeeding items.” Our statement compares SP12 of list x with all succeeding items (in lists x+1, x+2, etc.). Because the statement refers to items in subsequent lists, we performed an across-list rather than a within-list analysis.

      (8) It is unclear why it is necessary to use PCA to estimate similarity between items.

      PCA was used to reduce the dimensionality of the gamma-band time-frequency matrix, which allowed us to compare the predominant trends in gamma activity between items. In addition, we added Figure 3 – supplementary figure 2D, which shows three example subjects, to illustrate that distinct time-frequency components contribute to the signal reconstructed from the PCs in each subject. Therefore, the boundary may be represented differently in each patient.
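      A minimal sketch of this kind of step, assuming each item's gamma-band response is a (frequencies x time bins) power matrix that is flattened into one feature vector, that PCA is fit across all items from a single participant, and that similarity is the cosine between items' PC scores (cosine similarity is the measure mentioned in Reviewer 3's point 2). The array shapes, the number of components, and the function names are illustrative assumptions rather than the parameters used in the study, and the data below are random placeholders.

```python
import numpy as np


def pca_scores(tf_matrices, n_components=5):
    """Project flattened time-frequency matrices onto their leading PCs.

    tf_matrices: array of shape (n_items, n_freqs, n_times), e.g. gamma-band
    power for each encoded item from one participant (assumed layout).
    Returns an (n_items, n_components) array of PC scores.
    """
    n_items = tf_matrices.shape[0]
    features = tf_matrices.reshape(n_items, -1)   # one feature vector per item
    centered = features - features.mean(axis=0)   # PCA operates on centered data
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def cosine_similarity_matrix(scores):
    """Pairwise cosine similarity between the rows of `scores`."""
    unit = scores / np.linalg.norm(scores, axis=1, keepdims=True)
    return unit @ unit.T


# Placeholder data (not study data): 24 items, 10 gamma frequencies, 40 time bins.
rng = np.random.default_rng(0)
tf = rng.standard_normal((24, 10, 40))
sim = cosine_similarity_matrix(pca_scores(tf, n_components=5))
print(sim.shape)  # (24, 24)
```

      Because the principal axes are fit per participant, the components that dominate the similarity can differ between participants, which is the kind of individual variability the supplementary figure is meant to show.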

      (9) Lags are listed as -4, 4 (Page 8); however, with a list length of 12, possible lags should be -11, 11.

      The listed parenthetical statement ‘(-4 to 4)’ referred to Figure 1 where Lag CRP is shown for transitions from -4 to 4. However, we did calculate lag CRP for all possible transitions. Therefore, the referenced phrase was changed to: “Lagged CRP was calculated for all possible transitions (-11 to 11).”
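      For reference, a lag-CRP over all possible transitions (-11 to +11 for a 12-item list) can be computed as in the sketch below. It assumes each recall sequence is a list of serial positions (1-12) with repeats and intrusions already removed; the function name and input format are hypothetical. Each observed transition at a given lag is divided by the number of times that lag was still available, that is, the target position had not yet been recalled.

```python
def lag_crp(recall_sequences, list_length=12):
    """Lag-conditional response probability over lags -(L-1)..(L-1).

    recall_sequences: one recall order per list, each a list of serial
    positions (1..list_length) with repeats/intrusions removed.
    Returns {lag: probability}, with None where a lag was never available.
    """
    max_lag = list_length - 1
    actual = {lag: 0 for lag in range(-max_lag, max_lag + 1) if lag != 0}
    possible = dict(actual)  # same keys, all counts start at zero

    for seq in recall_sequences:
        recalled = set()
        for current, nxt in zip(seq, seq[1:]):
            recalled.add(current)
            # Every not-yet-recalled position is an available transition.
            for sp in range(1, list_length + 1):
                if sp not in recalled:
                    possible[sp - current] += 1
            actual[nxt - current] += 1

    return {lag: (actual[lag] / possible[lag] if possible[lag] else None)
            for lag in actual}


crp = lag_crp([[1, 2, 3]])
print(crp[1], crp[2], crp[-1])  # 1.0 0.0 None
```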

      (10) Hsieh et al. 2014 and Hsieh & Ranganath (2015) are fMRI studies and as such, do not support the statement "Previous work consistent with temporal context models suggests spectral relatedness reduces as a function of distance between words" (Page 3). 

      The statement has been revised to: “Previous work consistent with temporal context models suggests neural pattern similarity reduces as a function of distance between related memories.”

      (11) Although statistically one can measure "How item-item similarity is affected by recollection" (Page 3), this is logically backwards, given that similarity during study necessarily precedes performance during free recall. Additionally, it is erroneous to assume that recalled words are "recollected" without additional measurements (e.g. Mickes et al. (2013) Rethinking familiarity: Remember/Know judgments in free recall, JML).

      The statement was changed to “item-item similarity is affected based on successful recall” because recollection cannot be determined in our paradigm.

      Reviewer 3:

      (1) My primary confusion in the current version of this paper is that the analyses don't seem to directly compare the two proposed models illustrated in Fig 1B, i.e. the temporal context model (with smooth drifts between items, including across lists) versus the boundary model (with similarities across all lists for items near boundaries). After examining smooth drift in the within-list analysis (Fig 2), the across-list analyses (Figs 3-5) use a model with two predictors (boundary proximity and list distance), neither of which is a smoothly drifting context. Therefore there does not appear to be a quantitative analysis supporting the conclusion that in lateral temporal cortex "drift exhibits a relationship with elapsed time regardless of the presence of intervening boundaries" (lines 272-3).

      We could not use a smoothly drifting regressor because of its collinearity with any model of boundary similarity. Therefore, we chose two regressors: boundary proximity, which models intra-list changes in similarity, and list distance, which models a stepwise decrease in similarity from adjacent lists.

      However, we agree that the presented data do not directly support the claim that lateral temporal cortex activity drifts independently of intervening boundaries. Therefore, we amended the statement to: “We found successfully recalled items encoded in distant serial positions drifted significantly more than items from adjacent serial positions (Figure 2C). Consistent with the predictions of the temporal context model, the reduced similarity between distant items may reflect reduced contextual overlap proportional to the time elapsed between them.”

      (2) The feature representation used for the neural response to each item is a gamma power time-frequency matrix. This makes it unclear what characteristics of the neural response are driving the observed similarity effects. It appears that a simple overall scaling of the response after boundaries (stronger responses to initial items during the beginning portion of the 1.6s time window) would lead to the increased cosine similarity between initial items, but wouldn't necessarily reflect meaningful differences in the neural representation or context of these items.

      Our study aims to connect the neural response after boundaries with the neural representation and context of these items. Prior studies (Manning et al. 2011, El-Kalliny et al. 2017) have interpreted similarity in neural spectra as a memory-relevant phenomenon, and we use very similar methods to perform our analysis.

      In addition, we compare the fit of our boundary similarity model with behavioral performance to show that increased boundary representation correlates with improved recall of boundary items.

      While our study does not specify which time-frequency components underlie the increased similarity, we do limit our analysis to the gamma band. Traditional analyses use log-scaled, broadband time-frequency data (e.g., 3-100 Hz); here, we establish the relevance of a much narrower spectral band.

      Finally, we tried to determine which time–frequency components contributed to the increased similarity, but they varied greatly between patients (see Figure 3 – supplementary figure 2D). Hence, we opted to use principal component analysis to compare the features showing the most variation for each participant. This added analytical step allows us to detect boundary effects across patients despite individual variability in boundary representation.

      (3) The specific form of the boundary proximity models is not well justified. For initial items, a model of e^(1-d) is used (with d being serial position), but it is not stated how the falloff scale of this model was selected (as opposed to e.g. e^((1-d)/2)). For final items, a different model of d/#items is used, which seems to have a somewhat different interpretation (about drift between boundaries, rather than an effect specific to items near a final boundary). The schematic in Fig 1B appears to show a hypothesis which is not tested, with symmetric effects at initial and final boundaries.

      The boundary proximity models were chosen empirically. Our model was intended to quantify a decreasing relationship across many patients. We acknowledge the constants and variables may not definitively describe underlying neural processes.  
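      The two boundary proximity regressors described in Reviewer 3's point 3 can be written out directly. This sketch assumes d is the serial position (1-12) of the compared item and uses e^(1-d) for the start-of-list model and d/#items for the end-of-list model, as quoted in that point. The `scale` argument is a hypothetical addition, not part of the fitted model, included only to show how the falloff the reviewer asks about could be varied (scale = 2 gives e^((1-d)/2)).

```python
import math

N_ITEMS = 12  # serial positions per list


def start_boundary_proximity(d, scale=1.0):
    """Start-of-list model e^((1 - d) / scale); scale=1 gives e^(1 - d).

    Equals 1 at SP1 and decays toward 0 for later serial positions.
    `scale` is a hypothetical falloff knob, not a fitted parameter.
    """
    return math.exp((1 - d) / scale)


def end_boundary_proximity(d, n_items=N_ITEMS):
    """End-of-list model d / #items: rises linearly to 1 at the last item."""
    return d / n_items


positions = range(1, N_ITEMS + 1)
start_reg = [round(start_boundary_proximity(d), 3) for d in positions]
end_reg = [round(end_boundary_proximity(d), 3) for d in positions]
print(start_reg[:4])  # [1.0, 0.368, 0.135, 0.05]
print(end_reg[-3:])   # [0.833, 0.917, 1.0]
```

      Written this way, the asymmetry is explicit: the start-of-list regressor is concentrated on the first few items, whereas the end-of-list regressor changes by the same amount between every pair of adjacent positions.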

      We used different models for start- and end-list boundaries because primacy and recency effects are distinct phenomena. Primacy is classically thought to arise from rehearsal during encoding (Polyn et al. 2009, Lohnas et al. 2015). By contrast, recency is thought to arise from strong contextual cues for recency items during recall, owing to their temporal proximity. Therefore, we had little basis to assume that their spectral representations in relation to task boundaries would be symmetric.

      (4) The main text description of Fig 2 only describes drift effects in lateral temporal cortex, but Fig 2 - supplement 1 shows that there is also drift and a significant subsequent memory effect in the other two ROIs as well. There is not a significant memory x drift slope interaction in these regions; are the authors arguing that the lack of this interaction (different drift rates for remembered versus forgotten items) is critical for interpreting the roles of lateral temporal cortex versus medial parietal and hippocampal regions?

      Yes. Figure 2 – supplement 1 shows that drift occurs in both the HC and the MPL. However, the interaction term is not significant, which suggests that the rate of drift does not differ significantly between recalled and non-recalled items.

      In contrast, Figure 2C shows that recalled pairs drift at a higher rate than non-recalled pairs. For the LTC, the interaction term is negative and statistically significant. This suggests that successfully encoded item pairs presented far apart share more distinct spectral representations, specifically in the LTC. These findings lead to our interpretation in the Discussion that “elevated drift rate might allow the representations of recalled items to remain distinct but ordered in memory.”

      (5) The parameter fits for the "list distance" regressor are not shown or analyzed, though they do appear to be important for the observed similarity structure (e.g. Fig 3E). I would interpret this regressor as also being "boundary-related" in the sense that it assumes discrete changes in similarity at boundaries.

      Parameter fits for the ‘list distance’ regressor are now shown in the supplements to Figures 3 and 5. They do not differ significantly between regions.

      (6) To make strong claims about temporal context versus boundary models as implied by Fig 1B, these two regressors should be fit within the same model to explain across-list similarity. The temporal context model could be based on the number of intervening items (as in Fig 1B) or actual time elapsed between items. The relationship between the smoothly drifting temporal context model and the discretely-jumping list distance models should also be clarified.

      We could not use a smoothly drifting regressor because it is collinear with any model of boundary similarity. A model that included a ‘temporal context’ regressor would not be able to account for the presence of a boundary effect and would not allow us to demonstrate a boundary representation in the presence of drift. We therefore chose two regressors: boundary proximity, which models intra-list changes in similarity, and list distance, which models a stepwise decrease in similarity across lists. These regressors allow the model to differentiate intra-list changes (the boundary regressor) from inter-list changes (the list distance regressor).
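      The collinearity argument can be checked numerically. The sketch below assumes a hypothetical layout of 12 lists of 12 items (not necessarily the study's design) and correlates a smooth temporal-context predictor (the number of positions separating two items) with the stepwise list-distance predictor across all item pairs.

```python
import numpy as np

# Hypothetical task layout: 12 lists of 12 items each (illustrative only).
n_lists, list_len = 12, 12
n_items = n_lists * list_len
list_id = np.arange(n_items) // list_len

# All unique item pairs (i < j).
i, j = np.triu_indices(n_items, k=1)

# Smooth temporal-context predictor: number of positions separating the items.
lag = (j - i).astype(float)
# Stepwise predictor: how many lists apart the two items were presented.
list_distance = (list_id[j] - list_id[i]).astype(float)

r = np.corrcoef(lag, list_distance)[0, 1]
print(f"correlation between lag and list distance: r = {r:.3f}")
```

      For this layout the correlation is close to 0.99, so entering both predictors in one regression would leave their coefficients poorly identified. The exact value depends on list length and on which pairs enter the analysis.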

      (7) The features of the time-frequency matrix that are driving similarity between events could be visualized to provide a better understanding of the boundary-related signals. The analysis could also be re-run with reduced versions of the feature space in order to determine the critical components of this signal; for example, responses could be averaged across time to examine only differences across frequencies, or across frequencies to examine purely temporal changes across the 1.6 second window.

      We have added Figure 3 – supplementary figure 2A-C to show that varying the number of principal components (PCs) does not change the trend of boundary sensitivity in the MPL. In addition, Figure 3 – supplementary figure 2D shows three example subjects, illustrating that distinct time-frequency components contribute to the signal reconstructed from the PCs in each subject. The boundary representation may therefore be expressed differently in each patient.

      (8) If the authors are considering a space of multiple models as "boundary proximity models" (e.g. linear models and exponential models with different scale factors), this should be part of the model-fitting process rather than a single model being selected post hoc.

      We agree with the reviewer that the ideal approach would be to incorporate model selection into the fitting process. However, given the size of our dataset and our limited computational resources, we were not able to do so.

      (9) The interpretation of region differences in the results in Fig 2 and Fig 2 - supplement 1 should be clarified. 

      In the Discussion, we have added the following text to clarify our interpretation of the regional differences shown in these figures.

      “However, across task boundaries, our study did not detect a robust change in drift rate in the medial or lateral temporal cortex. This finding contrasts with substantial work (Ezzyat and Davachi, 2014; Griffiths and Fuentemilla, 2020) showing hippocampal sensitivity to event boundaries. One interpretation is that boundary representations in the hippocampus are quite sparse and carried by populations of time-sensitive cells whose activity is indexed to task-related boundaries (Umbach et al., 2018). While such sparse representations may not be detectable in gamma activity, drift in these regions may instead represent a more abstract set of contextual features accumulated from multiple brain regions (Baldassano et al., 2017).”

      (10) Whether there are significant fits for the list distance regressor, and whether these fits vary across regions, could be stated. The list distance regressor could also be directly compared (in the same model) to a temporal-context regressor, which predicts graded changes in similarity between items rather than the discrete changes between lists.

      We have added parameter fits for the ‘list distance’ regressor to the supplements of Figures 3 and 5. These fits do not differ significantly between regions: all regions show a similar stepwise decrease in similarity across lists (list distance regressor; Figure 3 – supplementary figure 1B).

      We could not compare these parameters with a separate model that includes a smoothly drifting ‘temporal-context’ regressor, because that regressor is collinear with any representation of boundaries. See our response to Reviewer 3, comment 6.

      (11) The authors should clarify their interpretation of the results, and whether they are proposing a tweak to the temporal context model or a substantially different organizational system. 

      In the Discussion, we have included the following statements to clarify what we propose regarding the temporal context model.

      “Our findings suggest a broader scope of contextual association than just prior items, in which temporal proximity and task structure (in the form of boundaries) play intertwined roles in context construction. Our data therefore have implications for updated iterations of the temporal context model that incorporate (perhaps) specific terms for boundary information. This may in turn provide a more systematic prediction of primacy effects in behavioral data.”

      (12) Minor typos and corrections: 

      52: using -> use 

      108: patients -> patients'

      156: list -> lists

      The list distance plot is described as "pink" in Fig 3 and Fig 5 - supplement 1, but appears gray in the figures.

      All of these have been corrected in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) For a number of experiments the authors use their new data set on females and compare it with the data set previously published on males. To what extent are these data sets comparable? Were they originally acquired in parallel, for example using siblings of different sexes, or were the experiments conducted several years apart? What is the expected variability if one repeated these experiments with the same sex, considering the differences/similarities between experimental setups, housing conditions, interindividual differences, etc.?

      This is an important point. We did our best to collect the data in similar conditions (same set-ups; same animal housing conditions) and in experimental cohorts including both males and females. While some data from males were published first, the acquisition of male and female data was done in the same time period.

      Specifically, all results shown in Figure 1 and Figure 2 (Serum leptin, PPARalpha, AMPK, RNAseq) come from samples (from both males and females) that were processed at the same time and in similar conditions, by the same authors (Z.P. and P.M.).

      For the in vivo data (Figure 3, Supplementary figure 1), the male and female data were collected within a 1–2-year timeframe, in the same setups, by the same two authors (Z.P., D.K.). The males and females were housed under similar conditions (same room, same cage type, in groups of 25). We did not use siblings of different sexes. Independent cohorts (1-12 months apart), including both males and females, went into each data set. Within-cohort variability does not obviously differ from between-cohort variability; however, the number of animals is too small to confirm this with sufficient statistical power.

      Altogether, the differences observed between male and female data cannot be explained by the timing and conditions of data acquisition from both sexes.

      (2) Energy consumption and visual processing may differ between periods in which animals are in different behavioral states. Is there a possibility that male and female mice differed in behavioral state during measurements? Were animals running or resting during visual stimulation and during ATP measurements? 

      We thank the reviewer for this suggestion. We have now edited the text and included a new supplementary figure. All in vivo experiments were done in stationary animals that were resting in a cardboard tube during both 2-photon imaging and ATP measurements. Animals were also well habituated to the setup. In addition, we imaged pupil diameter during the in vivo imaging sessions. We quantified pupil diameter during visual stimulation and found no sex difference (Supplemental Figure 2). Thus, in our experimental conditions, we found no significant difference in behavioural or attentional state between sexes.

      We have edited the text to include this information (lines 183-185).

      (3) Related to the previous point: the authors show that ATP consumption was reduced in male mice during visual stimulation. What about visual cortex ATP consumption in the absence of visual stimulation? Do food-deprived males and/or females show lower ATP consumption in the visual cortex e.g. during sleep? 

      We have repeated V1 ATP imaging experiments in the dark, in the absence of visual stimulation, in both males and females (Supplementary figure 1). ATP consumption rates are lower in the dark than during visual stimulation. Moreover, we find that in the dark, there is no difference in ATP consumption rate between control and food-restricted animals of either sex. Thus, the reduced ATP consumption we found with food restriction in males is related specifically to the active processing of visual information.

      We have edited the text to include this information (lines 158-159).

      Reviewer 2:

      (1) It appears that the authors have the data for doing decoding analysis, similar to Fig 6D in their previous paper. However, this analysis has not been done for this study. This would be good to include.  If the authors have attempted the behavioural discrimination tests on female mice as in the previous study, this would also be useful to include. 

      The first point of the reviewer is about datasets acquired in males that are included in our previous publication (Padamsey et al., 2022) but not compared to female data in the present manuscript.

      Whilst we fully agree that these results would be very useful, we did not have the resources (in terms of skilled researchers and funding) to perform these experiments in female mice. That is why these results are not included in this manuscript.

      (2) There appears to be an inconsistency in the methods of reporting OSI. It states that the OSI of grating-responsive neurons was calculated as 1 - circular variance. But then OSI is defined as simply abs(). Also, it would be good to be consistent about reporting medians as the median without confounding with the average (which is the mean). Sentences such as the following do not make sense: The average OSI for an animal was taken as the median OSI value calculated across neurons. This should be corrected throughout the manuscript, where the average is mentioned but the median is measured. 

      We thank the reviewer for noting this issue and we apologize for the confusion. We have now clarified the above in the manuscript (lines 587-603) and inserted the following reference for a detailed explanation of the OSI and DSI calculations: Mazurek M, Kager M, Van Hooser SD. Robust quantification of orientation selectivity and direction selectivity. Front Neural Circuits. 2014. https://doi.org/10.3389/fncir.2014.00092

      In the figure showing the orientation tuning, the authors have collapsed the two directions of each orientation together. However, if I understand correctly, the calculation of OSI does not do this step of collapsing. In this case, and in the interest of revealing more useful features of the data instead of averaging them out, it would be good to show the average tuning curves with and without FR for all directions, not collapsed. 

      As with orientation tuning, we found that direction tuning is reduced with food restriction, and that this reduction is significant in males but not in females. These results are now included in the text, with statistics (lines 179-180), and in Supplemental Figure 3.
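      For reference, a minimal sketch of the vector-sum indices described in Mazurek et al. (2014): OSI = 1 - circular variance in orientation space (angles doubled, so opposite directions coincide) and DSI = 1 - circular variance in direction space (angles not doubled). The eight directions and the example responses are hypothetical, responses are assumed non-negative, and whether the manuscript applies any baseline subtraction is not stated here. The per-animal summary is taken as the median across neurons, in line with the reviewer's point about not calling a median an average.

```python
import numpy as np

def osi(responses, directions_deg):
    """Orientation selectivity: 1 - circular variance, with angles doubled."""
    r = np.asarray(responses, dtype=float)  # assumed non-negative
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    return np.abs(np.sum(r * np.exp(2j * theta))) / np.sum(r)

def dsi(responses, directions_deg):
    """Direction selectivity: 1 - circular variance, angles not doubled."""
    r = np.asarray(responses, dtype=float)
    theta = np.deg2rad(np.asarray(directions_deg, dtype=float))
    return np.abs(np.sum(r * np.exp(1j * theta))) / np.sum(r)

directions = np.arange(0, 360, 45)  # 8 grating directions (assumed)
# Hypothetical mean responses per direction for three neurons.
neurons = [
    [1.0, 0, 0, 0, 0, 0, 0, 0],    # one direction only
    [1.0, 0, 0, 0, 1.0, 0, 0, 0],  # both directions of one orientation
    [1.0] * 8,                     # untuned
]
osis = [osi(n, directions) for n in neurons]
dsis = [dsi(n, directions) for n in neurons]

# Per-animal summary: the median across neurons (not the mean).
animal_osi = float(np.median(osis))
```

      The second neuron illustrates why direction and orientation tuning can dissociate: it is perfectly orientation-tuned (OSI = 1) but has no direction preference (DSI = 0).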

      Reviewer 3:

      l. 183-187 The discussion based on the idea that "The Bayes factor analysis helps to differentiate the absence of evidence from the evidence of absence." does not seem very helpful. Using a statistical criterion makes less sense than providing the reader with an estimate of the largest effect size (if there is any) that is compatible with the observation. If there were a significant effect of a very small size, would it change the authors' conclusion? That seems unlikely. I recommend removing the sentence on line 184, which is in fact not used afterwards.

      We agree with the reviewer. We have now removed the sentence and rephrased (lines 202-208).  

      Editor's note: 

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      We now provide exact p-values alongside the summary statistics (test statistic and df) and 95% confidence intervals for all key results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have specifically addressed the points of uncertainty highlighted in eLife's editorial assessment, which concerned the lack of low-level acoustic control, limitations of the experimental design, and the lack of in-depth analysis. Regarding “the lack of low-level acoustics control, limitations of experimental design”, in response to Reviewer #1, we clarify that our study aimed to provide a broad perspective (including both auditory and higher-level processes) on the similarities and distinctions in processing natural speech and music within an ecological context. Regarding “the lack of in-depth analysis”, in response to Reviewers #1 and #2, we clarify that while model-based analyses are valuable, they pose fundamental challenges when comparing speech and music. Non-acoustic features inherently differ between speech and music (such as phonemes and pitch), making direct comparisons reliant on somewhat arbitrary choices. Our approach mitigates this challenge by analyzing the entire neural signal, thereby avoiding potential pitfalls associated with encoding models of non-comparable features. Finally, we provide some additional analyses suggested by the Reviewers.

      We sincerely appreciate your thoughtful and thorough consideration throughout the review process.

      eLife assessment

      This study presents valuable intracranial findings on how two important types of natural auditory stimuli, speech and music, are processed in the human brain, and demonstrates that speech and music largely share network-level brain activities, thus challenging the domain-specific processing view. The evidence supporting the claims of the authors is solid but somewhat incomplete: although the data analysis is thorough, the results are robust, and the stimuli have ecological validity, important considerations such as low-level acoustics control, limitations of experimental design, and in-depth analysis are lacking. The work will be of broad interest to speech and music researchers as well as cognitive scientists in general.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors examined the extent to which the processing of speech and music depends on neural networks that are either specific to a domain or general in nature. They conducted comprehensive intracranial EEG recordings on 18 epilepsy patients as they listened to natural, continuous forms of speech and music. This enabled an exploration of brain activity at both the frequency-specific and network levels across a broad spectrum. Utilizing statistical methods, the researchers classified neural responses to auditory stimuli into categories of shared, preferred, and domain-selective types. It was observed that a significant portion of both focal and network-level brain activity is commonly shared between the processing of speech and music. However, neural responses that are selectively responsive to speech or music are confined to distributed, frequency-specific areas. The authors highlight the crucial role of using natural auditory stimuli in research and the need to explore the extensive spectral characteristics inherent in the processing of speech and music.

      Strengths:

      The study's strengths include its high-quality sEEG data from a substantial number of patients, covering a majority of brain regions. This extensive cortical coverage grants the authors the ability to address their research questions with high spatial resolution, marking an advantage over previous studies. They performed thorough analyses across the entire cortical coverage and a wide frequency range of neural signals. The primary analyses, including spectral analysis, temporal response function calculation, and connectivity analysis, are presented straightforwardly. These analyses, as well as figures, innovatively display how neural responses, in each frequency band and region/electrode, are 'selective' (according to the authors' definition) to speech or music stimuli. The findings are summarized in a manner that efficiently communicates information to readers. This research offers valuable insights into the cortical selectivity of speech and music processing, making it a noteworthy reference for those interested in this field. Overall, this research offers a valuable dataset and carries out extensive yet clear analyses, amounting to an impressive empirical investigation into the cortical selectivity of speech and music. It is recommended for readers who are keen on understanding the nuances of selectivity and generality in the processing of speech and music to refer to this study's data and its summarized findings.

      Weaknesses:

      The weakness of this study, in my view, lies in its experimental design and reasoning:

      (1) Despite using longer stimuli, the study does not significantly enhance ecological validity compared to previous research. The analyses treat these long speech and music stimuli as stationary signals, overlooking their intricate musical or linguistic structural details and temporal variation across local structures like sentences and phrases. In previous studies, short, less ecological segments of music were used, maintaining consistency in content and structure. However, this study, despite employing longer stimuli, does not distinguish between neural responses to the varied contents or structures within speech and music. Understanding the implications of long-term analyses, such as spectral and connectivity analyses over extended periods of around 10 minutes, becomes challenging when they do not account for the variable, sometimes quasi-periodical or even non-periodical, elements present in natural speech and music. When contrasting this study with prior research and highlighting its advantages, a more balanced perspective would have been beneficial in the manuscript.

      Regarding ecological validity, we respectfully hold a differing perspective from the reviewer. In our view, a one-second music stimulus lacks ecological validity, as real-world music always extends much beyond such a brief duration. While we acknowledge the trade-off in selecting longer stimuli, limiting the diversity of musical styles, we maintain that only long stimuli afford participants an authentic musical listening experience. Conversely, shorter stimuli may lead participants to merely "skip through" musical excerpts rather than engage in genuine listening.

      Regarding the critique that we "did not distinguish between neural responses to the varied contents or structures within speech and music," we partly concur. Our TRF (temporal response function) analyses incorporate acoustic content, particularly the acoustic envelope, and thereby address this concern to some extent. However, it is accurate that we did not model non-acoustic features. In acknowledging this limitation, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech with a pitch-melodic (or harmonic) model of music is not straightforward, as these models operate on fundamentally different dimensions. In other words, treating phonemes and pitches as equivalent units may be reasonable, but it ultimately rests on a somewhat arbitrary choice. Consequently, comparing and interpreting neuronal population coding under one model or the other remains problematic. In summary, because the models for speech and music differ (except for acoustic models), direct comparison is challenging, although still commendable and of interest.
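      For readers less familiar with TRFs, here is a minimal forward-model sketch: the neural response is modelled as the stimulus envelope convolved with an unknown kernel, and the kernel is estimated by ridge regression on time-lagged copies of the envelope. The white-noise "envelope", the 5-sample kernel, the number of lags and the ridge parameter are all hypothetical; the toolbox, lag range and regularization used in the manuscript are not described here, so this is an illustration of the technique rather than the authors' pipeline.

```python
import numpy as np

def lagged_design(stim, n_lags):
    """Return an (n_samples, n_lags) matrix whose column k is stim delayed by k samples."""
    n = len(stim)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stim[: n - k]  # first k samples stay zero-padded
    return X

def fit_trf(stim, resp, n_lags, lam=1.0):
    """Ridge-regression TRF: w = (X^T X + lam * I)^-1 X^T y."""
    X = lagged_design(stim, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_lags), X.T @ resp)

# Hypothetical example: a white-noise "envelope" and a known 5-sample kernel.
rng = np.random.default_rng(0)
envelope = rng.standard_normal(5000)
true_kernel = np.array([0.0, 0.5, 1.0, 0.5, 0.0])
neural = lagged_design(envelope, 5) @ true_kernel + 0.01 * rng.standard_normal(5000)

estimated_kernel = fit_trf(envelope, neural, n_lags=5, lam=1e-3)
```

      Because the same acoustic-envelope regressor can be built for both speech and music, this is the one model family in which the two domains can be compared on a common footing, which is the point made above.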

      Finally, we did take into account the reviewer’s remark and did our best to give a more balanced perspective of our approach and previous studies in the discussion.

      “While listening to natural speech and music rests on cognitively relevant neural processes, our analytical approach, extending over a rather long period of time, does not allow us to directly isolate specific brain operations. Computational models, which can be as diverse as acoustic (Chi et al., 2005), cognitive (Giordano et al., 2021), information-theoretic (Di Liberto et al., 2020), or self-supervised neural network (Donhauser & Baillet, 2019; Millet et al., 2022) models, are hence necessary to further our understanding of the type of computations performed by our reported frequency-specific distributed networks. Moreover, incorporating models accounting for musical and linguistic structure can help us avoid misattributing differences between speech and music driven by unmatched sensitivity factors (e.g., arousal, emotion, or attention) as inherent speech or music selectivity (Mas-Herrero et al., 2013; Nantais & Schellenberg, 1999).”

      (2) In contrast to previous studies that employed short stimulus segments along with various control stimuli to ensure that observed selectivity for speech or music was not merely due to low-level acoustic properties, this study used longer, ecological stimuli. However, the control stimuli used in this study, such as tone or syllable sequences, do not align with the low-level acoustic properties of the speech and music stimuli. This mismatch raises concerns that the differences or selectivity between speech and music observed in this study might be attributable to these basic acoustic characteristics rather than to more complex processing factors specific to speech or music.

      We acknowledge the reviewer's concern. Indeed, speech and music differ on various levels, including acoustic and cognitive aspects, and our analyses do not explicitly distinguish them. The aim of this study was to provide an overview of the similarities and differences between natural speech and music processing in an ecological context. Future work is needed to further explore the different hierarchical levels or networks composing such listening experiences. Of note, however, we report whole-brain results with high spatial resolution (thanks to iEEG recordings), enabling the distinction between auditory, superior temporal gyrus (STG), and higher-level responses. Our findings clearly highlight that both auditory and higher-level regions predominantly exhibit shared responses, challenging the interpretation that our results can be attributed solely to differences in 'basic acoustic characteristics'.

      We have now more clearly pointed out this reasoning in the results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The concept of selectivity - shared, preferred, and domain-selective - increases the risks of potentially overgeneralized interpretations and theoretical inaccuracies. The authors' categorization of neural sites/regions as shared, preferred, or domain-selective regarding speech and music processing essentially resembles a traditional ANOVA test with post hoc analysis. While this categorization gives meaningful context to the results, the mere presence of significant differences among control stimuli, a segment of speech, and a piece of music does not necessarily imply that a region is specifically selective to a type of stimulus like speech. The manuscript's narrative might lead to an overgeneralized interpretation that their findings apply broadly to speech or music. However, identifying differences in neural responses to a few sets of specific stimuli in one brain region does not robustly support such a generalization. This is because speech and music are inherently diverse, and specificity often relates more to the underlying functions than to observed neural responses to a limited number of examples of a stimulus type. See the next point.

      Exactly! Here, we present a precise operational definition of these terms, implemented with clear and rigorous statistical methods. It is important to note that in many cognitive neuroscience studies, the term "selective" is often used without a clear definition. By establishing operational definitions, we identified three distinct categories based on statistical testing of differences from baseline and between conditions. This approach provides a framework for more accurate interpretation of experimental findings, as now better outlined in the introduction:

      “Finally, we suggest that terms should be operationally defined based on statistical tests, which results in a clear distinction between shared, selective, and preferred activity. That is, given two investigated cognitive functions A and B, “shared” would be a neural population that (compared to a baseline) significantly and equally contributes to the processing of both A and B; “selective” would be a neural population that exclusively contributes to the processing of A or B (e.g. significant for A but not B); and “preferred” would be a neural population that significantly contributes to the processing of both A and B, but more prominently for A or B (Figure 1A).”
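      The operational definitions above amount to a small decision rule. The sketch below encodes it for one neural population, assuming p-values for A vs. baseline, B vs. baseline and A vs. B are already available. The function name, the fixed alpha of 0.05 and the absence of multiple-comparison correction are simplifying assumptions, not the manuscript's procedure. Note also that "equally" is approximated here by a non-significant A vs. B difference, which is absence of evidence rather than proof of equality.

```python
def classify_site(p_a, p_b, p_ab, effect_a, effect_b, alpha=0.05):
    """Label a neural population as shared, preferred, selective, or unresponsive.

    p_a, p_b:  p-values for A vs. baseline and B vs. baseline.
    p_ab:      p-value for the direct A vs. B comparison.
    effect_a, effect_b: response magnitudes, used only to name the preferred domain.
    """
    sig_a, sig_b = p_a < alpha, p_b < alpha
    if not (sig_a or sig_b):
        return "unresponsive"
    # Selective: responds to exactly one of the two domains.
    if sig_a and not sig_b:
        return "selective A"
    if sig_b and not sig_a:
        return "selective B"
    # Both differ from baseline: preferred if they also differ from each other.
    if p_ab < alpha:
        return "preferred A" if effect_a > effect_b else "preferred B"
    return "shared"
```

      For example, `classify_site(0.01, 0.01, 0.5, 1.0, 1.1)` returns `"shared"`, whereas `classify_site(0.01, 0.3, 0.001, 2.0, 0.1)` returns `"selective A"`.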

      Regarding the risk of over-generalization, we want to clarify that our manuscript does not claim that a specific region or frequency band is selective to speech or music. Because we test excerpts of speech and music, we employ the reverse logical reasoning: "if 10 minutes of instrumental music activates a region traditionally associated with speech selectivity, we can conclude that this region is NOT speech-selective." Our conclusions revolve around the absence of selectivity rather than the presence of selective areas or frequency bands. In essence, "one counterexample is enough to disprove a theory." We have now further elaborated on this point in the discussion section:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech- (or language-) selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      (4) The authors' approach, akin to mapping a 'receptive field' by correlating stimulus properties with neural responses to ascertain functional selectivity for speech and music, presents issues. For instance, in the cochlea, different stimuli activate different parts of the basilar membrane due to the distinct spectral contents of speech and music, with each part being selective to certain frequencies. However, this phenomenon reflects the frequency selectivity of the basilar membrane - an important function, not an inherent selectivity for speech or music. Similarly, if cortical regions exhibit heightened responses to one type of stimulus over another, it doesn't automatically imply selectivity or preference for that stimulus. The explanation could lie in functional aspects, such as a region's sensitivity to temporal units of a specific duration, be it music, speech, or even movie segments, and its role in chunking such units (e.g., around 500 ms), which might be more prevalent in music than in speech, or vice versa in the current study. This study does not delve into the functional mechanisms of how speech and music are processed across different musical or linguistic hierarchical levels but merely demonstrates differences in neural responses to various stimuli over a 10-minute span.

      We completely agree with the last statement, as our primary goal was not to investigate the functional mechanisms underlying speech and music processing. However, the finding of a substantial portion of the cortical network as being shared between the two domains constrains our understanding of the underlying common operations. Regarding the initial part of the comment, we would like to clarify that in the framework we propose, if cortical regions show heightened responses to one type of stimulus over another, this falls into the ‘preferred’ category. The ‘selective’ (exclusive) category, on the other hand, would require that the region be unresponsive to one of the two stimuli.

      Reviewer #2 (Public Review):

      Summary:

      The study investigates whether speech and music processing involve specific or shared brain networks. Using intracranial EEG recordings from 18 epilepsy patients, it examines neural responses to speech and music. The authors found that most neural activity is shared between speech and music processing, without specific regional brain selectivity. Furthermore, domain-selective responses to speech or music are limited to frequency-specific coherent oscillations. The findings challenge the notion of anatomically distinct regions for different cognitive functions in the auditory process.

      Strengths:

      (1) This study uses a relatively large corpus of intracranial EEG data, which provides high spatiotemporal resolution neural recordings, allowing for more precise and dynamic analysis of brain responses. The use of continuous speech and music enhances ecological validity compared to artificial or segmented stimuli.

      (2) This study uses multiple frequency bands in addition to just high-frequency activity (HFA), which has been the focus of many existing studies in the literature. This allows for a more comprehensive analysis of neural processing across the entire spectrum. The heterogeneity across different frequency bands also indicates that different frequency components of the neural activity may reflect different underlying neural computations.

      (3) This study also adds empirical evidence towards distributed representation versus domain-specificity. It challenges the traditional view of highly specialized, anatomically distinct regions for different cognitive functions. Instead, the study suggests a more integrated and overlapping neural network for processing complex stimuli like speech and music.

      Weaknesses:

      While this study is overall convincing, there are still some weaknesses in the methods and analyses that limit the implications of the work.

      The study's main approach, focusing primarily on the grand comparison of response amplitudes between speech and music, may overlook intricate details of neural coding. Speech and music are not entirely orthogonal to each other at different levels of analysis: at a high level of abstraction, they are two different categories of cognitive processes; at the level of low-level acoustics, they overlap substantially; at intermediate levels, they may also share similar features. The selected musical stimuli, incorporating both vocals and multiple instrumental sounds, raise questions about the specificity of neural activation. For instance, it is unclear whether the vocal elements in music and speech engage identical neural circuits. Additionally, the study does not adequately address whether purely melodic elements in music correlate with intonation in speech at a neural level. A more granular analysis, dissecting stimuli into distinct features like pitch, phonetics, timbre, and linguistic elements, could unveil more nuanced shared and unique neural processes between speech and music. Prior research indicates potential overlap in neural coding for certain intermediate features in speech and music (Sankaran et al., 2023), suggesting that a simple averaged response comparison might not fully capture the complexity of neural encoding. Further delineation of phonetic, melodic, linguistic, and other coding, along with an analysis of how different informational aspects (phonetic, linguistic, melodic, etc.) are represented in shared neural activity, could enhance our understanding of these processes and strengthen the study's conclusions.

      We appreciate the reviewer's acknowledgment that delving into the intricate details of neural coding of speech and music was beyond the scope of this work. To address some of the more precise issues raised, we have clarified in the manuscript that our musical stimuli do not contain vocals and are purely instrumental. We apologize if this was not clear initially.

      “In the main experimental session, patients passively listened to ~10 minutes of storytelling (577 s, La sorcière de la rue Mouffetard; Gripari, 2004) and ~10 minutes of instrumental music (580 s, Reflejos del Sur; Oneness, 2006), separated by 3 minutes of rest.”

      Furthermore, we now acknowledge the importance of modeling melodic, phonetic, or linguistic features in the Discussion, and we reference the work of Sankaran et al. (2023) and McCarty et al. (2023) in this regard. However, we would like to share an additional thought with the reviewer regarding model comparison for speech and music. Specifically, comparing results from a phonetic (or syntactic) model of speech to those from a pitch-melodic (or harmonic) model of music is not straightforward, as these models operate on fundamentally different dimensions. In other words, while treating phonemes and pitches as equivalent units may be reasonable, it ultimately rests on a somewhat arbitrary choice. Consequently, comparing and interpreting neural population coding under one model or the other remains problematic. In summary, because the models for speech and music differ (except for acoustic models), direct comparison is challenging, although still commendable and of interest.

      “These selective responses, not visible in primary cortical regions, seem independent of both low-level acoustic features and higher-order linguistic meaning (Norman-Haignere et al., 2015), and could subtend intermediate representations (Giordano et al., 2023) such as domain-dependent predictions (McCarty et al., 2023; Sankaran et al., 2023).”

      References:

      McCarty, M. J., Murphy, E., Scherschligt, X., Woolnough, O., Morse, C. W., Snyder, K., Mahon, B. Z., & Tandon, N. (2023). Intraoperative cortical localization of music and language reveals signatures of structural complexity in posterior temporal cortex. iScience, 26(7), 107223.

      Sankaran, N., Leonard, M. K., Theunissen, F., & Chang, E. F. (2023). Encoding of melody in the human auditory cortex. bioRxiv. https://doi.org/10.1101/2023.10.17.562771

      The paper's emphasis on shared and overlapping neural activity, as observed through sEEG electrodes, provides valuable insights. It is probably true that domain-specificity for speech and music does not exist at such a macro scale. However, it's important to consider that each electrode records from a large neuronal population, encompassing thousands of neurons. This broad recording scope might mask more granular, non-overlapping feature representations at the single neuron level. Thus, while the study suggests shared neural underpinnings for speech and music perception at a macroscopic level, it cannot definitively rule out the possibility of distinct, non-overlapping neural representations at the microscale of local neuronal circuits for features that are distinctly associated with speech and music. This distinction is crucial for fully understanding the neural mechanisms underlying speech and music perception that merit future endeavors with more advanced large-scale neuronal recordings.

      We appreciate the reviewer's concern, but we do not view this as a weakness for our study's purpose. Every method inherently has limitations, and intracranial recordings currently offer the best possible spatial specificity and temporal resolution for studying the human brain. Studying cell assemblies thoroughly in humans is ethically challenging, and examining speech and music in non-human primates or rats raises questions about cross-species analogy. Therefore, despite its limitations, we believe intracranial recording remains the best option for addressing these questions in humans.

      Regarding the granularity of neural representation, while understanding how computations occur in the central nervous system is crucial, we question whether the single-neuron scale provides the most informative insights. Single neurons appear more variable (e.g., in terms of cell type or laminar affiliation) than the local circuits they contribute to, which appear to be the brain's building blocks (e.g., the laminar organization; see Mendoza-Halliday et al., 2024). Additionally, the population dynamics of these functional modules appear crucial for cognition and behavior (Safaie et al., 2023; Buzsáki and Vöröslakos, 2023). Therefore, we emphasize the need for multi-scale research, as we believe that different approaches, each limited when taken individually, complement one another's weaknesses. We clarified this in the introduction:

      “This approach rests on the idea that the canonical computations that underlie cognition and behavior are anchored in population dynamics of interacting functional modules (Safaie et al. 2023; Buzsáki and Vöröslakos, 2023) and bound to spectral fingerprints consisting of network- and frequency-specific coherent oscillations (Siegel et al., 2012).”

      Importantly, we focus on the macro-scale and conclude that, at the level of anatomical regions, no speech or music selectivity can be observed during natural stimulation. This is stated in the Discussion, as follows:

      “In this context, in the current study we did not observe a single anatomical region for which speech-selectivity was present, in any of our analyses. In other words, 10 minutes of instrumental music was enough to activate cortical regions classically labeled as speech- (or language-) selective. On the contrary, we report spatially distributed and frequency-specific patterns of shared, preferred, or selective neural responses and connectivity fingerprints. This indicates that domain-selective brain regions should be considered as a set of functionally homogeneous but spatially distributed voxels, instead of anatomical landmarks.”

      References:

      Mendoza-Halliday, D., Major, A. J., Lee, N., et al. (2024). A ubiquitous spectrolaminar motif of local field potential power across the primate cortex. Nature Neuroscience.

      Safaie, M., Chang, J. C., Park, J., et al. (2023). Preserved neural dynamics across animals performing similar behaviour. Nature, 623, 765–771.

      Buzsáki, G., & Vöröslakos, M. (2023). Brain rhythms have come of age. Neuron, 111(7), 922-926.

      While classifying electrodes into 3 categories provides valuable insights, it may not fully capture the complexity of the neural response distribution to speech and music. A more nuanced and continuous approach could reveal subtler gradations in neural response, rather than imposing categorical boundaries. This could be done by computing continuous metrics, like unique variances explained by each category, or ratio-based statistics, etc. Incorporating such a continuum could enhance our understanding of the neural representation of speech and music, providing a more detailed and comprehensive picture of cortical processing.

      To clarify, the metrics we are investigating (coherence, power, linear correlations) are continuous. Additionally, we conduct a comprehensive statistical analysis of these results. The statistical testing, which includes assessing differences from baseline and between the speech and music conditions using a statistical threshold, yields three categories. Of note, ratio-based statistics (a continuous metric) are provided in Figures S9 and S10 (Figures S8 and S9 in the original version of the manuscript).
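      To make the mapping from test outcomes to the three categories concrete, the decision rule described in this exchange can be sketched as a small function. This is a minimal illustration, not the authors' code: it assumes that, for one channel and frequency band, three corrected tests are already available (speech vs baseline, music vs baseline, speech vs music), together with each domain's deviation from baseline (e.g., a t-value). The function name and the "unclassified" label for combinations not covered by the stated definitions are hypothetical, and requiring both domains to respond for a "shared" label is our assumption.

```python
def classify_channel(speech_vs_baseline: bool, music_vs_baseline: bool,
                     speech_vs_music: bool, speech_effect: float,
                     music_effect: float) -> str:
    """Map the outcomes of three corrected tests to a response category.

    The booleans say whether each test was significant. The effects are the
    deviations from baseline (e.g. t-values); only their magnitude is used,
    because strength, not direction, defines the category.
    """
    if speech_vs_music and speech_vs_baseline != music_vs_baseline:
        # Domains differ and exactly one of them departs from baseline.
        return "speech-selective" if speech_vs_baseline else "music-selective"
    if speech_vs_music and speech_vs_baseline and music_vs_baseline:
        # Both domains respond, but one response is stronger than the other.
        if abs(speech_effect) > abs(music_effect):
            return "speech-preferred"
        return "music-preferred"
    if speech_vs_baseline and music_vs_baseline:
        # Both respond and do not differ (assumption: both must respond).
        return "shared"
    # Any other combination is left out of the three categories.
    return "unclassified"
```

      Run once on activations and once on deactivations, as in the Methods excerpt quoted in the reply to Reviewer #1, such a rule would yield separate category maps for each sign of the response.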

      Reviewer #3 (Public Review):

      Summary:

      Te Rietmolen et al. investigated the selectivity of cortical responses to speech and music stimuli using neurosurgical stereo-EEG in humans. The authors address two basic questions: 1. Are speech and music responses localized in the brain or distributed? 2. Are these responses selective and domain-specific, or rather domain-general and shared? To investigate this, the study proposes a nomenclature of shared responses (speech and music responses are not significantly different), domain-selective responses (one domain differs significantly from baseline and the other does not), and domain-preferred responses (both differ significantly from baseline, but one is larger than the other and the two differ significantly from each other). The authors apply this framework to neural responses across the spectrum (rather than focusing on high gamma), providing evidence for a low level of selectivity across spectral signatures. To investigate the nature of the underlying representations, they use encoding models to predict neural responses (low and high frequency) from a feature space of the stimulus envelope or peak rate (by time delay), and find stronger encoding for both in the low-frequency neural responses. The top encoding electrodes are used as seeds for a pairwise connectivity (coherence) analysis in order to repeat the shared/selective/preferred analysis across the spectrum, suggesting low selectivity. Spectral power and connectivity are also analyzed at the level of the regional patient population to rule out (and depict) any effects driven by a select few patients. Across analyses, the authors consistently show a paucity of domain-selective responses, and when evident, these selective responses were not represented across the entire cortical region. The authors argue that speech and music mostly rely on shared neural resources.

      Strengths:

      I found this manuscript to be rigorous, providing compelling and clear evidence of shared neural signatures for speech and music. The use of intracranial recordings provides an important spatial and temporal resolution that lends itself to the power, connectivity, and encoding analyses. The statistics and methods employed are rigorous and reliable, estimated with permutation approaches, and cross-validation/regularization was employed and reported properly. The analysis of measures across the entire spectrum in power, coherence, and encoding models provides a comprehensive view of responses that will no doubt benefit the community as an invaluable resource. Analysis at the level of the patient population (feasible with their high N) per region also supports the generalizability of the conclusions across a relatively large cohort of patients. Last but not least, I believe the framework of selective, preferred, and shared is a welcome lens through which to investigate cortical function.

      Weaknesses:

      I did not find methodological weaknesses in the current version of the manuscript. I do believe that it is important to highlight that the data are limited to passive listening to naturalistic speech and music. The speech and music stimuli are not fully controlled: key acoustic features vary between them (inherently, given the different domains). Overall, I found the differences in stimuli and the lack of attentional control (passive listening) to be minor weaknesses that would not dramatically change the results or conclusions.

      Thank you for this positive review of our work. We added these points as limitations and future directions in the discussion section:

      “Finally, in adopting here a comparative approach to speech and music, the two main auditory domains of human cognition, we investigated only one type of speech and one type of music, using a passive listening task. Future work is needed to investigate, for instance, whether different sentences or melodies activate the same selective, frequency-specific distributed networks, and to what extent these results are related to the passive listening context compared to a more active and natural context (e.g., conversation).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The concepts of activation and deactivation within the study's context of selectivity are not straightforward to comprehend. It would be beneficial for the authors to provide more detailed explanations of how these phenomena relate to the selectivity of neural responses to speech and music. Such elaboration would aid readers in better understanding the nuances of how certain brain regions are selectively activated or deactivated in response to different auditory stimuli.

      The reviewer is right that the reported results are quite complex to interpret. The concepts of activation and deactivation are generally difficult to grasp, as they are partly defined by the approach (e.g., method and/or metric) and by the scale of observation (Pfurtscheller & Lopes da Silva, 1999). The power (or magnitude) of a time-frequency estimate is by definition a positive value. Deactivation (or desynchronization) is therefore defined relative to the comparison used (e.g., baseline, control, condition). This is further complicated by the scale of measurement. For instance, during a simple limb movement, some areas of the sensorimotor cortex are activated, yet at a finer scale this is accompanied by a desynchronization of mu activity, and this desynchronization is a relative measure (e.g., before/after the movement). At a broader scale, it is not rare to observe some form of balance between brain networks, some being ‘inhibited’ to let others be activated, as with the default mode network versus sensorimotor networks. In our case, when estimating selective responses, it is the strength of the signal that matters. The type of selectivity is then defined by the sign/direction of the comparison/subtraction. We now provide additional details about the sign of selectivity between domains and frequencies in the Methods and Results sections:

      Methods:

      “In order to explore the full range of possible selective, preferred, or shared responses, we considered both responses greater and smaller than the baseline. Indeed, as neural populations can synchronize or desynchronize in response to sensory stimulation, we estimated these categories separately for significant activations and significant deactivations compared to baseline.”

      Results:

      “We classified, for each canonical frequency band, each channel into one of the categories mentioned above, i.e. shared, selective, or preferred (Figure 1A), by examining whether speech and/or music differ from baseline and whether they differ from each other. We also considered both activations and deactivations, compared to baseline, as both index a modulation of neural population activity and have been linked with cognitive processes (Pfurtscheller & Lopes da Silva, 1999; Proix et al., 2022). However, because our aim was not to interpret specific increases or decreases with respect to the baseline, we here simply consider significant deviations from the baseline. In other words, when estimating selectivity, it is the strength of the response that matters, not its direction (activation, deactivation).”

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”

      References:

      Lachaux, J. P., Jung, J., Mainy, N., Dreher, J. C., Bertrand, O., Baciu, M., Minotti, L., Hoffmann, D., & Kahane, P. (2008). Silence is golden: Transient neural deactivation in the prefrontal cortex during attentive reading. Cerebral Cortex, 18(2), 443–450.

      Pfurtscheller, G., & Lopes da Silva, F. H. (1999). Event-related EEG/MEG synchronization and desynchronization: Basic principles. Clinical Neurophysiology, 110(11), 1842–1857.

      (2) The manuscript doesn't easily provide information about the control conditions, yet the conclusion significantly depends on these conditions as a baseline. It would be beneficial if the authors could clarify this information for readers earlier and discuss how their choice of control stimuli influences their conclusions.

      We added information in the Results section about the baseline conditions:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Of note, while the choice of different ‘basic auditory stimuli’ as baseline can change the reported results in regions involved in low-level acoustic analysis (auditory cortex), it will have no impact on the results observed in higher-level regions, which also predominantly exhibit shared responses. We now point out this reasoning more clearly in the Results section:

      “The spatial distribution of the spectrally-resolved responses corresponds to the network typically involved in speech and music perception. This network encompasses both ventral and dorsal auditory pathways, extending well beyond the auditory cortex and, hence, beyond auditory processing that may result from differences in the acoustic properties of our baseline and experimental stimuli.”

      (3) The spectral analyses section doesn't clearly explain how the authors performed the multiple-comparison correction. The authors' selectivity categorization appears similar to ANOVAs with post hoc tests, implying the need for certain corrections of the p-values or categorization. Could the authors clarify this aspect?

      We apologize that this was missing from the original version of the manuscript. In the spectral analyses, the selectivity categorization depended on both (1) the difference effects between each domain and the baseline, and (2) the difference effect between domains. Channels were marked as selective when (1) there was a significant difference between domains and (2) only one domain significantly differed from the baseline. All difference effects were estimated using paired-sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014), with 1000 permutations and the built-in tmax method to correct for multiple comparisons across channels (Nichols & Holmes, 2002; Groppe et al., 2011). We now explain more clearly how we controlled the family-wise error rate in the Methods section:

      “For each frequency band and channel, the statistical difference between conditions was estimated with paired-sample permutation tests based on the t-statistic from the mne-python library (Gramfort et al., 2014), with 1000 permutations and the tmax method to control the family-wise error rate (Nichols and Holmes, 2002; Groppe et al., 2011). In tmax permutation testing, the null distribution is estimated by swapping the condition labels (speech vs music, or speech/music vs baseline) between epochs for each channel (i.e., each comparison). After each permutation, the most extreme t-score across channels (tmax) is retained for the null distribution. Finally, the t-scores of the observed data are computed and compared to this tmax distribution, as in parametric hypothesis testing. Because the chance of obtaining a large tmax increases with the number of comparisons, the test automatically becomes more conservative as more comparisons are made, thereby correcting for multiple comparisons across channels.”

      References:
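      For readers less familiar with the tmax procedure, the following NumPy sketch illustrates it for paired data. It assumes epochs are paired across the two conditions, in which case swapping the condition labels within a pair amounts to flipping the sign of the paired difference; the same random sign flips are applied to all channels in a given permutation, and the maximum absolute t over channels forms the null distribution. The function name, seed handling, and the +1 convention in the p-value are illustrative choices. MNE-Python's `mne.stats.permutation_t_test` implements a tmax sign-flip test on one-sample data, which is presumably how the paired differences were handled; this sketch does not reproduce that function.

```python
import numpy as np


def tmax_paired_permutation(a, b, n_permutations=1000, seed=0):
    """Paired sign-flip permutation test with tmax family-wise correction.

    a, b: arrays of shape (n_epochs, n_channels), paired by epoch.
    Returns observed t-values and FWER-corrected two-sided p-values.
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    n_epochs = d.shape[0]

    def one_sample_t(x):
        # t-statistic of the mean paired difference, per channel
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n_epochs))

    t_obs = one_sample_t(d)
    rng = np.random.default_rng(seed)
    t_max = np.empty(n_permutations)
    for i in range(n_permutations):
        # One sign per epoch, shared by all channels (keeps their correlation)
        signs = rng.choice([-1.0, 1.0], size=(n_epochs, 1))
        t_max[i] = np.abs(one_sample_t(d * signs)).max()
    # Every channel is compared with the null distribution of the maximum
    exceed = (t_max[:, None] >= np.abs(t_obs)[None, :]).sum(axis=0)
    p_values = (exceed + 1) / (n_permutations + 1)
    return t_obs, p_values
```

      Because each channel is tested against the maximum over all channels, adding channels widens the null distribution, which is the automatic conservativeness described in the excerpt.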

      Gramfort, A., Luessi, M., Larson, E., Engemann, D. A., Strohmeier, D., Brodbeck, C., Parkkonen, L., & Hämäläinen, M. S. (2014). MNE software for processing MEG and EEG data. NeuroImage, 86, 446–460.

      Groppe, D. M., Bickel, S., Dykstra, A. R., Wang, X., Mégevand, P., Mercier, M. R., Lado, F. A., Mehta, A. D., & Honey, C. J. (2017). iELVis: An open source MATLAB toolbox for localizing and visualizing human intracranial electrode data. Journal of Neuroscience Methods, 281, 40–48.

      Nichols, T. E., & Holmes, A. P. (2002). Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping, 15(1), 1–25.

      Reviewer #2 (Recommendations For The Authors):

      Other suggestions:

      (1) The authors need to provide more details on how the sEEG electrodes were localized and selected. Are all electrodes included, or only those located in the gray matter? If all electrodes were used, how were those outside the gray matter localized and labeled? In Figures 1C & 1D, it seems that many of the electrodes were located in deep structures; how were anatomical labels assigned to these electrodes?

      We apologize that this was not clear in the original version of the manuscript. Our electrode localization procedure was based on several steps described in detail in Mercier et al. (2022). Once electrodes were localized on a post-implant CT scan and their coordinates projected onto the pre-implant MRI, we were able to obtain the necessary information regarding brain tissue and anatomical region. First, the segmentation of the pre-implant MRI with SPM12 provided both the tissue probability maps (i.e., gray matter, white matter, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissue), which allowed us to dismiss electrodes outside of the brain and select those in the gray matter. Second, the individual's brain was co-registered to a template brain, which allowed us to back-project atlas parcels onto the individual's brain and assign anatomical labels to each electrode. This procedure allowed us to group channels by anatomical parcels as defined by the Brainnetome atlas (Figure 1D), which informed the analyses presented in the Population Prevalence section (Methods, Figures 4, 9-10, S4-5). Because this study relies on stereotactic EEG rather than electrocorticography (ECoG), recording sites include both gyri and sulci, while deep structures were not retained.

      We have now updated the “General preprocessing related to electrodes localisation” section in the Methods. The relevant part now states:

      “To precisely localize the channels, a procedure similar to the one used in the iELVis toolbox and in the FieldTrip toolbox was applied (Groppe et al., 2017; Stolk et al., 2018). First, we manually identified the location of each channel centroid on the post-implant CT scan using the Gardel software (Medina Villalon et al., 2018). Second, we performed volumetric segmentation and cortical reconstruction on the pre-implant MRI with the FreeSurfer image analysis suite (documented and freely available for download online at http://surfer.nmr.mgh.harvard.edu/). In addition, segmentation of the pre-implant MRI with SPM12 provided both the tissue probability maps (i.e., gray matter, white matter, and cerebrospinal fluid (CSF) probabilities) and the indexed-binary representations (i.e., either gray, white, CSF, bone, or soft tissue). This information allowed us to reject electrodes not located in the brain. Third, the post-implant CT scan was co-registered to the pre-implant MRI via a rigid affine transformation, and the pre-implant MRI was registered to MNI152 space via a linear and a non-linear transformation from SPM12 methods (Penny et al., 2011), through the FieldTrip toolbox (Oostenveld et al., 2011). Fourth, applying the corresponding transformations, we mapped channel locations to the pre-implant MRI brain, which was labeled using the volume-based Human Brainnetome Atlas (Fan et al., 2016).”
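      The tissue-based rejection step can be illustrated with a short sketch. It assumes the segmentation is available as an integer-labeled 3-D volume with a 4x4 voxel-to-world affine, and that channel centroids are given in the same world space; the integer label codes and function names below are hypothetical and need not match what SPM12 produces. Each centroid is mapped to its nearest voxel, and only channels falling in a gray-matter voxel inside the volume are kept.

```python
import numpy as np

# Hypothetical integer codes for an indexed tissue volume; the encoding
# produced by the actual segmentation software may differ.
TISSUE_LABELS = {1: "gray", 2: "white", 3: "csf", 4: "bone", 5: "soft tissue"}


def world_to_voxel(affine, xyz):
    """Map a world coordinate (e.g. mm) to the nearest voxel index."""
    homogeneous = np.append(np.asarray(xyz, dtype=float), 1.0)
    ijk = np.linalg.inv(affine) @ homogeneous
    return tuple(int(v) for v in np.rint(ijk[:3]))


def gray_matter_channels(labels, affine, channels):
    """Return names of channels whose centroid lies in a gray-matter voxel.

    labels: 3-D integer array (indexed tissue volume)
    affine: 4x4 voxel-to-world matrix of that volume
    channels: dict mapping channel name -> (x, y, z) in world coordinates
    """
    kept = []
    for name, xyz in channels.items():
        ijk = world_to_voxel(affine, xyz)
        inside = all(0 <= i < n for i, n in zip(ijk, labels.shape))
        if inside and TISSUE_LABELS.get(int(labels[ijk])) == "gray":
            kept.append(name)
    return kept
```

      A real pipeline would also have to handle centroids near tissue boundaries, for example by consulting the tissue probability maps rather than a single voxel; the sketch leaves that out.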

      Reference:

      Mercier, M. R., Dubarry, A.-S., Tadel, F., Avanzini, P., Axmacher, N., Cellier, D., Vecchio, M. D., Hamilton, L. S., Hermes, D., Kahana, M. J., Knight, R. T., Llorens, A., Megevand, P., Melloni, L., Miller, K. J., Piai, V., Puce, A., Ramsey, N. F., Schwiedrzik, C. M., … Oostenveld, R. (2022). Advances in human intracranial electroencephalography research, guidelines and good practices. NeuroImage, 260, 119438.

      (2) From Figures 5 and 6 (and also S4, S5), is it true that aside from the shared response, lower frequency bands show more music selectivity (blue dots), while higher frequency bands show more speech selectivity (red dots)? I am curious how the authors interpret this.

      The reviewer is right in noticing the asymmetric selective responses to music and speech in lower and higher frequency bands. However, while this effect is apparent in the analyses in which we inspected stronger synchronization (activation) compared to baseline (Figures 2 and S1), the pattern appears to reverse when examining deactivation compared to baseline (Figures 3 and S2). In other words, there seems to be an overall stronger deactivation for speech in the lower frequency bands and a relatively stronger deactivation for music in the higher frequency bands.

      We now provide additional details about the sign of selectivity between domains and frequencies in the Results section:

      “Both domains displayed a comparable percentage of selective responses across frequency bands (Figure 4, first values of each plot). When considering separately activation (Figure 2) and deactivation (Figure 3) responses, speech and music showed complementary patterns: for low frequencies (<15 Hz) speech selective (and preferred) responses were mostly deactivations and music responses activations compared to baseline, and this pattern reversed for high frequencies (>15 Hz).”

      Note, however, that this pattern of results depends on only a select number of patients: when ignoring regional selective responses that are driven by as few as 2 to 4 patients, the pattern disappears (Figures 5-6). More precisely, ignoring regions explored by a small number of patients almost completely clears the selective responses for both speech and music. For this reason, we do not feel confident interpreting the possible asymmetry whereby low and high frequency bands differently encode speech and music (through activation or deactivation).
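      The patient-level filtering behind this argument can be sketched as follows, under stated assumptions: each channel is described by a (patient, region, is_selective) triple, regions explored by fewer than `min_patients` patients are dropped, and for the remaining regions the number of patients contributing a selective channel is reported. The threshold is a free parameter here, not a value taken from the study, and the function name is hypothetical.

```python
from collections import defaultdict


def regional_prevalence(channels, min_patients):
    """Count, per region, the patients contributing a selective channel.

    channels: iterable of (patient_id, region, is_selective) tuples, one per channel.
    min_patients: regions explored by fewer patients than this are dropped
        (hypothetical threshold).
    Returns {region: (n_patients_with_selective_channel, n_patients_explored)}.
    """
    explored = defaultdict(set)
    selective = defaultdict(set)
    for patient, region, is_selective in channels:
        explored[region].add(patient)
        if is_selective:
            selective[region].add(patient)
    return {
        region: (len(selective[region]), len(patients))
        for region, patients in explored.items()
        if len(patients) >= min_patients
    }
```

      Counting patients rather than channels keeps a single densely implanted patient from dominating a region's result.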

      Minor:

      (1) P9 L234: Why only consider whether these channels were unresponsive to the other domain in the other frequency bands? What about the responsiveness to the target domain?

      We thank the reviewer for their interesting suggestion. The primary objective of the cross-frequency analyses was to determine whether domain-selective channels for a given frequency band remain unresponsive (i.e., exclusive) to the other domain across frequency bands, or whether the observed selectivity is confined to specific frequency ranges (i.e., frequency-specific). In other words, does a given channel respond exclusively to one domain and never, in any frequency band, to the other domain? The idea behind this question is that, for a channel to be selectively involved in the encoding of one domain, it does not need to be sensitive to all timescales underlying that domain, as long as it remains unresponsive to every timescale of the other domain. However, if the channel is sensitive to information that unfolds slowly in one domain and faster in the other domain, then the channel is no longer globally domain-selective; rather, its selectivity is frequency-specific to each domain.

      The proposed analyses answer a slightly different, albeit also meaningful, question: how many frequencies (or frequency bands) do selective responses span? From the results presented below, the reviewer can appreciate the overall steep decline in selective responses beyond a single frequency band, with only a few channels remaining selectively responsive across at most four frequency bands. That is, selective responses generally span one frequency band.

      Author response image 1.

      Cross-frequency channel selective responses. The top figure shows the results for the spectral analyses (baselined against the tones condition, including both activation and deactivation). The bottom figure shows the results for the connectivity analyses. For each plot, the first (leftmost) value corresponds to the percentage (%) of channels displaying a selective response in a specific frequency band. For each subsequent value, we remove the channels that no longer respond selectively to the target domain in the following frequency band. The black dots at the bottom of the graph indicate which frequency bands were successively included in the analysis.
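      The two questions discussed above, exclusivity across bands and the number of bands a selective response spans, can both be expressed on boolean channel-by-band tables. The sketch below assumes such tables are already available from the per-band tests. The function names are hypothetical, and percentages are computed relative to all channels, which may differ from the normalization used in the figure.

```python
import numpy as np


def is_exclusive(responds_to_other):
    """True if a channel never responds to the non-target domain in any band.

    responds_to_other: bool array (n_bands,), True where the channel's
    response to the non-target domain differs from baseline.
    """
    return not np.any(responds_to_other)


def cumulative_selective_percentage(selective):
    """Percentage of channels still selective as bands are added in order.

    selective: bool array (n_channels, n_bands), True where a channel is
    selective for the target domain in that band. Columns are ordered as
    the bands are successively included.
    """
    table = np.asarray(selective, dtype=bool)
    # A channel stays in the count only while it is selective in every band so far
    still_selective = np.logical_and.accumulate(table, axis=1)
    return 100.0 * still_selective.mean(axis=0)
```

      A channel selective for speech in one band but responsive to music in another would fail `is_exclusive` and count as frequency-specific in the sense defined above.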

      (2) P21 L623: "Population prevalence." The subsection title should be in bold.

      Done.

      Reviewer #3 (Recommendations For The Authors):

      The authors chose to use pure tones and syllables as baseline. I wonder if they also tried the rest period between tasks, and if they could comment on how it differed and why they chose pure tones (above and beyond a more active auditory baseline).

      This is an interesting suggestion. The reason for not using the rest period between (or right after) speech and music listening as a baseline is that it would be strongly influenced by the preceding stimulus. Indeed, after listening to the story, patients likely keep thinking about it for a while. Similarly, after listening to music, the music remains “in our head” for some time.

      This is why we did not use rest but other auditory stimulation paradigms. Concerning the choice of pure tones and syllables, these happen to be used for clinical purposes to assess functioning of auditory regions. They also corresponded to a passive listening paradigm, simply with more basic auditory stimuli. We clarified this in the Results section:

      “[...] with respect to two baseline conditions, in which patients passively listened to more basic auditory stimuli: one in which patients passively listened to pure tones (each 30 ms in duration), the other in which patients passively listened to isolated syllables (/ba/ or /pa/, see Methods).”

      Discussion - you might want to address phase information in contrast to power. Your encoding models map onto low-frequency (bandpassed) activity which includes power and phase. However, the high-frequency model includes only power. The model comparison is not completely fair and may drive part of the effects in Figure 7a. I would recommend discussing this, or alternatively ruling out the effect with modeling power separately for the low frequency.

      We thank the reviewer for their recommendation. First, we would like to emphasize that the signal extraction techniques we used are those most frequently reported in previous papers (e.g. Ding and Simon, 2012; Di Liberto et al., 2015; Mesgarani and Chang, 2012).

      Low-frequency (LF) phase and high-frequency (HFa) amplitude are also known to jointly track acoustic rhythms in the speech signal (Zion-Golumbic et al., 2013; Ding et al., 2016), possibly because HFa amplitude and LF phase dynamics have a somewhat similar temporal structure (see Lakatos et al., 2005; Canolty and Knight, 2010).

      Still, the reviewer is correct in pointing out that the model comparison is somewhat unfair, and we appreciate the suggestion to rule out a potential confound. We now report, in Supplementary Figure S8, a model comparison of LF amplitude vs. HFa amplitude to complement the findings displayed in Figure 7A. Overall, the reviewer can appreciate that using LF amplitude rather than phase does not change the results: LF activity (amplitude or phase) always captures acoustic features better than HFa amplitude.

      Author response image 2.

      TRF model comparison of low-frequency (LF) amplitude and high-frequency (HFa) amplitude. Models were investigated to quantify the encoding of the instantaneous envelope and the discrete acoustic onset edges (peakRate) by either the low frequency (LF) amplitude or the high frequency (HFa) amplitude. The ‘peakRate & LF amplitude’ model significantly captures the largest proportion of channels, and is, therefore, considered the winning model. Same conventions as in Figure 7A.
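      As a generic illustration of how such an encoding (TRF) comparison works, the sketch below fits a lagged ridge regression from a stimulus feature to a neural signal and scores it by held-out Pearson correlation; comparing these scores across features (envelope vs. peakRate) or neural signals (LF vs. HFa amplitude) is the logic of the comparison. It is not the authors' pipeline: the number of lags, the ridge parameter `alpha`, and the simple train/test split are illustrative assumptions.

```python
import numpy as np


def lagged_design(x: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack time-lagged copies of a 1-D stimulus feature (lags 0..n_lags-1 samples)."""
    X = np.zeros((x.size, n_lags))
    for lag in range(n_lags):
        # Column `lag` holds x shifted forward by `lag` samples (zero-padded at the start).
        X[lag:, lag] = x[: x.size - lag]
    return X


def fit_trf(x: np.ndarray, y: np.ndarray, n_lags: int, alpha: float) -> np.ndarray:
    """Ridge-regression TRF: weights w minimising ||y - Xw||^2 + alpha * ||w||^2."""
    X = lagged_design(x, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_lags), X.T @ y)


def heldout_r(x: np.ndarray, y: np.ndarray, n_lags: int = 20,
              alpha: float = 1.0, train_frac: float = 0.8) -> float:
    """Fit on the first part of the recording and return Pearson r on the rest."""
    split = int(train_frac * x.size)
    w = fit_trf(x[:split], y[:split], n_lags, alpha)
    # The test segment's first few samples have truncated lags (zero padding).
    y_hat = lagged_design(x[split:], n_lags) @ w
    return float(np.corrcoef(y_hat, y[split:])[0, 1])
```

      With synthetic data in which the neural signal is a filtered copy of the envelope plus noise, `heldout_r(envelope, neural)` should clearly exceed the score obtained from an unrelated feature; real analyses typically add cross-validation over folds and over `alpha`.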

      References:

      Canolty, R. T., & Knight, R. T. (2010). The functional role of cross-frequency coupling. Trends in Cognitive Sciences, 14(11), 506–515.

      Di Liberto, G. M., O’Sullivan, J. A., & Lalor, E. C. (2015). Low-frequency cortical entrainment to speech reflects phoneme-level processing. Current Biology, 25(19), 2457-2465.

      Ding, N., & Simon, J. Z. (2012). Emergence of neural encoding of auditory objects while listening to competing speakers. Proceedings of the National Academy of Sciences, 109(29), 11854-11859.

      Ding, N., Melloni, L., Zhang, H., Tian, X., & Poeppel, D. (2016). Cortical tracking of hierarchical linguistic structures in connected speech. Nature Neuroscience, 19(1), 158–164.

      Zion Golumbic, E. M., Ding, N., Bickel, S., Lakatos, P., Schevon, C. A., McKhann, G. M., ... & Schroeder, C. E. (2013). Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron, 77(5), 980-991.

      Lakatos, P., Shah, A. S., Knuth, K. H., Ulbert, I., Karmos, G., & Schroeder, C. E. (2005). An oscillatory hierarchy controlling neuronal excitability and stimulus processing in the auditory cortex. Journal of Neurophysiology, 94(3), 1904–1911.

      Mesgarani, N., & Chang, E. F. (2012). Selective cortical representation of attended speaker in multi-talker speech perception. Nature, 485(7397), 233-236.

      Similarly, the coherence analysis is affected by both power and phase, which are not dissociated. If the authors wished, they could repeat the coherence analysis with phase coherence (normalizing by the amplitude). Alternatively, this issue could be addressed in the discussion above.

      We agree with the Reviewer. We have now better clarified our choice in the Methods section:

      “Our rationale for using coherence as the functional connectivity metric was threefold. First, coherence analysis considers both magnitude and phase information. While the absence of dissociation can be criticized, signals with higher amplitude and/or SNR lead to better time-frequency estimates (which is not the case with a phase-only metric, which would therefore be more likely to mix estimates of varying SNR). Second, we chose a metric that allows direct comparison between frequencies. Because the phase angle changes more quickly at high frequencies, phase alignment/synchronization is less likely than at lower frequencies. Third, we intended to align with previous work, which for the most part used coherence, most likely for the reasons explained above.”
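      The difference between the two metrics discussed here can be made concrete. The sketch below computes, from complex spectral estimates of two channels across trials at one frequency, magnitude-squared coherence and its amplitude-normalised (phase-only) counterpart, the phase-locking value. It is a textbook formulation assumed for illustration; the authors' exact estimator and tapering are not specified here.

```python
import numpy as np


def coherence(X: np.ndarray, Y: np.ndarray) -> float:
    """Magnitude-squared coherence from complex spectral estimates across trials.

    X, Y: complex arrays (one value per trial) for two channels at one frequency.
    Trials with larger amplitude contribute more to the cross-spectrum.
    """
    cross = np.mean(X * np.conj(Y))
    return float(np.abs(cross) ** 2 / (np.mean(np.abs(X) ** 2) * np.mean(np.abs(Y) ** 2)))


def phase_coherence(X: np.ndarray, Y: np.ndarray) -> float:
    """Amplitude-normalised (phase-only) coherence, i.e. the phase-locking value.

    Every trial gets unit weight regardless of its amplitude or SNR.
    Assumes no trial has exactly zero amplitude (division by zero otherwise).
    """
    cross = X * np.conj(Y)
    return float(np.abs(np.mean(cross / np.abs(cross))))


if __name__ == "__main__":
    # Ten high-amplitude trials with a consistent phase difference, plus ten
    # low-amplitude trials whose phase difference alternates between 0 and pi.
    X = np.concatenate([np.full(10, 10.0), np.full(10, 0.1)]).astype(complex)
    Y = np.concatenate([np.full(10, 10.0), 0.1 * np.tile([1.0, -1.0], 5)]).astype(complex)
    print(coherence(X, Y), phase_coherence(X, Y))
```

      In this toy case coherence stays close to 1 because the consistent high-amplitude trials dominate, whereas the phase-locking value drops to 0.5 because the noisy low-amplitude trials count equally, which illustrates the SNR argument above.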

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigate whether the connectivity of the hippocampus is altered in individuals with aphantasia, that is, people who have reduced mental imagery abilities, some of whom describe having no imagery and others vague and dim imagery. The study investigated this question using an fMRI paradigm, where 14 people with aphantasia and 14 controls were tested, and the researchers were particularly interested in the key regions of the hippocampus and the visual-perceptual cortices. Participants were interviewed using the Autobiographical Interview regarding their autobiographical memories (AMs), and internal and external details were scored. In addition, participants were queried on their perceived difficulty in recalling memories, imagining, and spatial navigation, and their confidence regarding autobiographical memories was also measured. Results showed that participants with aphantasia reported significantly fewer internal details (but not external details) compared to controls; that they had lower confidence in their AMs; and that they reported finding remembering and imagining in general more difficult than controls. Results from the fMRI section showed that people with aphantasia displayed decreased hippocampal and increased visual-perceptual cortex activation during AM retrieval compared to controls. In contrast, controls showed strong negative functional connectivity between the hippocampus and the visual cortex. Moreover, resting state connectivity between the hippocampus and visual cortex predicted better visualisation skills. The authors conclude that their study provides evidence for the important role of visual imagery in detail-rich vivid AM, and that this function is supported by the connectivity between the hippocampus and visual cortex. This study extends previous findings of reduced episodic memory details in people with aphantasia, and enables us to start theorising about the neural underpinnings of this finding.

      The data provided good support for the conclusion that the authors draw, namely that there is a 'tight link between visual imagery and our ability to retrieve vivid and detail-rich personal past events'. However, as the authors also point out, the exact nature of this relationship is difficult to infer from this study alone, as the slow temporal resolution of fMRI cannot establish the directionality between the hippocampus and the visual-perceptual cortex. This is an exciting future avenue to explore.

      We thank the reviewer for highlighting our contributions and suggesting that the relationship between visual imagery and autobiographical memory recall is an exciting future avenue.

      Weaknesses:

      A weakness of the study is that some of the questions used are a bit vague, and no objective measure is used, which could have been more informative. For example, the spatial navigation question (reported as 'How difficult is it typically for you to orient you spatially?' - a question which is ungrammatical, but potentially reflects a typo in the manuscript) could have been more nuanced to tap into whether participants relied mostly on cognitive maps (likely supported by the hippocampus) or landmarks. It would also have been interesting to conduct a spatial navigation task, as participants do not necessarily have insight into their spatial navigation abilities (they could have been overconfident or underconfident in their abilities).

      Secondly, the question 'how difficult is it typically for you to use your imagination?' could also be more nuanced, as imagination is used in a variety of ways, and we only have reason to hypothesise that people with aphantasia might have difficulties in some cases (i.e. sensory imagination involving perceptual details). It is unlikely that people with aphantasia would have more difficulty than controls in using their imagination to imagine counterfactual situations and engage in counterfactual thought (de Brigard et al., 2013, https://doi.org/10.1016%2Fj.neuropsychologia.2013.01.015) due to its non-sensory nature, but the question used does not distinguish between these types of imagination. Again, this is a ripe area for future research. The general phrasing of 'how difficult is [x]' could also potentially bias participants towards more negative answers, something which ought to be controlled for in future research.

      The main goal of our study was to examine autobiographical memory recall. Therefore, we used the gold-standard Autobiographical Interview, or AI (Levine et al., 2002), and an fMRI paradigm to explore autobiographical memory recall in as standardised, precise, and objective a manner as possible.

      In addition to these experimentally rigorous tasks, we employed some loosely formulated questions with the intention of having participants reflect on how they perceive their own abilities to recall autobiographical memories, navigate spatially, and use their imagination. We agree with the reviewer that these questions are vague and do not meet the experimental standards required for an investigation into spatial cognition or imagination associated with aphantasia. Nonetheless, we believe that these questions provide important additional insights into what participants think about their own cognitive abilities. To put these questions into perspective, we argue in the discussion that spatial cognition and other cognitive functions should be investigated in more depth in individuals with aphantasia in the future.

      As an additional note, all tasks were conducted in German, and the questions in the manuscript are English translations. We have corrected the wording of the debriefing question in our revision. We thank the reviewer for bringing this to our attention.

      Strengths:

      A great strength of this study is that it introduces an fMRI paradigm in addition to the autobiographical interview, paralleling work done on episodic memory in cognitive science (e.g. Addis and Schacter, 2007, https://doi.org/10.1016%2Fj.neuropsychologia.2006.10.016 ), which has examined episodic and semantic memory in relation to imagination (future simulation) in non-aphantasic participants as well as clinical populations. Future work could build on this study, and for example use the recombination paradigm (Addis et al. 2009, 10.1016/j.neuropsychologia.2008.10.026 ), which would shed further light on the ability of people with aphantasia to both remember and imagine events. Future work could also build on the interesting findings regarding spatial navigation, which together with previous findings in aphantasia (e.g. Bainbridge et al., 2021, https://doi.org/10.1016/j.cortex.2020.11.014 ) strongly suggest that spatial abilities in people with aphantasia are unaffected. This can shed further light on the different neural pathways of spatial and object memory in general. In general, this study opens up a multitude of new avenues to explore and is likely to have a great impact on the field of aphantasia research.

      We much appreciate the reviewer's acknowledgment of our work on autobiographical memory employing both the autobiographical interview and fMRI. Furthermore, we hope that our work inspires future research in the way the reviewer outlines and in the way we describe in our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates to what extent neural processing of autobiographical memory retrieval is altered in people who are unable to generate mental images ('aphantasia'). Self-report as well as objective measures were used to establish that the aphantasia group indeed had lower imagery vividness than the control group. The aphantasia group also reported fewer sensory and emotional details of autobiographical memories. In terms of brain activity, compared to controls, aphantasics had a reduction in activity in the hippocampus and an increase in activity in the visual cortex during autobiographical memory retrieval. For controls, these two regions were also functionally connected during autobiographical memory retrieval, which did not seem to be the case for aphantasics. Finally, resting-state connectivity between the visual cortex and hippocampus was positively related to autobiographical vividness in the control group but negatively in the aphantasia group. The results are in line with the idea that aphantasia is caused by an increase in noise within the visual system combined with a decrease in top-down communication from the hippocampus.

      Recent years have seen a lot of interest in the influence of aphantasia on other cognitive functions, and one of the most consistent findings is a deficit in autobiographical memory. This is one of the first studies to investigate the neural correlates underlying this difference, thereby substantially increasing our understanding of aphantasia and the relationship between mental imagery and autobiographical memory.

      We thank the reviewer for highlighting the importance of our findings.

      Strengths:

      One of the major strengths of this study is the use of both self-report as well as objective measures to quantify imagery ability. Furthermore, the fMRI analyses are hypothesis-driven and reveal unambiguous results, with alterations in hippocampal and visual cortex processing seeming to underlie the deficits in autobiographical memory.

      Once again, we thank the reviewer for highlighting the quality of our methods and our results.

      Weaknesses:

      In terms of weaknesses, the control task, doing mathematical sums, also differs from the autobiographical memory task in aspects that are unrelated to imagery or memory, such as self-relevance and emotional salience, which makes it hard to conclude that the differences in activity are reflecting only the cognitive processes under investigation.

      We agree with the reviewer that our control task differs from autobiographical memory in many ways. In fact, for this first investigation of the neural correlates of autobiographical memory in aphantasia, this is precisely why we chose this mental arithmetic (MA) task. We know from previous studies that MA is, as far as possible, independent of hippocampal memory processes (Addis et al., 2007; McCormick et al., 2015, 2017; Leelaarporn et al., 2024). The main goal of the current study was to establish whether there are any differences between individuals with aphantasia and controls. In future investigations, we can build on these findings to disentangle in more detail what this difference reflects.

      Overall, I believe that this is a timely and important contribution to the field and will inspire novel avenues for further investigation.

      This highly positive conclusion is much appreciated.

      References

      Addis, D. R., Wong, A. T., & Schacter, D. L. (2007). Remembering the past and imagining the future: Common and distinct neural substrates during event construction and elaboration. Neuropsychologia, 45(7), 1363-1377.

      Kriegeskorte, N., Simmons, W. K., Bellgowan, P. S. F., & Baker, C. I. (2009). Circular analysis in systems neuroscience: The dangers of double dipping. Nature Neuroscience, 12(5), 535-540. https://doi.org/10.1038/nn.2303

      Leelaarporn, P., Dalton, M. A., Stirnberg, R., Stöcker, T., Spottke, A., Schneider, A., & McCormick, C. (2024). Hippocampal subfields and their neocortical interactions during autobiographical memory. Imaging Neuroscience.

      Levine, B., Svoboda, E., Hay, J. F., Winocur, G., & Moscovitch, M. (2002). Aging and autobiographical memory: Dissociating episodic from semantic retrieval. Psychology and Aging, 17(4), 677.

      McCormick, C., St-Laurent, M., Ty, A., Valiante, T. A., & McAndrews, M. P. (2015). Functional and effective hippocampal–neocortical connectivity during construction and elaboration of autobiographical memory retrieval. Cerebral Cortex, 25(5), 1297-1305.

      McCormick, C., Moscovitch, M., Valiante, T. A., Cohn, M., & McAndrews, M. P. (2018). Different neural routes to autobiographical memory recall in healthy people and individuals with left medial temporal lobe epilepsy. Neuropsychologia, 110, 26-36.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting article that makes a substantial contribution to the study of aphantasia as well as of the neural mechanisms of autobiographical memory. I would strongly recommend this manuscript for acceptance (with these minor revisions), as it makes a substantial and well-evidenced contribution to the research, and it opens up many interesting avenues for researchers to explore. I was especially excited to see that the Autobiographical Interview had been paired with an fMRI paradigm, something from which this field of research greatly benefits, as there are still so few fMRI studies of aphantasia. I understand that it is the authors' decision whether to accept or reject any of the revisions I recommend here, but I would like to stress that I encourage accepting them, especially as there are some minor inaccuracies in the manuscript as it currently stands. Finally, I would like to stress that although I am based in cognitive science, I am not trained in fMRI techniques and am therefore not in a position to comment on the methodology pertaining to this part of the study. I encourage the Editors to seek a second reviewer's opinion on this.

      Thank you for the positive evaluation of our manuscript as well as your comments. We have revised our manuscript according to your important suggestions as further explained below.

      Line 33: "aphantasia prohibits people from experiencing visual imagery". This characterisation of aphantasia is too strong, especially as the authors use 32 as a cut-off point on the VVIQ, which represents weak and dim imagery. I would recommend using language like 'people with aphantasia have reduced visual imagery abilities', as this more accurately captures the group of people studied. Please revise throughout the manuscript. Please consult Blomkvist and Marks (2023), who have discussed this problem in the aphantasia literature.

      We agree that aphantasics may experience reduced visual imagery abilities. We have revised our wording throughout the manuscript.

      Line 49: The authors conclude that their results 'indicate that visual mental imagery is essential for detail-rich, vivid AM', but this seems to be a bit too strong, for example since AM can be detail-rich with external (rather than internal) detail, and a person could potentially use mnemonic tricks such as keeping a detail-rich diary in order to boost their memory. That visual imagery is 'essential' implies that it is the only way to achieve detail-rich vivid AM, and this does not seem to be supported by the findings. I would recommend rephrasing it as 'visual mental imagery plays an important role in detail-rich, vivid AM' or 'visual mental imagery mediates detail-rich vivid AM'.

      We altered the sentence in Line 49 using one of the recommended phrases:

      ‘Our results indicate that visual mental imagery plays an important role in detail-rich, vivid AM, and that this type of cognitive function is supported by the functional connection between the hippocampus and the visual-perceptual cortex.’

      Line 69: Blomkvist and Marks (2023) have warned against calling aphantasia a 'condition' and this moreover seems to fit with the authors' previous research (Monzel, 2022). Please consider instead calling aphantasia an 'individual difference' in mental imagery abilities.

      Thank you for the suggestion. We have revised our wording throughout the manuscript, avoiding the term ‘condition’.

      Line 72: Add a reference for emotional strength, which has also been researched (Wicken et al. 2021, https://doi.org/10.1016/j.cortex.2020.11.014).

      We have added the suggested reference in Line 75:

      ‘Indeed, a handful of previous studies report convergent evidence that aphantasics report less sensory AM details than controls (Bainbridge et al., 2021; Dawes et al., 2020, 2022; Milton et al., 2020; Zeman et al., 2020), which may also be less emotional (Monzel et al., 2023; Wicken et al., 2021).’

      72-73: 'absence of voluntary imagery' - too strong as many people with aphantasia report having weak/dim mental imagery on the VVIQ.

      We agree that aphantasics may experience reduced visual imagery. We have revised this notion throughout the manuscript.

      74: Add reference to Bainbridge study which found a difference between recall of object vs spatial memory. This would be relevant here.

      We have added the suggested reference in Line 76:

      ‘Spatial accuracy, on the other hand, was not found to be impaired (Bainbridge et al., 2021).’

      Lines 94-97: The authors mention 'a prominent theory' but it is unclear which theory is referred to here. The cited article by Pearson (2019) does not suggest the possibility that aphantasia is due to altered connectivity between the hippocampus and visual-perceptual cortices. It suggests that aphantasia is due to impairment in the ventral stream, and in fact says that the hippocampus is unlikely to be affected due to spared spatial abilities in people with aphantasia. Specifically, Pearson claims: "Accordingly, memory areas of the brain that process spatial properties, including the hippocampus, may not be the underlying cause of aphantasia." (page 631). The authors come back to this point in the discussion section (see comment below), saying that the hypothesis attributed to Pearson is supported by their study. I do not disagree with the point that the hypothesis is supported by the data, but it is unclear to me why the hypothesis is attributed to Pearson.

      Thank you for pointing out this inaccuracy. We have edited the text to spell out our entire train of thought (see Lines 96-102):

      ‘A prominent theory posits that because of this hyperactivity, small signals elicited during the construction of mental imagery may not be detected (Pearson, 2019, Keogh et al., 2020). Pearson further speculates that since spatial abilities seem to be spared, the hippocampus may not be the underlying cause of aphantasia. In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Line 97: Blomkvist reference should be 2022 (when first published online).

      The article ‘Aphantasia: In search of a theory’ by Blomkvist was first published on 1st July 2022. However, a correction was added on 13th March 2023, so we had cited the corrected version in this manuscript. We agree that the date of first publication should be used and have edited the reference accordingly.

      Line 116: 'one aphantasic' could be seen as offensive. I would suggest 'one aphantasic participant'.

      We have altered the paragraph according to your suggestion.

      Line 138: In line with the recommendations put forward by Blomkvist and Marks (2023), I would suggest removing the word 'diagnosed', as this medicalises aphantasia in a way that is not consistent with its not being a kind of mental disorder (Monzel et al., 2022). I would say that aphantasia is instead operationalised as a score between 16-32. However, note that Blomkvist (2022) and Blomkvist and Marks (2023, https://doi.org/10.1016/j.cortex.2023.09.004 ) point out that there is also a lot of inconsistency in this score and how it is used in different studies. In your manuscript, I would recommend removing all wording that indicates that people with aphantasia have no experience of mental imagery, as you have operationalised for a score up to 32 which indicates vague and dim imagery. Describing vague and dim imagery as no imagery/absence of imagery is inconsistent (but common practice in the literature).

      Thank you for your suggestion. We have revised the entire manuscript to eliminate any ambiguous meanings regarding the definition of aphantasia. Moreover, we replaced the word ‘diagnosed’ with ‘identified’ in Line 146.

      Line 153: maybe 'correlated with imagery strength' rather than 'measures imagery strength'?

      We have altered the sentence according to your suggestion in Line 160:

      ‘Previous studies have shown that the binocular rivalry task validly correlates with mental imagery strength.’

      Line 162: "For participants who were younger than 34 years, the middle-age memory was replaced by another early adulthood memory". Is there precedence for this? Please add one sentence to explain/justify for the reader why a memory from this time period was chosen.

      To maintain a homogeneous data set of five episodic autobiographical memories from five different periods of life per participant, we asked participants who were younger than 34 years at the time of the interview to provide another early adulthood memory instead of a middle age memory, as they had not yet reached middle age. According to Levine et al. (2002), younger adults (aged < 34 years) selected two events from the early adulthood period. Hence, for all participants, the last time period comprised memories from the previous year. We have added an additional explanation in this section in Line 170:

      ‘In order to acquire five AMs from every participant, the middle age memory was replaced by another early adulthood memory for participants who were younger than 34 years (see Levine et al., 2002). Hence, for all participants, the last time period comprised memories from the previous year.’

      Line 169: "During the general probe, the interviewer asked the participant encouragingly to promote any additional details." Consider a different word choice, 'promote' sounds odd.

      We have altered the sentence according to your suggestion in Line 180:

      ‘During the general probe, the interviewer asked the participant encouragingly to provide any additional details.’

      Lines 196-198: the phrasing of these questions could have biased participants toward reporting greater difficulty. Did the authors control for this possibility in any way? The phrasing ‘How easy is it for you to [x]?’ might also be considered in a future study.

      Thank you for pointing this out. These debriefing questions were intended as open questions to get people to talk about their experiences. They were not meant as rigorous scientific experiments. Framing them positively is a good idea for future research.

      We have edited the manuscript in Lines 394-396:

      ‘The debriefing questions were employed as a way for participants to reflect on their own cognitive abilities. Of note, these were not meant to represent or replace necessary future experiments.’

      Line 197: This question is ungrammatical. Is this a typo, or was this how the question was actually posed? What language was the study conducted in?

      All interviews within this study were conducted in German. Hence, the questions listed in the current manuscript were all translated from German into English. We have added this information to the Materials and Methods section in Line 169 and rephrased the questions in Lines 208-210:

      ‘All interviews were conducted in German.’

      (1) Typically, how difficult is it for you to recall autobiographical memories?

      (2) Typically, how difficult is it for you to orient yourself spatially? 

      (3) Typically, how difficult is it for you to use your imagination?’

      Line 211: The authors write that participants were asked to "re-experience the chosen AM and elaborate as many details as possible in their mind's eye" was this the instruction used? I think stating the explicit instruction here would be relevant for the reader. If this is the word choice, it is also interesting as the autobiographical interview does not normally specify to re-experience details 'in one's mind's eye'.

      The instructions given to the participants were to choose an AM and re-experience/elaborate it in their mind with as many details as possible without describing it out loud. We have clarified this in Lines 221-223.

      ‘For the rest of the trial duration, participants were asked to re-experience the chosen AM and try to recall as many details as possible without speaking out loud.’

      Line 213: Were ‘vivid’ and ‘faint’ the only two options? Why was a 5-point scale (like the VVIQ scale) not used to better be able to compare?

      During the scanning session, the participants were given a button box with two buttons: 'vivid', pressed with the index finger, and 'faint', pressed with the middle finger. The 5-point scale was not used, to avoid confusion with the buttons during the scanning session. We have clarified this in Line 224:

      ‘We chose a simple two-button response in order to keep the task as easy as possible.’

      Line 347: Do the authors mean the same thing by 'imagery strength' and 'imagery vividness'? This would be good to clarify as it is not clear that these words mean the same thing.

      Imagery strength is often used to describe the results of the Binocular Rivalry Task, whereas vividness of mental imagery is often used to describe the results of the VVIQ. Although both tasks are correlated, the VVIQ measures vividness, whereas the dimension of the Binocular Rivalry Task is not clearly defined. We added this information in a footnote on page 10.

      Lines 353 - 356: When the authors first say that aphantasics described fewer memory details than controls, does this refer to external + internal details? Please clarify.

      Lines 353-360: The authors first say that aphantasics report "internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94)" (line 355). But then they say: "a 2-way interaction was found between the type of memory details and group, F(1, 27)= 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b)" (line 358). This seems to first say that aphantasics didn't report fewer details than controls, but then that they did report fewer internal details than controls. Please clarify if this is correct.

      Line 383: Results from controls are not reported in this section.

      We first reported the main effects of the different factors: aphantasics reported fewer details than controls (collapsed across memory period and type of memory details), internal details were reported more often than external details (collapsed across group and memory period), and more details were reported for recent than remote memories (collapsed across group and type of memory details). Subsequently, we reported the simple effects for aphantasics and controls separately. To clarify this further, we added the following segment in Line 360:

      ‘Regarding the AI, we found significant main effects of memory period, F(1, 27) = 11.88, p = .002, ηp2 = .31, type of memory details, F(1, 27) = 189.03, p < .001, ηp2 = .88, and group, F(1, 27) = 9.98, p = .004, ηp2 = .27. When the other conditions were collapsed, aphantasics (M = 26.29, SD = 9.58) described fewer memory details than controls (M = 38.36, SD = 10.99). For aphantasics and controls combined, more details were reported for recent (M = 35.17, SD = 14.19) than remote memories (M = 29.06, SD = 11.12), and internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94). More importantly, a 2-way interaction was found between type of memory details and group, F(1, 27) = 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly fewer internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly fewer external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b).’

      The results for aphantasics and controls are reported separately in Lines 368-372.

      Line 386: The question does not specify that it's asking about using imagination in daily life, even though this is what results report. I'm not sure that the question implies the use of imagination in daily life, so I would recommend removing this reference here.

      We have removed “in daily life”, since this phrase was not part of the original debriefing question.

      Line 394: Could this slowness in response reflect uncertainty about the vividness?

      Since the reason for this slowness is not known, we have refrained from adding this to the discussion. However, we added this as a short insertion in line 406:

      ‘Moreover, aphantasics responded slower (M = 1.34 s, SD = 0.38 s) than controls (M = 1.00 s, SD = 0.29 s) when they were asked whether their retrieved memories were vivid or faint, t(28) = 2.78, p = .009, possibly reflecting uncertainty in their response.’

      Line 443: Graph E, significance not indicated on the graph.

      After preprocessing, the fMRI data were statistically analyzed using the GLM contrast AM versus MA. The resulting images were then thresholded at p < 0.001, so the illuminated voxels in Fig. 3A, B, C, and D show only voxels for which a statistical difference between our conditions has already been established. Graph E illustrates only the descriptive means and variances of the significant differences in Fig. 3C and D. This display is useful because it lets the reader assess the difference between two conditions and two groups at a glance. For a general discussion of this issue, please see the literature on circular analysis in fMRI (Kriegeskorte et al., 2009).

      Line 521-522: The authors claim that Pearson (2019) forwards the hypothesis that heightened activity of visual-perceptual cortices hinders aphantasics from detecting small imagery-related signals. However, I find no statement of this hypothesis in Pearson (2019). It is unclear to me why this hypothesis is attributed to Pearson (2019). Please remove this reference or provide a correct citation for where the hypothesis is stated. Further, it is not clear from what is written how the results support this hypothesis as this is rather brief - please elaborate on this.

      We attributed this hypothesis to Pearson (2019) according to his Fig. 4, which states: ‘A strong top-down signal and low noise (bottom left) gives the strongest mental image (square), whereas a high level of neural noise and a weak top-down imagery signal would produce the weakest imagery experience (top right).’

      We have edited our manuscript to reflect Pearson better in Lines 543-550:

      ‘In a prominent review, Pearson synthesizes evidence about the neural mechanism of imagery strength (Pearson, 2019). Indeed, activity metrics in the visual cortex predict imagery strength (Cui et al., 2007; Dijkstra et al., 2017). Interestingly, lower resting activity and excitability result in stronger imagery, and reducing cortical activity in the visual cortex via transcranial direct current stimulation (tDCS) increases visual imagery strength (Keogh et al., 2020). Thus, one potential mechanism of aphantasia-related AM deficits is that the heightened activity of the visual-perceptual cortices observed in our and previous work hinders aphantasics from detecting weaker imagery-related signals.’

      Line 575: Consider citing Blomkvist (2022), who has argued that aphantasia is an episodic memory condition.

      We added the suggested reference in Line 601.

      Line 585: Consider citing Bainbridge et al. (2021) https://doi.org/10.1016/j.cortex.2020.11.014

      We have added the suggested reference in Line 612.

      Line 581: It might be relevant here to also discuss non-visual details, which have indeed been investigated in your present study, e.g., the reduced emotional, temporal, and place details.

      We have edited our discussion to reflect the non-visual details better in Line 605:

      ‘In fact, previous studies and the current study show that aphantasics and individuals with hippocampal damage report fewer internal details across several memory detail subcategories, such as emotional details and temporal details (Rosenbaum et al., 2008; St-Laurent et al., 2009; Steinvorth et al., 2005), and these deficits can be observed regardless of the recency of the memory (Miller et al., 2020). These similarities suggest that aphantasics are not merely missing the visual-perceptual details of specific AM, but that they have a profound deficit associated with the retrieval of AM.’

      Place details are discussed on page 37 onwards.

      Line 605: I agree with this interesting suggestion for future research. It would also be relevant to reference Bainbridge et al. (2021) here, who tested spatial cognition in a drawing task and found that aphantasic participants correctly recalled spatial layouts of rooms but reported fewer objects than controls. It might also be worth pointing out that the present study does not actually test for accuracy in spatial cognition, so it could be the case that people with aphantasia feel confident that they can navigate well, but they might in fact not. Future studies relying on objective measures should test this possibility.

      We have added the suggested reference in Line 625.

      Lines 609-614: Is there any evidence that complex decision-making and complex empathy tasks depend on constructed scenes with visual-perceptual details? This hypothesis seems a bit far-fetched without any supporting evidence. In fact, it seems unlikely to be supported as we also know that people with aphantasia generally live normal lives, and often have careers that we can assume involve complex decision-making (see Zeman 2020 who report aphantasics who work as computer scientists, managers, etc). I would recommend that the authors provide evidence of the role of mental imagery in complex decision-making and complex empathy tasks, mediated by scene construction, to support this hypothesis as viable to test for future research. It is also unclear how this point connects to the argument made by Bergmann and Ortiz-Tudela (2023). In fact, Bergmann and Ortiz-Tudela seem to make the same argument as Pearson (2019) does - that aphantasia results from impairments in the ventral stream, but that the dorsal stream is unaffected. However, Blomkvist (2022) argues that this view is too simplistic to be able to account for the variety of deficits that we see in aphantasia. I would recommend either engaging more fully with this debate or cutting it, as it currently is too vague for a reader to follow.

      We have decided to leave the discussion about scene construction and its connection to complex decision making and empathy out of the current manuscript. We have included the argument of Bergmann & Ortiz-Tudela (2023) in the Introduction (Line 101):

      ‘In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Reviewer #2 (Recommendations For The Authors):

      In general, I really enjoyed reading this paper.

      Thank you very much for the positive evaluation of our manuscript as well as your comments.

      There were only a few things that I had some concerns about. For example, it was unclear to me whether the whole-brain analysis (Figures 3 and 4) was corrected for multiple comparisons or why only a small volume correction was applied for the functional connectivity analysis. If these results are borderline significant, this should be made more explicit in the manuscript. I don't think this is a major issue as the investigation of both the hippocampus and visual cortex was strongly hypothesis-driven, but it would still be good to be explicit about the strength of the findings.

      For the whole-brain analysis, we applied a voxel-wise threshold of p < .001 (uncorrected) with a minimum cluster extent of 10 voxels; no further correction for multiple comparisons was applied. The peak in the right hippocampus survived this whole-brain threshold, but we lowered the threshold for display purposes only in Figure 3 so that readers can easily see the cluster.

      We have made the statistical thresholds more easily assessable for the reader on the following pages:

      Figure 3 (Page 27): ‘Images are thresholded at p < .001, cluster size 10, uncorrected, except (D) which is thresholded at p < .01, cluster size 10, for display purposes only (i.e., the peak voxel and adjacent 10 voxels also survived p < .001, uncorrected).’

      Figure 4 (Page 30): ‘Image is displayed at p < .05, small volume corrected, and a voxel cluster threshold of 10 adjacent voxels.’
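      The whole-brain thresholds above combine a voxel-wise cut-off (p < .001, uncorrected) with a minimum cluster extent of 10 voxels. As an illustration only, the Python sketch below applies that two-step rule to a 3D grid of voxel p-values. It is not the analysis pipeline used in the study: the function name `cluster_threshold`, the nested-list input, and the face-adjacent (6-neighbour) definition of a cluster are assumptions made for this sketch, and neuroimaging packages may define cluster connectivity differently.

```python
from collections import deque


def cluster_threshold(pvals, alpha=0.001, min_size=10):
    """Hypothetical helper: return a boolean mask of voxels with p < alpha
    that belong to a face-connected (6-neighbour) cluster of at least
    min_size voxels. pvals is a nested list indexed as pvals[x][y][z]."""
    nx, ny, nz = len(pvals), len(pvals[0]), len(pvals[0][0])
    # Step 1: voxel-wise threshold (no multiple-comparison correction).
    supra = [[[pvals[x][y][z] < alpha for z in range(nz)]
              for y in range(ny)] for x in range(nx)]
    keep = [[[False] * nz for _ in range(ny)] for _ in range(nx)]
    seen = [[[False] * nz for _ in range(ny)] for _ in range(nx)]
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if not supra[x][y][z] or seen[x][y][z]:
                    continue
                # Step 2: breadth-first flood fill of one suprathreshold cluster.
                cluster = []
                queue = deque([(x, y, z)])
                seen[x][y][z] = True
                while queue:
                    cx, cy, cz = queue.popleft()
                    cluster.append((cx, cy, cz))
                    for dx, dy, dz in steps:
                        ax, ay, az = cx + dx, cy + dy, cz + dz
                        if (0 <= ax < nx and 0 <= ay < ny and 0 <= az < nz
                                and supra[ax][ay][az] and not seen[ax][ay][az]):
                            seen[ax][ay][az] = True
                            queue.append((ax, ay, az))
                # Step 3: keep the cluster only if it meets the extent threshold.
                if len(cluster) >= min_size:
                    for cx, cy, cz in cluster:
                        keep[cx][cy][cz] = True
    return keep
```

      Because the extent step only discards small, isolated suprathreshold clusters, it is not by itself a formal correction for multiple comparisons, which is consistent with the "uncorrected" label in the Figure 3 legend.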

      I was wondering whether it would be possible to use DCM to investigate the directionality of the connectivity. Given that there are only two ROIs and two alternative hypotheses (top-down versus bottom-up) this seems like an ideal DCM problem.

      We thank the reviewer for this suggestion and will consider testing the effective connectivity between both regions of interest in a future investigation. 

      Line 385: typo: 'great' should be 'greater'.

      We have corrected the typo from ‘great’ to ‘greater’ in Line 397.

      Line 400: absence of evidence of an effect is not evidence of absence of an effect.

      We agree with the reviewer that this was unclear. We changed the wording in Line 412:

      ‘In addition, aphantasics and controls did not differ significantly in their time searching for a memory in AM trials, t(19) = 1.03, p = .315.’

      Typo line 623: 'overseas'.

      We have corrected the mistyped word from ‘overseas’ to ‘oversees’ in Line 647.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the Authors:

      Reviewer 1:

      (1) Figure legends are too sparse, and often fail to describe with enough detail and accuracy the experiments presented. Especially in a work like this one, which uses plenty of different approaches and techniques and has a concise main text, description in the figure legends can really help the reader to understand the technical aspects of the experimental design. In my opinion, this will also help highlight the effort the authors put into exploring different and often new technical approaches. 

      We thank Reviewer 1 for highlighting this point and agree that the original figure legends lacked detailed information. In this revised version of our paper, we have edited all figure legends to provide more detail on the experiments and the information displayed (see Main text p12-16, Supplementary Information p2-5). We hope this change will improve the clarity and accuracy of the description of our experiments. 

      Reviewer 2:

      (1) Is there evidence that the early movement phenotype is actually linked to the larval movement phenotype? I noticed that the chordotonal driver experiment was only examined for larval movement. Is this driver not expressed earlier? Could the authors check the early phenotype using this driver? Are there early drivers that are expressed in chordotonal organ precursors (not panneuronal) and does the knockdown of CG3638 in these specific cells suppress the early phenotype?

      (2) More broadly, I would like to understand the function of the early embryonic movements. My concern is that they may only be a sign that the nervous system is firing up. If the rescue of the late miRNA mutant phenotype with chordotonal organ expression is only through a late change in the expression of CG3638, then the larval phenotype is probably not due to a developmental change, but a change in the immediate functioning of the neurons. Would this suggest that the early pulsing is not required for anything, at least at our level of understanding? If the driver is actually expressed early and late, then perhaps the authors could test later drivers to delimit the early and late functions of the miRNA? 

      The comments by Reviewer 2 in the points above are important: they ask about the biological role of early embryonic movements and whether these movements are linked to later larval activity or are irrelevant to the behaviour of the animal at later stages. 

      To address this important question, we conducted a new experiment in which we reduced neural activity specifically in the embryo (i.e. from 10 h AEL until the end of embryogenesis) and tested whether this treatment had any impact on larval movement. If – as put by Rev2 – the ‘early pulsing is not required for anything’ and the larval phenotype emerges from an acute change in neuronal physiology, then our experiment should show no effects at the larval stage. The results in Figure S4 (see Supplementary Information, p5) show that this is not the case: artificial reduction of neural activity during embryogenesis leads to a statistically significant reduction in larval speed, similar to that caused by the loss of miR-2b-1. This shows that modifications of embryonic activity impact larval movement. 

      Furthermore, earlier work on the biological role of embryonic activity identified an activity-dependent ‘critical period’ during late embryogenesis (Giachello and Baines, 2015; Ackerman et al., 2021): manipulations at or around this critical period result in both locomotor and seizure phenotypes in larvae. We cite these papers in the main text (p7).

      In addition, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity specifically during the embryonic period prevents the generation of normal neural activity patterns in both the embryo and the larva. Similar results are observed when proprioceptive sensory inputs to the central nervous system are blocked, with larval locomotion also disrupted. 

      Altogether, the data already in the literature, together with our new experiment, show that early embryonic movements play a key role in the development of the nervous system and larval locomotion.

      (3) Given the role in the larval chordotonal organs, have the authors also checked the adult movements? 

      The question of whether miR-2b-1 action in chordotonal organs affects behaviour at later stages of the Drosophila life cycle is interesting and was the reason why we assessed different genetic manipulations at the larval stage. However, we believe that assessing adult locomotor phenotypes is beyond the scope of this paper. 

      (4) The authors state that mir-2b-1 is a mirtron. I do not believe this is correct. It is not present in an intron in Btk from what I can see. Also, in the reference that the authors use when stating that mir-2b is a mirtron, I believe mir-2b-1 is actually used as a non-mirtron control miRNA. As mirtrons are processed slightly differently from regular hairpins and often use only the 3' end of the hairpin for miRNA creation, this may not be a trivial distinction. 

      We are grateful to Rev2 for highlighting this point: indeed, as they say, miR-2b-1 is located in the 3’UTR of the host gene Btk, rather than in an intron. Accordingly, in this revision we have removed the statement that miR-2b-1 is a mirtron (p6) and deleted the corresponding citation. 

      (5) For miRNA detection, the authors use in situ hybridization and QPCR. Both methods show that the gene is expressed but not that the mature miRNA is made. If the authors wanted a truly independent test for the presence of the miRNA, a miRNA sensor might be a better choice and it would hint at which part of the hairpin makes the functional miRNA. This is probably not necessary but could be a nice addition. 

      We thank Rev2 for drawing attention to this point and allowing this clarification. The qPCR protocol we used is based on the method developed by Balcells et al., 2011 (303 citations) (see Materials and Methods section in Supplementary Information, p14), which allows the specific amplification of mature miRNA transcripts, and not their precursors. This method for mature miRNA PCR is so robust that it has even been patented (WO2010085966A2). To ensure that the reader is clear about our methods, we state in the main text (p6) that we perform "RT-PCR for the mature miRNA transcript".  [NB: miRNA sensors provide a useful method to assess miRNA expression but can also act as competitive inhibitors of physiological miRNA functions, titrating miRNA molecules away from their real targets in tissue; therefore, results using this method are often difficult to interpret.]

      (6) Curious about mir-2b-1 and any overlap with the related mir2b-2 and the mir2a genes. I am just wondering about the similarity in their sequences/targets and if they might have similar phenotypes or enhance the phenotypes being scored by the authors. 

      This is an interesting point raised by Rev2, and indeed miR-2b-1 belongs to the largest family of microRNAs in Drosophila, the miR-2 family, discussed in detail by Marco et al., 2012. However, we consider that testing additional miRNA mutations, both individually and in combination with miR-2b-1, is beyond the scope of this paper.

      (7) Related to this, the authors show that the reduction of a single miRNA target suppresses the miRNA loss of function phenotype. This indicates that this target is quite important for this miRNA. I wonder if the target site is conserved in the human gene that the authors highlight.

      This is another interesting comment by Rev2. To pursue their idea, we performed a BLAST search for the miR-2b-1 target site in the human orthologs of CG3638 and did not find a match, suggesting that the relationship between miR-2b-1 and CG3638 is not evolutionarily conserved between insects and mammals. 

      Public Reviews:

      Reviewer #1:

      Weaknesses: 

      The authors do not describe properly how the miRNA screening was performed and just claim that only miR-2b-1 mutants presented a defective motion phenotype in early L1. How many miRNAs were tested, and how candidates were selected is never explicitly mentioned in the text or the Methods section.

      We identified miR-2b-1 as part of a genetic screen aimed at detecting miRNAs with an impact on embryonic movement, but this screen is not yet complete. The clear phenotype of miR-2b-1 in the embryo prompted us to study this miRNA in detail, which is what we report in this paper. 

      The initial screening to identify miRNAs involved in motion behaviors is performed in early larval movement. The logic presented by the authors is clear - it is assumed that early larval movement cannot proceed normally in the absence of previous embryonic motion - and ultimately helped them identify a miRNA required for modulation of embryonic movement. However, it is possible that certain miRNAs play a role in the modulation of embryonic movement while being dispensable for early L1 behaviors. Such regulators might have been missed with the current screening setup. Although similar changes to those described for the neurogenic phase of embryonic movement are described for the myogenic phase in miR-2b-1 mutants (reduction in motion amplitude), this phenotype goes unexplored. This is not a big issue, as the authors convincingly demonstrate later that miR-2b-1 is specifically required in the nervous system for proper embryonic and larval movement, and the effects of miR-2b-1 on myogenic movement might as well be the focus of future work. However, it will be interesting to discuss here the implications of a reduced myogenic movement phase, especially as miR-2b-1 is specifically involved in regulating the activity of the chordotonal system - which precisely detects early myogenic movements. 

      We thank Rev1 for their interest in the finding that loss of miR-2b-1 results in a decrease in movement during the myogenic phase, in addition to the neurogenic phase. Indeed, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity during a period that overlaps with the myogenic phase prevents the formation of normal neural activity patterns and larval locomotion. They observe the same when inhibiting proprioceptive sensory inputs to the central nervous system. This could suggest that the effects of miR-2b-1 on the myogenic phase might have ‘knock-on’ effects upon the later neurogenic phase and larval movement. However, we note that genetic restoration of miR-2b-1 expression specifically in neurons completely rescues the larval speed phenotype (Fig. 3G), suggesting that the dominant effect of miR-2b-1 upon movement is through its action within neurons. To acknowledge Rev1’s comment, we have added a short sentence to the text (p7) suggesting that ‘the effects of miR-2b-1 observed at earlier stages (myogenic phase) are possibly offset by normal neural expression of miR-2b-1’.  

      FACS-sorting of neuronal cells followed by RT-PCR convincingly detects the presence of miR-2b-1 in the embryonic CNS. However, control of non-neuronal cells would be required to explore whether miR-2b-1 is not only present but enriched in the nervous system compared to other tissues. This is also the case in the miR-2b-1 and Janus expression analysis in the chordotonal organs: a control sample from the motor neurons would help discriminate whether miR-2b-1/Janus regulatory axis is specifically enriched in chordotonal organs or whether both genes are expressed throughout the CNS but operate under a different regulation or requirements for the movement phenotypes.

      The RNA in situ hybridisation data included in the paper (Fig. 3B) show that RNA probes for miR-2b-1 precursors reveal a very strong signal in neural tissue, with very low signal detected in other tissues, strongly indicating that expression of miR-2b-1 is highly enriched in the nervous system.

      Reviewer #2:

      Weaknesses: 

      As I mentioned above, I felt the presentation was a bit overstated. The authors present their data in a way that focuses on movement, the emergence of movement, and how their miRNA of interest is at the center of this topic. I only point to the title and name that they wish to give the target of their miRNA to emphasize this point. "Janus" the GOD of movement and change. The results and discussion section starts with a paragraph saying, "Movement is the main output of the nervous system... how developing embryos manage to organise the necessary molecular, cellular, and physiological processes to initiate patterned movement is still unknown. Although it is clear that the genetic system plays a role, how genes control the formation, maturation and function of the cellular networks underlying the emergence of motor control remains poorly understood." While there is nothing inherently untrue about these statements, it is a question of levels of understanding. One can always argue that something in biology is still unknown at a certain level. However, one could also argue that much is known about the molecular nature of movement. Next, I am not sure how much this work impacts the area of study regarding the emergence of movement. The authors show that a reduction of a miRNA can affect something about certain neurons, that affects movement. The early movements, although slightly diminished, still emerge. Thus, their work only suggests that the function of some neurons, or perhaps the development of these neurons may impact the early movements. This is not new as it was known already from early work from the Bate lab.  Later larval movements were also shown to be modified in the miRNA mutants and were traced to "janus" overexpression in the chordotonal organs. As neurons are quite sensitive to the levels of Cl- and Janus is thought to be a Cl- channel, this could lead to a slight dysfunction of the chordotonal neurons. 
So, based on this, the work suggests that dysfunction of the chordotonal organs could impact larval movement. This was, of course, already known. The novelty of this work is in the genes being studied (important or not). We now know that miR 2b-1 and Janus are expressed in the early neurons and larval chordotonal neurons and their removal is consistent with a role for these genes in the functioning of these neurons. This is not to trivialize these findings, simply to state that these results are not significantly changing our overall understanding of movement and the emergence of movement. I would call it a stretch to say that this miRNA CONTROLS the emergence of movement, as in the title. 

      As already mentioned in our provisional response, on this point we politely – but strongly – disagree with Rev2’s suggestion that the findings are inflated by our language. We also note that they criticise our use of the verb ‘control’, yet this is a standard textbook term in molecular biology to describe biological processes regulated by genetic factors: given that miR-2b-1 regulates movement patterns during embryogenesis, to say that miR-2b-1 ‘controls’ embryonic movement in the Drosophila embryo is reasonable and in line with the language used in the field. 

      Finally, the name Janus should be changed as it is already being used. A quick scan of flybase shows that there is a Janus A and B in flies (phosphatases) and I am surprised the authors did not check this. I was initially worried about the Janus kinase (JAK) when I performed the search. While I understand that none are only called Janus, studies of the jan A and B genes refer to the locus as the janus region, which could lead to confusion. The completely different molecular functions of the genes relative to CG3638 add to the confusion. Thus, I ask that the authors change the name of CG3638 to something else.

      Thank you for spotting this omission. In the revised MS we propose a new name – Movement Modulator (Motor) – for the gene previously described as Janus (CG3638) to avoid annotation issues at FlyBase due to other, unrelated genes that include this word as part of their names. All instances where Janus was used are now replaced by Motor (abstract; main text pages 9-10; Figure 4).

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+-dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines confidence in the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

      The main concern, also mentioned by other reviewers, is the lack of mutational data and functional studies on the identified binding sites. Two other structures of TRAP transporters have been determined, one from Haemophilus influenzae (Hi) and the other from Photobacterium profundum (Pp). We refer to the present study as [1], Peter et al. as [2], and Davies et al. as [3]. The table below lists all the mutations made in the Neu5Ac binding site, including direct polar interactions between Neu5Ac and the side chains, as well as the newly identified metal sites.

      The Fusobacterium nucleatum (Fn) SiaQM whose structure we report shares significant sequence identity with the protein of the previously reported Hi structure. When we superimpose the Pp and Fn structures, we observe that nearly all the residues that bind Neu5Ac and the third metal site are conserved. This suggests that mutagenesis and functional studies from other research can be related to the structure presented in our work.

      The table below shows that all three residues that directly interact with Neu5Ac have been tested by site-directed mutagenesis for their role in Neu5Ac transport. Both D521 and S300 are critical for transport, while S345 is not. We do not believe that generating a D521A mutant in Fn, followed by transport studies, would provide any new information.

      However, Peter et al. have mutated only one of the 5 residues near the newly identified metal binding site, which resulted in no transport. The rest of the residues have not been functionally tested. We propose to mutate these residues into Ala, express and purify the proteins, and then carry out transport assays on those that show expression. We will include this information in the revised manuscript.

      Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

      Please see the comments above.

      Reviewer #3 (Public Review):

      The manuscript by Goyal et al. reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from a previously uncharacterized homolog, F. nucleatum. This is one of the most mechanistically fascinating transporter families, owing to its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not captured the substrate Neu5Ac bound. In addition, they confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

      We appreciate your feedback and will modify the manuscript to incorporate most of the suggestions. We will submit the revised version once the experiments are completed. We are also working on improving the quality of the figures and have made several attempts to improve the resolution using cryoSPARC and RELION, but without success. We will continue to explore newer methods in an effort to achieve higher resolution and to model more lipids, particularly in the binding pocket.

    1. Author response:

      Reviewer #1 (Public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      Response: In the revision, we plan to rewrite the introduction of the manuscript.

      (2) For the sequencing, which kit was used on the NovaSeq 6000?

      Response: For sequencing on the NovaSeq 6000, we used the Chromium Controller and Chromium Single Cell 3′ Reagent Kits (v3 chemistry, CG000183). We apologize for omitting this important information and will add it to the Methods.

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      Response: We apologize for the inadequate description of the data analysis process, which was due to the word count limit. We plan to provide more detail and, if possible, to include our analysis scripts as supplementary data in the revised manuscript.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      Response: We will add the list of marker genes used for cell type annotation to the revised manuscript.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      Response: To address this shortcoming, we plan to use statistical approaches in further analyses to compare cell proportions between each set of groups in the revision.

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      Response: We apologize for the imprecise presentation of histograms such as Fig. 2D and will add Y-axis labels. The bar widths were those generated by default by the package we used; we did not intentionally double the width to increase the visual importance of these bars. We apologize for this and will correct similar mistakes throughout the manuscript.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Response: We agree that many conclusions based on bioinformatic predictions are stated too assertively. In the revision, we will rewrite these conclusions more precisely.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      Response: We thank the reviewer for this suggestion. Each epithelial cluster is distinguished from the other clusters and identified by its DEGs, but we do not consider the clusters to be strongly disconnected from one another. In the revision, we plan to add further validation of the existence of Epi_10_CYSTM1.

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Response: In the TCGA survival analysis for Epi_10, we observed a clear trend toward a difference between groups, with a relatively small P value. We therefore presented these data in the hope of strengthening the case for the clinical significance of this cluster. However, because the P value is not significant, this claim is indeed controversial. We plan to rephrase the conclusion more cautiously or remove these data in the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      Response: We thank the reviewer for this question. The conclusion “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” was indeed stated too strongly given the sample distribution, and we will correct the description in the revised manuscript. Regarding SLC26A3, we do not consider it to be “broadly” expressed; rather, its expression is specific to later-stage tumors. In presenting the IHC data, we showed only the strongly positive area of each slide to emphasize the differences, which caused this misunderstanding. In the revision, we would therefore like to show other areas of a case, or even a whole-slide scan, as supplementary data.

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      Response: We apologize for not further validating the cytotoxic T cell cluster. In Fig. 4B and 4C, the four T cell clusters were identified mainly on the basis of canonical T cell markers. We then focused on validating and analyzing Tregs and neglected the other clusters. In Fig. 4D, we intended only to show the top DEGs in each T cell cluster in the hope of finding potential marker genes for subsequent analysis. However, we did not notice that the cytotoxic T cell cluster might be contaminated with epithelial cells during clustering. We will improve the analysis of this part in our revision.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis, which, while not contradictory to the data, are by no means the only explanation for it and are not directly supported.

      Response: Our initial purpose was to use GO analysis to support our conclusions. However, we recognize that these results are suggestions rather than evidence, which reflects the same writing problem raised in comment (7). Therefore, in the revised manuscript, we plan to rephrase the conclusions drawn from the GO analysis more rigorously or remove these data.

      Reviewer #2 (Public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      Response: We understand that many of the conclusions are stated too assertively and lack strong supporting evidence, and we will revise the wording in the revised manuscript. More importantly, to strengthen the validity of our data, we will use statistical approaches in further analyses.

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      Response: We thank the reviewer for questioning the validity, appropriateness and real potential of SLC26A3. We plan to explain the importance of SLC26A3 further in the Discussion. We also apologize for some overly assertive conclusions about ADC-specific cell clusters and the marker gene SLC26A3. However, we do not consider these conclusions problematic. Because of heterogeneity among individuals, and even among sampling sites within one individual, we believe that a “small fraction” of cells can still be meaningful. In addition, these ADC-specific clusters (including Epi_10_CYSTM1) account for appreciable proportions compared with the “big fraction” groups (Fig. 2D). Furthermore, because these DEGs are specific to ADC and not to SCC, we reasoned that genes from these ADC-specific clusters may play a central role in distinguishing ADC from SCC, and we used validation experiments to support this hypothesis. Lastly and most importantly, SLC26A3 was identified in sample 7, whose clinical stage is FIGO IIIC (late stage) and whose pathological type is ADC. Among the 15 cases, only 4 are late stage, 3 of which are ADC. From this point of view, we think that expression of SLC26A3 (or the presence of cluster Epi_10_CYSTM1) in 1 of these 3 cases (33%) makes it a potential candidate. Samples from early-stage and SCC patients do not contain Epi_10_CYSTM1 cells, which likewise indicates the specificity of this cluster to ADC. Therefore, we plan to add a more in-depth discussion of this question to the revised manuscript.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell types (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Response: Do you mean Figure 1B and D? In the revised manuscript, we will list the canonical marker genes used to identify each cell type, to show that our clustering of cell types agrees with most published studies. To further avoid contamination within each cluster, we will apply additional quality control and re-analyze these data in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Comment 1: This manuscript from Clayton and co-authors, entitled “Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors”, aims to clarify the molecular mechanism of BRAF dimer selectivity. Indeed, first-generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microsecond MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped in identifying a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors can stabilize the αC-in state. Notably, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: One potential weakness in the manuscript is the lack of reported uncertainties related to the analyzed quantities. Providing this information would significantly enhance the clarity regarding the reliability of the analyses and the confidence in the claims presented.

      Response and revision: We agree with the reviewer that reporting uncertainties will clarify and strengthen our arguments. Following this suggestion, we have added error bars to Figures 3 and 5 representing the standard deviation of the K-E salt bridge probability across replicas. These error bars show how much the frequency of the salt bridge varies between replicas. They therefore better support our claim that this salt bridge is promoted by the presence of PHI1, as the variation is minimal for protomers containing PHI1. In addition to these error bars, we have added a table to the Supplementary Information (Supplementary Table 2) containing the mean and standard deviation of the αC position, K-E distance, and DFG pseudo dihedral for each protomer in our dimer simulations.
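      For readers who want to reproduce this kind of per-replica uncertainty, here is a minimal NumPy sketch: the salt-bridge probability is taken as the fraction of frames in each replica whose Lys483–Glu501 distance falls below a cutoff, and the mean and standard deviation are then computed across replicas. The 4.0 Å cutoff, the function name `salt_bridge_stats`, and the use of the sample standard deviation (`ddof=1`) are assumptions for illustration, not the authors' analysis code.

```python
import numpy as np

# Hypothetical cutoff (angstroms) for counting a frame as "salt bridge formed";
# the value used in the study is not stated in this response.
SALT_BRIDGE_CUTOFF = 4.0


def salt_bridge_stats(distances_per_replica, cutoff=SALT_BRIDGE_CUTOFF):
    """Return (mean, sample SD, per-replica probabilities) of the K-E salt bridge.

    Each element of `distances_per_replica` is a 1D sequence of Lys483-Glu501
    distances (one value per frame) from a single replica.
    """
    # Fraction of frames in each replica with the distance below the cutoff
    probs = np.array([np.mean(np.asarray(d, dtype=float) < cutoff)
                      for d in distances_per_replica])
    # ddof=1 gives the sample standard deviation across replicas (an assumption)
    return probs.mean(), probs.std(ddof=1), probs


# Toy input: three very short, made-up replicas
mean_p, sd_p, per_replica = salt_bridge_stats([[3.0, 3.0, 5.0, 5.0],
                                               [3.0, 3.0, 3.0, 5.0],
                                               [5.0, 5.0, 5.0, 5.0]])
print(per_replica, round(mean_p, 3), round(sd_p, 3))
```

      The error bar for a system is then the standard deviation of these per-replica probabilities; whether the population or sample estimator is used matters little for the comparison but should be stated.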

      Reviewer #2 (Public review):

      Comment 1: The authors employ molecular dynamics simulations to understand the selectivity of FDA-approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify the two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings are supported by analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signaling and the development of future BRAF inhibitor drugs.

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (all-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: The use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems. However, the authors could consider the adoption of free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

      Response: The current free energy methods are capable of giving accurate estimates of the relative binding free energies of similar ligands; however, accurate calculations of the absolute free energies of hydrogen bond and hydrophobic interactions are not feasible yet. Thus, we decided not to pursue the calculations.

      Reviewer #1 (Suggestions to author)

      Comment 1: The general recommendation is to give more details about the procedure for the analyses performed and, when possible, show the uncertainties relative to the analyzed quantities. This would clearly indicate the reliability of the analyses and the confidence of the claims. Moreover, it is not always clear how the analyses were performed.

      Response and revision: As previously mentioned, we have added uncertainties to our bar graphs in Figures 3 and 5 as well as Supplementary Table 2. Regarding the clarity of our analysis, we added more detail on how the probability distributions were created, as discussed in our response to Comment 3.

      Comment 2: It is not clear why the authors decided to titrate only the histidines without considering the other charged residues. In particular, the authors show in Supplementary Figure 2 a network of which Asp595 (protomer A) is a part and that, given the direct interaction, could affect the protonation state of His477 (protomer B).

      Response: The reviewer is correct in that Asp595 directly interacts with His477 on the opposite protomer. This is exactly the reason why we did not consider titrating Asp595 – the interaction with His477 should further stabilize the charged state of Asp595 and downshift its pKa from the solution value of about 3.8. Thus, Asp595 will be charged at physiological pH and does not need to be titrated in the CpHMD simulations.

      Comment 3: Regarding the probability density plots (Figures 3 and 5), clarify if you used all the data from all the replicas and all the protomers. If possible, show a comparison between each replica in the Supplementary Figures. A Supplementary Table with the probability values for the measured K-E salt bridge could be helpful since the bar plots are hard to compare. Also in this case please report the uncertainty or a comparison between the replicas.

      Response and revision: To clarify how we created the probability density plots, the following line was added to the Methods section:

      On page 15, third paragraph: All probability distributions were created by combining the last three µs of each replica for each system, with each distribution consisting of 50 bins. Unless specified, distributions contain quantities from both protomers in dimeric simulations.
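      As an illustration of the pooling described in this Methods sentence, here is a minimal NumPy sketch. The function name `pooled_density`, the input layout (per-frame values plus matching simulation times in µs for each replica or protomer), and the use of `numpy.histogram` with `density=True` are assumptions for illustration, not the authors' analysis code.

```python
import numpy as np


def pooled_density(replicas, times_us, window_us=3.0, bins=50):
    """Pool the final `window_us` microseconds of every replica into one
    normalized histogram with `bins` bins.

    `replicas` and `times_us` are parallel lists: per-frame values of one
    quantity (e.g. the alphaC position) and the matching times in µs.
    For a dimer, pass each protomer's series as its own entry.
    """
    kept = []
    for values, t in zip(replicas, times_us):
        values = np.asarray(values, dtype=float)
        t = np.asarray(t, dtype=float)
        # Keep only frames from the last `window_us` of this replica
        kept.append(values[t >= t[-1] - window_us])
    pooled = np.concatenate(kept)
    # density=True normalizes so that sum(density * bin_width) == 1
    density, edges = np.histogram(pooled, bins=bins, density=True)
    return density, edges


t = np.linspace(0.0, 5.0, 501)   # toy time axis: 5 µs per replica
rep_a = 20.0 + np.sin(t)         # toy per-frame values, replica/protomer A
rep_b = 21.0 + np.cos(t)         # toy per-frame values, replica/protomer B
density, edges = pooled_density([rep_a, rep_b], [t, t])
print(len(density), len(edges))  # prints: 50 51
```

      Pooling the frames before binning means each replica contributes in proportion to its number of retained frames; with equal-length windows and shared bin edges, this is equivalent to averaging the per-replica densities.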

      As previously mentioned, we have included Supplementary Table 2, which contains the mean and standard deviation of the K-E distance across systems. For comparison between replicas, we considered the time series of the K-E distance in the inhibitor-bound monomer and dimer systems in Supplementary Figure 7 to be sufficient.

      Comment 4: It would be better to define the claim: “it is clear that the timescale of the DFG-out to DFG-in transition is longer than our simulation timeframe of a few microseconds” (lines 208–209). To me it is not obvious why this should be “clear”.

      Response and revision: Our original statement was to convey that, as DFG-in is sampled very rarely, our simulations cannot accurately represent DFG transitions. We have revised the manuscript to the following:

      On page 6, fourth paragraph: While this does suggest dimerization loosens the DFG motif, our simulations do not appropriately model the DFG-out/-in transition as the DFG-in state is only occasionally sampled.

      Comment 5: In the case of the inhibited monomer simulations, the authors state: “the PHI1–Glu501 interaction can become completely disrupted, with the distance moving beyond 6 Å to as high as 12 Å; correlated with the disruption of the PHI1–Glu501 interaction, the αC position is shifted out to the range of 21 Å–24 Å” (lines 241–244). However, the plot of the PHI1–Glu501 interaction time series (Supplementary Figure 7) shows that the interaction is disrupted in just one replica of one protomer (Protomer A), and the αC position never exceeds 21 Å (time series reported in Supplementary Figure 6). None of the fluctuations of the αC position appear to be correlated with the disruption of the ligand–Glu501 interaction. The time series reported in Supplementary Figures 6 and 7 suggest that the two events are uncorrelated. Please explain this aspect or quantify the correlation to support your claim.

      Response: We believe the source of this confusion is that we did not include a time series of αC for the inhibited monomer simulations; Supplementary Figure 6, mentioned in the comment, is of dimeric BRAF. We have therefore added Supplementary Figure 8, a time-series plot of the αC position for inhibited monomer and dimer protomers.

      Comment 6: Regarding the analyses of the positive cooperativity, the DFG dihedral probability densities for the apo protomer (Figure 5a) are highly overlapping. Thus, it is hard to believe that these small differences support the claim that “PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor” (lines 300–302). The authors should show that the differences in the DFG distributions (in particular, apo dimer vs PHI1 mixed) are statistically significant. Only in this case, the data could support the claim that PHI1 bound to one protomer modulates the DFG conformation in the second one. In my opinion, the overlap between the DFG dihedral probability (Figure 5a) is too high to support the claim that PHI1 is able to allosterically modulate this region in the second apo protomer. Please provide an appropriate statistical test that demonstrates that those distributions are significantly different.

      Response and revision: We have adjusted this statement based on the new Supplementary Table 2 to read as the following:

      On page 9, third paragraph: Although the shift is small (the difference between means is approximately one standard deviation, see Supplementary Table 2), it suggests that PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor. In contrast, the DFG dihedral of the apo protomer in the LY-bound mixed dimer appears to be slightly smaller than that of the apo dimer, with a difference between means of approximately one standard deviation (Supplementary Table 2), which is unfavorable for binding the second inhibitor (orange and grey, Figure 5a right).

      Comment 7: Regarding the dimer holo simulations, I agree that in the LY-bound dimer simulations, the hydrogen bond between the ligand and the E501 is weaker, but I do not understand the sentence “as seen from the local density maximum centered at ∼3.4 Å” at line 233, since the 2D density plot (Figure 3h) shows that the highest peak is close to 5 Å. Also, it would be useful to clarify how these 2D density plots reported in Figure 3 were obtained.

      Response and revision: While the highest peak in Figure 3h is close to 5 Å, we were more interested in the local peak close to 3.4 Å. To avoid confusion, we have modified the line to distinguish the two peaks:

      On page 7, second paragraph: In the LY-bound dimer simulations, however, the LY–Glu501 h-bond is weaker and less stable than the counterpart of the PHI1-bound dimer, as seen from the local density maximum centered at ∼3.4 Å and the global maximum near ∼4.5 Å (Figure 3g,h).

      Comment 8: I have a comment on the strategy suggested to empirically classify the inhibitors by comparing the Glu501-Lys483 distance and the αC position in the two protomers of the crystal structures (in the Concluding Discussion section). The authors suggest that differences below 1 Å could determine whether the flexibility of these regions is restricted or not (and whether the inhibitor is equipotent or dimer-selective). However, differences below 1 Å, in structures where the average resolution is 2.5 Å, might be highly unreliable. In fact, as the authors pointed out, LY and Ponatinib would be classified (erroneously) as dimer-selective inhibitors according to these criteria.

      Response and revision: We agree that this proposed method could be unreliable; we intend this strategy to be used as a “quick and dirty” method for analyzing future structures in order to assess selectivity for dimeric BRAF. To convey this, we added the following sentence:

      On page 12, second paragraph: Given that the resolution of a resolved structure is often ∼2–3 Å, this proposed assessment is not intended to replace more rigorous tests, e.g., MD simulations.

      Comment 9: I suggest including representative snapshots of the MD simulations in the GitHub repository, which would allow the reader to better appreciate the results described in the present study.

      Response and revision: In order to convey the difference between induced effects of PHI1 and LY, we have added a new folder named snapshots to the GitHub repository which contains the snapshots from the simulations of one LY or one PHI1 bound BRAF (visualized in Figure 5c) in the form of PDB files.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work presents an in-depth characterization of the factors that influence the structural dynamics of the Clostridium botulinum guanidine-IV riboswitch (riboG). Using single-molecule FRET, the authors demonstrate that riboG undergoes ligand- and Mg2+-dependent conformational changes consistent with the dynamic formation of a kissing loop (KL) in the aptamer domain. Formation of the KL is attenuated by Mg2+ and Gua+ ligand at physiological concentrations, as well as by the length of the RNA. Interestingly, the KL is most stable in the context of just the aptamer domain compared to longer RNAs capable of forming the terminator stem. To attenuate transcription, binding of Gua+ and formation of the KL must occur rapidly after transcription of the aptamer domain but before transcription of the rest of the terminator stem.

      Strengths:

      (1) Single-molecule FRET microscopy is well suited to unveil the conformational dynamics of KL formation and the authors provide a wealth of data to examine the effect of the ligand and ions on riboswitch dynamics. The addition of complementary transcriptional readthrough assays provides further support for the authors' proposed model of how the riboswitch dynamics contribute to function.

      (2) The single-molecule data strongly support that the effect of Gua+ ligand and Mg2+ influence the RNA structure differently for varying lengths of the RNA. The authors also demonstrate that this is specific for Mg2+ as Na+ and K+ ions have little effect.

      (3) The PLOR method utilized is clever and well adapted for both dual labeling of RNAs and examining RNA at various lengths to mimic co-transcriptional folding. Using PLOR, they demonstrate that a change in the structural dynamics and ligand binding can occur after the extension of the RNA transcript by a single nucleotide. Such a tight window of regulation has intriguing implications for kinetically controlled riboswitches.

      Weaknesses:

      (1) The authors use only one mutant to confirm that their FRET signal indicates the formation of the KL. Importantly, this mutation does not involve the nucleotides that are part of the KL interaction. It would be more convincing if the authors used mutations in both strands of the KL and performed compensatory mutations that restore base pairing. Experiments like this would solidify the structural interpretation of the work, particularly in the context of the full-length riboG RNA or in the cotranscriptional mimic experiments, which appear to have more conformational heterogeneity.

      We thank the reviewer for describing our work as an “in-depth characterization” of riboG. We agree with the reviewer and have added two more mutants, G71C and U72C, with the mutations located in the KL (Figure 2– figure supplement 8A, 8B, 9A, 9B, Figure 3– figure supplement 6A, 6B, 7A, 7B, and Figure 4– figure supplement 6A, 6B, 7A, 7B). Furthermore, we have tested the compensatory mutations C30G-G71C and A29G-U72C, which are designed to restore base pairing in the KL (Figure 2– figure supplement 8C, 8D, 9C, 9D, Figure 3– figure supplement 6C, 6D, 7C, 7D, and Figure 4– figure supplement 6C, 6D, 7C, 7D). We have added the experimental results to the revised manuscript as follows: “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which carries a single-nucleotide mutation at site 71 (97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G-G71C double mutant, which was initially expected to restore a base pair in the KL, did not bring back the anticipated peak with EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion of the state with EFRET ~ 0.8 than riboG-apt. However, the A29G-U72C double mutant restored a base pair in the KL, as well as the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, which carries a single-nucleotide mutation at a highly conserved site, 77 (97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and with the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to those of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).

      (2) The existence of the pre-folded state (intermediate FRET ~0.5) is not well supported in their data and could be explained by an acquisition artifact. The dwell times are very short often only a single frame indicating that there could be a very fast transition (< 0.1s) from low to high FRET that averages to a FRET efficiency of 0.5. To firmly demonstrate that this intermediate FRET state is metastable and not an artifact, the authors need to perform measurements with a faster frame rate and demonstrate that the state is still present.

      We thank the reviewer for the great comment. We have added smFRET experiments at a higher time resolution (20 ms) as well as at a lower time resolution (Figure 2– figure supplement 3). The intermediate state (EFRET ~0.5) is present in the smFRET data collected at 20 ms, 100 ms and 200 ms.

      (3) The PLOR method employs a non-biologically relevant polymerase (T7 RNAP) to mimic transcription elongation and folding near the elongation complex. T7 RNAP has a shorter exit channel than bacterial RNAPs and therefore, folding in the exit channel may be different between different RNAPs. Additionally, the nascent RNA may interact with bacterial RNAP differently. For these reasons, it is not clear how well the dynamics observed in the T7 ECs recapitulate riboswitch folding dynamics in bacterial ECs where they would occur in nature. 

      We thank the reviewer for the comment. We agree with the reviewer that bacterial and T7 RNAPs may behave differently because they differ in transcriptional speed, dynamics, interactions, and so on. We have added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (pages 13–14).

      Reviewer #2 (Public Review):

      Summary:

      Gao et al. used single-molecule FRET and step-wise transcription methods to study the conformations of the recently reported guanidine-IV class of bacterial riboswitches that upregulate transcription in the presence of elevated guanidine. Using three riboswitch lengths, the authors analyzed the distributions and transitions between different conformers in response to different Mg2+ and guanidine concentrations. These data led to a three-state kinetic model for the structural switching of this novel class of riboswitches whose structures remain unavailable. Using the PLOR method that the authors previously invented, they further examined the conformations, ligand responses, and gene-regulatory outcomes at discrete transcript lengths along the path of vectorial transcription. These analyses uncover that the riboswitch exhibits differential sensitivity to ligand-induced conformational switching at different steps of transcription, and identify a short window where the regulatory outcome is most sensitive to ligand binding.

      Strengths:

      Dual internal labeling of long RNA transcripts remains technically very challenging but essential for smFRET analyses of RNA conformations. The authors should be commended for achieving very high quality and purity in their labelled RNA samples. The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality. The findings are significant because the paradigm uncovered here for this relatively simple riboswitch class is likely also employed in numerous other kinetically regulated riboswitches. The ability to quantitatively assess RNA conformations and ligand responses at multiple discrete points along the path towards the full transcript provides a rare and powerful glimpse into cotranscriptional RNA folding, ligand-binding, and conformational switching.

      Weaknesses:

      The use of T7 RNA polymerase instead of a near-cognate bacterial RNA polymerase in the termination/antitermination assays is a significant caveat. It is understandable as T7 RNA polymerase is much more robust than its bacterial counterparts, which probably will not survive the extensive washes required by the PLOR method. The major conclusions should still hold, as the RNA conformations are probed by smFRET at static, halted complexes instead of on the fly. However, potential effects of the cognate RNA polymerase cannot be discerned here, including transcriptional rates, pausing, and interactions between the nascent transcript and the RNA exit channel, if any. The authors should refrain from discussing potential effects from the DNA template or the T7 RNA polymerase, as these elements are not cognate with the riboswitch under study.

      We thank the reviewer for describing our work as follows: “The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality”. We agree with the reviewer that bacterial and T7 RNAPs may behave differently because they differ in transcriptional speed, dynamics, interactions, and so on. We have added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14).

      Reviewer #3 (Public Review):

      Summary:

      In this article, Gao et al. use single-molecule FRET (smFRET) and position-specific labelling of RNA (PLOR) to dissect the folding and ligand-sensing behavior of the guanidine-IV riboswitch in the presence and absence of the ligand guanidine and the cation Mg2+. The results provide valuable information on the mechanistic aspects of the riboswitch, including confirmation that the kissing loop present in the structure is essential for folding and riboswitch activity. Co-transcriptional investigations of the system provided key information on the ligand-sensing behavior and ligand-binding window of the riboswitch. A plausible folding model of the guanidine-IV riboswitch was proposed as a final result. The evidence presented here sheds additional light on the mode of action of transcriptional riboswitches.

      Strengths:

      The investigations were very thorough, providing data that support the conclusions. The use of smFRET and PLOR to investigate RNA folding has been shown to be a valuable tool for understanding the folding and behavior of these structured RNA molecules. The co-transcriptional analysis brought important information on how the riboswitch works, including the ligand sensing and the binding window that promotes the structural switch. Performing the investigations with the aptamer domain, the aptamer domain + terminator/anti-terminator region, and the full-length riboswitch was essential to show how each domain contributes to the final structural state in the presence of the ligand and Mg2+.

      Weaknesses:

      The system has its own flaws when compared to physiological conditions. The RNA polymerase used in the study (T7 RNA polymerase) differs from bacterial RNA polymerase not only in complexity but also in transcriptional speed, which can directly interfere with folding and ligand sensing. Additionally, rNTP concentrations during transcription were much lower than physiological concentrations, likely changing the transcriptional speed of the polymerase. These aspects, and how they could affect the results, should be addressed for a broad audience. Another point to be aware of is that the bulky fluorophores attached to the nucleotides can interfere with folding to some extent.

      We thank the reviewer for describing our work as follows: “The investigations were very thorough, providing data that supports the conclusions”. We agree with the reviewer that bacterial and T7 RNAPs may behave differently because they differ in transcriptional speed, dynamics, interactions, and so on. We have added a statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14). We also agree with the reviewer that the lower NTP concentrations may affect the transcriptional speed. Regarding the fluorophores, we deliberately placed them away from the KL to avoid influencing its formation.

      Reviewer #1 (Recommendations For The Authors):

      Related to weakness 1

      - The authors cite a paper that investigated mutations in the KL duplex but do not include these mutations in their analysis. It is unclear why the authors chose the G77C mutation and not the other mutants previously tested. Can the authors explain their choice of mutation in detail in the text? I also did not see the proposed secondary structure for the G77C mutant shown in Figure 2 -supp 3A in the cited paper, is this a predicted structure? Please explain how this structure was determined. 

      We thank the reviewer for the comment. We chose the G77C mutation because a previous report showed that G77C can disrupt the formation of the KL, as we state in the manuscript: “Furthermore, our investigation revealed that the G77C mutant, which carries a single-nucleotide mutation at a highly conserved site, 77 (97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and with the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7). The secondary structure of the G77C mutant was predicted by Mfold, which is cited in the manuscript and has been added to the reference list as “Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13), 3406-3415”.

      - It is not clear to me that the structural interpretation of their FRET states is correct and that the FRET signal reports on the base pairing of the KL in only the high FRET state. The authors should perform experiments with additional mutations in the KL duplex to confirm that their construct reports on KL duplex formation alone and not other structural dynamics. 

      We thank the reviewer for the comment. We have included additional mutations to establish a connection between the high-FRET state and the formation of the KL. The results have been added to the manuscript as follows: “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which carries a single-nucleotide mutation at site 71 (97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G-G71C double mutant, which was initially expected to restore a base pair in the KL, did not bring back the anticipated peak with EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion of the state with EFRET ~ 0.8 than riboG-apt. However, the A29G-U72C double mutant restored a base pair in the KL, as well as the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, which carries a single-nucleotide mutation at a highly conserved site, 77 (97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and with the secondary structure of the G77C mutant predicted by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to those of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).

      - For the full-length riboG-136 (Cy3Cy5 riboG in Figure 4), the authors have clearly defined peaks at 0.6 and 0.4. However, the authors do not explain their structural interpretation of these states. Do the authors believe that the KL is forming in these states? It would be helpful to have data on mutations in the KL in the context of the full-length riboG to better understand the structural transitions of these intermediate states. 

      Based on our mutation studies, we propose that the peak with EFRET ~0.8 corresponds to the conformation with the KL, whereas the states with EFRET ~0.4 and 0.6 correspond to conformations without a stable KL.

      Related to weakness 2:

      - For the riboG-apt and riboG-term RNAs, the proposed intermediate FRET state (EFRET = 0.5) is poorly fit by a Gaussian and the dwell times in the state are almost entirely single-frame dwells. It is likely that this state is the result of a camera blurring artifact, in which RNAs undergo a FRET transition between two frames giving an apparent FRET efficiency which is between that of the two transitioning states. This artifact arises when the average dwell times of the true states (Elow and Ehigh) are comparable to the frame duration (within a factor of ~5-10; see https://doi.org/10.1021/acs.jpcb.1c01036). To confirm the presence of the intermediate state, the authors should perform at least a few experiments with higher time resolution to support the existence of the 0.5 state with a lifetime of 0.1 s. Alternatively, the data should be refit to a two-state HMM and the authors could explain in the text that the density in the FRET histogram between the two states is likely due to transitions that are faster than the time resolution of the experiment. 

      We thank the reviewer for the great comment. Taking the suggestion into consideration, we performed smFRET experiments with a higher time resolution of 20 ms. As a result, we still detected the intermediate state, supporting that it is not an artifact. The new data has been included in the revised manuscript (Figure 2-figure supplement 3).  
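      The camera-blurring mechanism discussed in this exchange can be illustrated with a minimal simulation. This is a hypothetical sketch, not the analysis used in the study: it assumes a two-state system with illustrative FRET values of 0.2 and 0.8, equal switching rates of 10 per second in both directions (mean dwell 100 ms), equal total brightness in both states, and no photon noise, so that each frame reports the time-weighted mean FRET of the states visited during the exposure. The function names are illustrative only.

```python
import random


def simulate_apparent_fret(k_up, k_down, e_low, e_high, frame_s, n_frames, seed=0):
    """Simulate a two-state molecule imaged with a camera of exposure frame_s (s).

    Hypothetical model: the molecule starts in the low-FRET state, switches
    low->high with rate k_up and high->low with rate k_down (per second), and
    each frame reports the time-weighted mean FRET of the states visited during
    that exposure (equal brightness in both states, no photon noise).
    """
    rng = random.Random(seed)
    high = False
    t = 0.0
    t_switch = rng.expovariate(k_up)  # time of the first low->high transition
    frames = []
    for _ in range(n_frames):
        frame_end = t + frame_s
        weighted = 0.0
        # Accumulate every dwell segment that ends inside this exposure.
        while t_switch < frame_end:
            weighted += (t_switch - t) * (e_high if high else e_low)
            t = t_switch
            high = not high
            t_switch = t + rng.expovariate(k_down if high else k_up)
        # Remaining part of the exposure, up to the end of the frame.
        weighted += (frame_end - t) * (e_high if high else e_low)
        t = frame_end
        frames.append(weighted / frame_s)
    return frames


def intermediate_fraction(frames, lo=0.35, hi=0.65):
    """Fraction of frames whose apparent FRET falls between the two true states."""
    return sum(lo < e < hi for e in frames) / len(frames)


if __name__ == "__main__":
    for frame_s in (0.02, 0.1, 0.2):
        frames = simulate_apparent_fret(10.0, 10.0, 0.2, 0.8, frame_s, 20000)
        print(f"{frame_s * 1000:.0f} ms frames: "
              f"{intermediate_fraction(frames):.3f} of frames intermediate")
```

      Under these assumptions, frames with an apparent FRET between 0.35 and 0.65 arise only from transitions during an exposure, and their fraction shrinks as the frame time is shortened from 200 ms to 20 ms. A genuinely metastable intermediate would instead remain populated at the shorter frame time, which is the logic behind the 20 ms control.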

      Related to weakness 3:

      - The authors depict the polymerase footprint differently in some of the figures and it is unclear if this is part of their model. Is the cartoon RNAP supposed to indicate the RNA:DNA hybrid or the footprint of T7 RNAP on the RNA? For example, in Figure 8a there are 8 nts (left) and 9 nts (right) covered by RNAP, and only 6nts in Figure 6 - supp 2A. This is particularly misleading for the EC-87 and EC-88 in Figure 6 - supp 2, where it is likely that this stem is not formed at all and the KL strand is single-stranded. The authors should clarify and at least indicate in the figure legend if the RNAP cartoon is part of the model or only a representation. 

      We thank the reviewer for bringing these issues to our attention. Owing to space limitations, we chose to represent the polymerase footprint differently in Figure 8. However, we have added the statement “DNA templates from EC-87 to EC-105 are not displayed in the model” to the legend of Figure 8 to avoid confusion.

      Moreover, we have corrected the erroneous 6-nt footprint in Figure 6–figure supplement 2.

      - With a correct 9 bp RNA:DNA hybrid, the EC-88 construct would not be able to form the top part of the P2 stem and the second half of the KL RNA would be single-stranded. In this case, an interaction between the KL nucleotides would resemble a pseudoknot and not a kissing loop interaction. Can the authors explain whether this could account for the heterogeneity they observe in the EC-88 construct compared to the riboG-apt RNA?

      We thank the reviewer for the comment. We have added the following statement to the revised manuscript: “The T7 RNA polymerase (RNAP) sequestered about 8 nt of the nascent RNA, preventing the EC-88 construct from forming the P2 stem (Durniak et al., 2008; Huang & Sousa, 2000; Lubkowska et al., 2011; Tahirov et al., 2002; Wang et al., 2022; Yin & Steitz, 2002). Consequently, a pseudoknot structure potentially formed instead of the expected KL. This distinction may account for the observed heterogeneity between EC-88 and riboG-apt” (page 11).

      Other comments:

      (1) It appears that the FRET histograms in the PLOR experiments (Figure 6 and related figures) only show the fits presumably to highlight the overlays. However, this makes it impossible to determine the goodness of the fit. The authors should instead show the outline of the raw histogram with the fit, or at least show the raw histograms with fits in the supplement. 

      We have replaced Figure 6- figure supplements 2-4 to enhance the clarity of the raw and fitted smFRET histograms.  

      (2) The authors should consider including a concluding paragraph to put the results into a larger context. How does the kinetic window compare to other transcriptional riboswitches? Would the authors comment on how the transcription speed compares to the kinetics for the formation of the KL? 

      We thank the reviewer for the comment. We have added a comparison of riboG with other transcriptional riboswitches to the manuscript: “Nevertheless, the ligand-sensitive windows of riboswitches during transcription vary. Using NMR spectroscopy, Helmling et al. proposed a broad transcriptional window for deoxyguanosine-sensing riboswitches, whereby the ligand binding capability gradually diminishes over several nucleotide lengths (Helmling et al., 2017). However, more recent research by Binas et al. and Landgraf et al. on riboswitches sensing ZMP, c-di-GMP, and c-GAMP revealed a narrow window with a sharp transition in binding capability, even with transcript lengths differing by only one or three nucleotides (Binas et al., 2020; Landgraf et al., 2022). In line with the findings for the c-GAMP-sensing riboswitch, our study on the guanidine-IV riboswitch also demonstrated a sharp transition in binding capability with just a single nucleotide extension” (page 14).

      We appreciate the reviewer’s comment in comparing the transcription speed to the kinetics of the KL formation. However, we must acknowledge that we have limited kinetic data in this study to confidently make such a comparison.

      (3) Cy3Cy5 RiboG is a confusing name because it implies that the others are not also Cy3Cy5 labeled. The authors should consider changing the names and being consistent throughout. I suggest full-length riboG or riboG-136. 

      We have changed “Cy3Cy5 riboG” to “Cy3Cy5-full-length riboG” (pages 15 and 16).

      (4) The transcriptional readthrough experiment should be explained when first mentioned in line 109. 

      We have added the citation (Chien et al., 2023) of the transcriptional readthrough experiment to the manuscript as “we noted that the transcriptional read-through of the guanidine-IV riboswitch during the single-round PLOR reaction was sensitive to Gua+, exhibiting an apparent EC50 value of 68.7 ± 7.3 μM (Figure 1D) (Chien et al., 2023)” (page 5).

      (5) Kd values in text should have uncertainties, and the way these uncertainties are obtained should be explained.

      We have added the uncertainties of the Kd values in the revised manuscript (page 6) and in the legend of Figure 2-supplement 6 as “The percentages of the folded state (EFRET ~ 0.8) of Cy3Cy5-riboG-apt were plotted with the concentrations of Gua+ at 0.5 mM Mg2+, with an apparent Kd of 286.0 ± 18.1 μM in three independent experiments”.
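      One common way to obtain an apparent Kd with an uncertainty from independent titrations is to fit each titration to a one-site binding model and report the mean ± standard deviation of the fitted values. The sketch below assumes that model, f = f0 + Δf·L/(Kd + L), and a simple grid search over Kd; the response does not state the exact fitting procedure used, and the function names and the noise-free synthetic titrations are hypothetical placeholders, not the study's data.

```python
import statistics


def fit_one_site(conc_uM, frac, kd_grid):
    """Least-squares fit of frac = f0 + df * L / (Kd + L) over a grid of Kd values.

    For a fixed Kd the model is linear in f0 and df, so those two parameters
    are solved in closed form (ordinary linear regression on x = L / (Kd + L))
    and the Kd giving the smallest residual sum of squares is returned.
    """
    best = None
    for kd in kd_grid:
        x = [c / (kd + c) for c in conc_uM]
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(frac) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, frac))
        df = sxy / sxx  # requires at least two distinct concentrations
        f0 = mean_y - df * mean_x
        sse = sum((yi - f0 - df * xi) ** 2 for xi, yi in zip(x, frac))
        if best is None or sse < best[0]:
            best = (sse, kd, f0, df)
    _, kd, f0, df = best
    return kd, f0, df


def kd_mean_sd(titrations, kd_grid):
    """Fit each independent titration; return the mean and sample SD of Kd."""
    kds = [fit_one_site(conc, frac, kd_grid)[0] for conc, frac in titrations]
    return statistics.mean(kds), statistics.stdev(kds)


if __name__ == "__main__":
    conc = [0, 50, 100, 250, 500, 1000, 2000]  # hypothetical Gua+ series, uM

    def synthetic(kd, f0=0.3, df=0.4):
        # Noise-free placeholder curves, NOT measurements from the study.
        return [f0 + df * c / (kd + c) for c in conc]

    runs = [(conc, synthetic(kd)) for kd in (280, 300, 310)]
    mean_kd, sd_kd = kd_mean_sd(runs, kd_grid=range(1, 3001))
    print(f"apparent Kd = {mean_kd:.0f} ± {sd_kd:.0f} uM (n = {len(runs)})")
```

      With real, noisy titrations a finer grid or a nonlinear least-squares routine would normally replace the integer grid used here; the point of the sketch is only that the reported uncertainty reflects the spread of Kd across independent experiments.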

      (6) The authors mention "strategies" on line 306, but it is unclear what they are referring to. Are the strategies referring to the constructs (EC-87, etc) or Steps 1-8 in the supplemental figure? Please clarify. 

      We have clarified this point by adding “The detailed procedures of strategies 1-8 are shown in Figure 7–figure supplement 1” to the manuscript (page 12).

      (7) What are the fraction of dynamic traces versus static traces in the cases for the full-length riboG? This would help depict the structural heterogeneity in the population. 

      We have added the fractions of dynamic single-molecule traces of the full-length riboG to Figure 4-supplements 1-5. 

      (8) The labels in Figure 4 (A-E) don't match the caption (A-H). 

      We have corrected the error. 

      (9) The coloring of the RNA strands in Figure 4A should be explained in the figure legend. It could be interpreted as multiple strands annealed instead of a continuous strand. 

      We have revised the legend of Figure 4A by adding “The full-length riboG contains the aptamer domain (black), terminator (red) and the extended sequence (blue). Cy3 and Cy5 are shown by green and red sparkles, respectively”.

      (10) Reported quantities and uncertainties should have the same number of decimal places. In many places, the uncertainties likely have too many significant figures, for example, in Figure 5 and related figures. 

      We have corrected the significant figures of the uncertainties. 
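      A common convention behind this comment is to round the uncertainty to one or two significant figures and then round the reported value to the same decimal place. The helper below is a hypothetical sketch of that rule (two significant figures by default); it is not a tool described by the authors.

```python
import math


def format_with_uncertainty(value, sigma, sig_figs=2):
    """Return 'value ± sigma' with sigma rounded to sig_figs significant figures
    and value rounded to the same decimal place (round() is half-to-even)."""
    if sigma <= 0:
        raise ValueError("uncertainty must be positive")
    exponent = math.floor(math.log10(sigma))
    decimals = sig_figs - 1 - exponent
    sigma_r = round(sigma, decimals)
    # Rounding can carry into the next decade (e.g. 9.96 -> 10.0); if so,
    # drop one decimal place so sigma keeps the requested significant figures.
    if math.floor(math.log10(sigma_r)) > exponent:
        decimals -= 1
        sigma_r = round(sigma, decimals)
    value_r = round(value, decimals)
    places = max(decimals, 0)
    return f"{value_r:.{places}f} ± {sigma_r:.{places}f}"


if __name__ == "__main__":
    print(format_with_uncertainty(286.0, 18.1))  # apparent Kd in uM
    print(format_with_uncertainty(68.7, 7.3))    # apparent EC50 in uM
```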

      (11) In Figure 5, A and B should have the same vertical scale to facilitate comparison. 

      We have adjusted Figure 5A to match the vertical scale of Figure 5B in the revised manuscript.

      (12) In Figure 5C-D, the construct from which those trajectories come should be indicated in the legend. 

      We have added the construct to the legend of Figures 5C and D.  

      (13) In Figure 6J, the splines between data points are confusing and can be misleading. They suggest that the data has been fit to a model, but I am not sure if it represents a model. The data points should be colored instead and lines removed. 

      We thank the reviewer for the comment. We have changed Figure 6J by coloring the data points and removing the lines to avoid confusion. 

      (14) Line 330 mentions a P2 structure in Figure 8, but there is no such label in Figure. Please clarify. 

      We thank the reviewer for the comment and have added P2 to Figure 8. 

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1B. The authors don't seem to address the role of the blue stem-loop following Stems 1 and 2. Is this element needed at all for gene regulation? Does it impact the conformations or folding of the preceding Stems 1 and 2? It seems feasible to disrupt the stem and see whether there is an impact on riboswitch function. 

      We thank the reviewer for the comment. The presence of the sequence that forms the blue stem-loop indicates the formation of an anti-terminator conformation in riboG during transcription. Our smFRET data show that including the stem-loop sequence gives rise to additional peaks in the full-length riboG compared with riboG-term. This indicates that the stem-loop influences the folding of the kissing loop (KL) and potentially also affects stems 1 and 2.

      (2) Figure 7 supplement 1, C &D. Maybe I am missing something, but it seems to me in reaction #8 (EC-105, last two lanes), the readthrough percentage is close to 50% based on the gel but plotted in D as 20%. Further, there is a strong effect of guanidine in reaction #8 but that is not reflected in the quantitation in panel D. 

      We thank the reviewer for the comment. The discrepancy between reaction 8 in (C) and (D) arises from differential handling of the crude product: in (C), only the crude product of the last step (step 17) was loaded on the gel, whereas in (D) the crude products from steps 16 and 17 were combined to calculate the read-through percentage. We have corrected the discrepancy by replacing Figure 7-Supplement figure 1C (now Figure 7C) and revised the legend to include the following clarification: “Taking into consideration that the 17 step-PLOR reaction exhibited a pause within the terminator region, resulting in a significant amount of terminated product at step 16, crude products from steps 16 and 17 were collected for (C) and (D) of the 17 step-PLOR reaction (Lanes 15 and 16 in C)”.

      (3) Figure 7C is a control that shows the quality of the elongation complexes, which probably should be in the supplement. Instead, in Figure 7 supplement 1, panels C and D are actual experiments and could be moved into the main figure.  

      We thank the reviewer for the comment. We have made this adjustment.

      (4) Figure S7D. I would suggest not labelling the RNA polymerase halt/stoppage sites due to NTP deprivation as "pausing sites" because transcriptional pausing has previously been defined as natural sites where the RNA polymerase transiently halts itself, but not due to the lack of the next NTPs. In this case, the elongating complexes were artificially halted, which is technically not "pausing", as it will not restart/resume on its own without intervention. 

      We have changed the “pausing” to “halting”.  

      (5) Figure 7 is titled "In vitro transcriptional performance of riboG." But the data is actually not about the performance of the riboswitch, or how well it functions. I would suggest the authors revise the title. This is mostly about the observed sensitivity window of the riboswitch to ligand-mediated conformational switching. 

      We have changed the title of Figure 7 to “Ligand-mediated conformational switching of riboG during transcription”.

      (6) Figure 7A, the illustration gives the visual impression that there are multiple RNA polymerases on the same DNA template, which is not the case. 

      We have revised Figure 7A by adding arrows between RNA polymerases to illustrate the movement of a single RNAP, rather than multiple RNAPs on the same template.

      (7) It could be informative to compare the guanidine-IV riboswitch with the first three classes (I, II, III), to see how their architectures or gene regulatory mechanisms are similar or different. 

      We thank the reviewer for the comment. We have added a comparison of the guanidine-IV riboswitch with the other three classes of guanidine riboswitches to the manuscript: “The guanidine-IV riboswitch exhibits similarities to the guanidine-I riboswitch in gene regulatory mechanism, functioning as a transcriptional riboswitch. Structurally, it resembles the guanidine-II riboswitch through the formation of loop-loop interactions upon binding to guanidine (Battaglia & Ke, 2018; L. Huang et al., 2017; Lin Huang et al., 2017; Lenkeit et al., 2020; Nelson et al., 2017; Reiss & Strobel, 2017; Salvail et al., 2020)” (page 12).

      Reviewer #3 (Recommendations For The Authors):

      In addition to the public review items, I provide the following recommendations:

      (1) As a second language speaker, I understand that writing a compelling and concise story may be hard, and we tend to write more than needed or more repetitively. That being said, I do think that the writing could be improved to make it more concise, clear, and avoid repetitions.

      We thank the reviewer for the comment. We rewrote the abstract and some sentences in the manuscript.

      (2) In the abstract, instead of saying that "...This lack of understanding has impeded the application of this riboswitch", which makes the statement too strong, perhaps, stating something along the lines of "this understanding would assist the application of this riboswitch", would be a better fit. 

      We have rewritten the abstract and revised the sentence.

      (3) Methods should state which RNA polymerase was used. PLOR uses T7 RNA pol, so I assume it was the same. 

      We have added the statement “T7 RNAP was utilized in the PLOR and in vitro transcription reactions unless otherwise noted” to the Methods (page 15).

      (4) The impact statement says comprehensive structure-function, where perhaps comprehensive folding-function would be more appropriate. We are still missing a lot of structural information about this particular riboswitch. 

      We agree with the reviewer and have changed “comprehensive structure-function” to “folding-function” in the Impact statement (page 2).

      (5) Higher Mg2+ concentrations resulted in a lesser extent of switching of RiboGapt; a sentence discussing this would be useful (how Mg2+ could interact promiscuously and interfere with folding).

      We have added a statement on the role of higher Mg2+ concentrations to the manuscript: “However, at a higher Mg2+ concentration of 50.0 mM, the pre-folded and unfolded conformations were more prevalent than at 20.0 mM Mg2+. This suggests that an excess of Mg2+ may promote the pre-folded and even unfolded conformations” (page 6).

      (6) In the investigations of RiboG-term and RiboG, it seems that the monovalent ions in the buffer are sufficient to promote secondary structure formation. A statement commenting on this would benefit the paper and the audience.

      We agree with the reviewer and have revised the manuscript accordingly by adding “This indicates that monovalent ions in the buffer can facilitate the formation of stable guanidine-IV riboswitch” (page 8).

      (7) Figure 3. The figure has panels up to E, but the legend goes up to panel H. The colors described for G and H do not correspond to the actual figure colors.

      We made the correction.  

      (8) Figure 4. As in Figure 3, the panels in the figure and the legend do not match.

      We made the correction.  

      (9) During the discussion, stating that the DNA and RNA pol play a role in folding and ligand binding may be excessive. This could be an indirect effect of the transcriptional bubble hindering part of the nascent RNA from folding, which is something intrinsic to any transcription and not specific to this system. 

      We agree with the reviewer and have deleted the statement that the DNA and RNAP play a role in folding and ligand binding.

      (10) PLOR is not properly cited. When it is introduced in the manuscript, please cite the original PLOR paper (Liu et al., Nature 2015) and additional related papers.

      We now cite the original PLOR paper (Liu et al., Nature 2015) and the related paper (Liu et al., Nature Protocols 2018) (pages 4 and 15).

      (11) The kinetics race of folding and binding could be a little more emphasized in discussion, particularly from the perspective of its physiological importance. 

      We agree with the reviewer and deleted the kinetics race of folding and binding from the Discussion part.

    1. Author response:

      We thank the reviewers for their positive feedback and helpful suggestions for improving our manuscript.

      We appreciate the reviewers highlighting areas where we can improve clarity, particularly in the analysis methodologies and details. We agree that additional control experiments and expansion on single-molecule tracking analysis will provide additional support for our interpretations. 

      We acknowledge the reviewers' suggestion to describe our work's relationship to other studies. While some of our findings are similar to those in past studies, our work introduces a new approach for labeling euchromatin with direct sequence specificity on a genome-wide scale, enabling a deeper understanding of euchromatin organization and dynamics. We will provide more context on the novelty of our work and incorporate a more comprehensive discussion of our work’s relation to other studies in the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use the model organism Drosophila to explore the effects of sex and age on the outcome of a TBI method. They find age and sex differences: older flies are more susceptible to mild TBI, and females are also more susceptible. In particular, they pursue the finding that virgin and mated females respond differently: virgin females are largely protected, whereas mated females succumb to TBI with climbing deficits. They discover that this is associated with exposure of the females to Sex Peptide acting on neurons of the female reproductive tract. When they extend the analysis to RNA-seq of brains, they find very few genes in common among males, mated females, virgins, and females mated with males lacking Sex Peptide. The few chronically altered genes in mated females appear to be associated with the immune system. These findings suggest that mated females have a compromised immune system, which might make them more vulnerable.

      Strengths:

      This is an interesting paper that allows a detailed comparison of sex and age in TBI which is largely only possible in such a simple model, where large numbers and many variations can be addressed. Overall the findings are interesting.

      Weaknesses:

      Although the findings beyond Sex Peptide are observational, the work sets the stage for more detailed studies to pursue the role of the genes they find by RNAseq and whether for example, boosting the innate immune system would protect the mated females, among other experiments.

      We thank the reviewer for their time and effort in evaluating our manuscript. We agree that future studies are needed to determine the role of the genes identified through RNA sequencing in the late-life emergence of neurodegenerative conditions after exposure to mild head trauma. We would like to investigate whether boosting immunity in mated females can mitigate the risk of age-dependent neurodegeneration after mild head trauma.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors use the Drosophila model system to study the impact of mild head trauma on sex-dependent brain deficits. They identify Sex Peptide as a modulator of greater negative outcome in female flies. Additionally, they observe that increased age at the time of injury results in worse outcomes, especially in females, and that this is due to chronic suppression of innate immune defense networks in mated females. The results demonstrate a novel signaling pathway that promotes age- and sex-dependent outcomes after head injury.

      Strengths:

      The authors have modified their previously reported TBI model in flies to mimic mild TBI, which is novel. Methods are explained in detail, allowing for reproducibility. Experiments are rigorous with appropriate statistics. A number of important controls are included. The work tells a complete mechanistic story and adds important data to increase our understanding of sex-dependent differences in recovery after TBI. The discussion is comprehensive and puts the work in the context of the field.

      Weaknesses:

      A very minor weakness is that exact n values should be included in the figure legends. There should also be confirmation of knockdown by RNAi in female flies either by immunohistochemistry or qRT-PCR if possible.

      We thank the reviewer for the evaluation of our manuscript and for the suggestion to include the exact n values in the figure legends. We will include the n values in our revision.

      Regarding RNAi knockdown of sex peptide receptors (SPRs), we agree that confirming the knockdown by IHC or qRT-PCR would further strengthen our findings. It should be noted, however, that the RNAi line we used has been extensively validated by Yapici et al., 2007 and several subsequent publications. Moreover, the effectiveness of SPR knockdown is evident in female flies: they exhibit dramatically reduced egg laying and lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated females. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null males, confirming effective disruption of the SP-SPR signaling pathway. We will revise the manuscript to make these points clear.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors used a Drosophila model to show that exposure to repetitive mild TBI causes neurodegenerative conditions that emerge late in life and disproportionately affect females. In addition to well-known age-dependent impact, the authors identified Sex Peptide (SP) signaling as a key factor in female susceptibility to post-injury brain deficits.

      Strengths:

      The authors have presented a compelling set of results showing that female Sex Peptide signaling adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury in Drosophila. They have (1) compared the phenotypes of adult male and female flies sustaining TBI at different ages, and the phenotypes of virgin females and mated females, (2) compared the phenotypes of eliminating SP signaling in mating females and introducing SP-signaling into virgin females, (3) compared transcriptomic changes of different groups in response to TBI. The results are generally consistent and robust.

      Weaknesses:

      The authors have made their claims largely based on assaying climbing index and vacuole formation as the only indicators of late-life neurodegeneration after TBI. However, these phenotypes are not specific to TBI-related neurodegeneration, and the significance and mechanisms of vacuole formation in particular are not clear. The authors should perform additional analyses of TBI-related neurodegeneration in flies, which have been described before (Genetics. 2015 Oct; 201(2): 377-402). Furthermore, it is surprising to see so few DEGs even in wild-type males and mated females, and to see that none of the DEGs overlap among groups or are even related to SP signaling. This raises questions about the validity of the RNA-seq analysis. It is critical to independently verify the RNA-sequencing results and to add more molecular evidence to support the conclusions. Finally, it is unknown what the implications of female fly mating and its associated Sex Peptide signaling would be for mammals or humans, and what mechanisms underlie the sexual dimorphism.

      We thank the reviewer for the thorough evaluation of our manuscript. The reviewer raised a very important question: whether the neurodegeneration observed in our model is specific to TBI. As the reviewer rightly pointed out, the neurodegenerative phenotypes are unlikely specific to TBI-related neurodegeneration. Throughout the manuscript, we have tried to convey the notion that the mild physical impacts to the head represent one form of environmental insults, which in combination with other risk factors such as aging can lead to the emergence of neurodegenerative conditions. It should be noted that the negative geotaxis assay and vacuolation quantification are two well-established approaches to assess sensorimotor deficits and frank brain degeneration in fly brains.

      It is important to emphasize that the head-specific impacts delivered to the flies in our study are much milder than those used in previous studies. As shown in Figure 1, this very mild form of head trauma (referred to as vmHT) did not cause any death, nor did it affect the lifespan of the injured flies. Our supplemental data also show very minimal structural neuronal damage and essentially no acute or chronic apoptosis induced by vmHT exposure. Consistently, we did not observe any exoskeletal or eye damage immediately following injuries, nor did we observe retinal degeneration or pseudopupil loss at the chronic stage. We will incorporate these important points in the revision.

      We agree that future studies are needed to independently validate our RNA sequencing results. We believe that the small number of DEGs is likely due to two features that distinguish our study from previous fly TBI studies: (1) the very mild nature of our injury paradigm and (2) the chronic examination timepoint, long after the head injury and SP exposure. As pointed out in the manuscript, our study aimed to understand how early-life exposure to repetitive head traumatic insults could lead to the late-life onset of neurodegenerative conditions. We hope to further validate our results in the next phase of experiments using single-cell RNA sequencing and RT-qPCR.

      As the reviewer pointed out, it would be very interesting to explore the possible roles of sex peptide-signaling in other animals and humans. As far as we know, there is no known mammalian ortholog to the insect sex peptide, so it would be difficult to study SP or an SP-like molecule in mammalian models. However, we believe that prolonged post-mating changes associated with reproduction in female fruit flies contribute to their elevated vulnerability to neurodegeneration.  In this regard, drastic changes within the biology of female mammals associated with reproduction can potentially lead to vulnerability to neurodegeneration. We agree that this demands further study, which may be done with future collaborators using rodent or large animal models.  We have discussed this point in the manuscript, but will revise it to further clarify the discussion.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration - using cell biology-based approaches- that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm on the technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are observed with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell lines (eg. epithelial cells of colonic origin, macrophages, primary cells). 

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer’s concern about the lack of follow up work on observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024) considering their similar results on protein translation arrest, and we fully agree that this reference should be more fully discussed in a revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella and different bacteria species (eg. Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. They perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts)- that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis including the innovative analytical workflows are very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and enthusiasm for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer’s comments, we will carefully re-edit the manuscript to avoid overselling our data in a future version. We will also consider rearranging the figures depending on new results.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and therefore some of our experimental observations would benefit from follow up mechanistic studies in the future. We highlight our vision for broader applicability in response to weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescent labelled histones) would make our conclusions more robust.

      Regarding the different orientation of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in depth characterization of volumetric parameters.

    1. Author response:

      We would like to thank all reviewers and editors for their thorough peer review and valuable suggestions. In these provisional responses, we summarize the main concerns raised by the reviewers and outline our planned revisions to address them in the manuscript.

      Overall, we are pleased to note that the reviewers agree on the potential value of our updated toolbox for gene editing, highlighting its various applications. However, they also raised several valid concerns, which we have summarized and responded to as follows:

      (1) Mutant phenotypes in transfected populations can be occasionally reversed or escaped. This suggests it will not be possible to detect growth-associated phenotypes in pooled screens. An experiment with a pooled loss-of-function screen to test this is missing.

      Escapes or reversals of mutant phenotypes have been observed with other genetic tools used for loss-of-function screening, including lentiviral CRISPR approaches in mammalian systems and RNAi in Trypanosoma brucei. Cells can escape phenotypes through various mechanisms, such as promoter silencing or selection of non-deleterious mutations. Additionally, not every CRISPR guide is efficient in generating a mutant phenotype, and RNAi constructs can also vary in their effectiveness. Despite these challenges, genome-wide loss-of-function screens have been successfully carried out in mammalian cells and Trypanosoma parasites. Therefore, we believe that the observed escape of one mutant phenotype does not preclude the detection of growth-associated or other phenotypes in pooled screens. Moreover, we did not observe a reversal of the mutant phenotype in L. mexicana, L. donovani, and L. major parasites expressing tdTomato from an expression cassette integrated into the 18S rRNA SSU locus (Figure 4). However, the reviewers are rightfully requesting a pooled loss-of-function screen to validate this. Since submitting this manuscript, we have conducted multiple pooled loss-of-function screens, which have confirmed the ability of our here presented method to detect a range of mutant phenotypes in pooled screening formats. We will include these results in our revised manuscript.

      (2) The possibility of mis-integration of the CBE sgRNA expression construct into an entirely different locus is not explored.

      We plan to reanalyze our ONT sequencing data to verify whether the CBE sgRNA expression construct was integrated into any unintended locus. If we detect any mis-integration events, we will evaluate their potential negative impacts and discuss these findings in the revised manuscript.

      (3) The achieved increase in editing efficiency compared to the previous base editing method could be more clearly presented.

      We have directly compared our improved method to our previous base editing method in Figures 1E and 4, demonstrating higher editing rates in a much shorter time. In the revised manuscript, we will present and describe the increase in editing rate more clearly.

      (4) The improvements on CBE sgRNA guide design are hypothetical and untested.

      We agree that the improvements to the CBE sgRNA design are currently hypothetical. We plan to systematically test our guide design principles in future studies. Since this will require testing hundreds of guides to draw robust conclusions, we believe that this aspect is beyond the scope of the current study. However, we will discuss our plans for future validation in the revised manuscript.

      Overall, we appreciate the reviewers' insights and are committed to addressing their concerns thoroughly. We believe that the planned revisions and additional experiments will significantly strengthen our manuscript and provide a more comprehensive evaluation of our updated gene editing toolbox.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2 (Public Review):

      Weaknesses:

      The comparison of affinity predictions derived from AlphaFold2 and H3-opt models, based on molecular dynamics simulations, should have been discussed in depth. In some cases, there are huge differences between the estimations from H3-opt models and those from experimental structures. It seems that the authors obtained average differences of the real delta, instead of average differences of the absolute value of the delta. This can be misleading, because high negative differences might be compensated by high positive differences when computing the mean value. Moreover, it would have been good for the authors to disclose the trajectories from the MD simulations.

      Thank you for your careful checks. We fully understand your concerns about the large differences in the calculated affinities. To identify the source of these differences, we carefully analyzed the trajectories of the input structures during the MD simulations. We found that the antigen-antibody complex shifted during the transition from NVT to NPT in pre-equilibration, even when restraints were applied to the protein structure. To address this issue, we followed the solution provided on the Amber mailing list (http://archive.ambermd.org/202102/0298.html) and modified the ATOMS_MOLECULE item in the topology file of the simulation system to merge the antigen-antibody complex into a single molecule. As a result, the SOLVENT_POINTERS entry was also adjusted. We then reran all MD simulations and recalculated the affinities of all complexes.

      We have changed “Afterwards, a 25000-step NVT simulation with a time step of 1 fs was performed to gradually heat the system from 0 K to 100 K. A 250000-step NPT simulation with a time step of 2 fs was carried out to further heat the system from 100 K to 298 K.” to “Afterwards, a 400-ps NVT simulation with a time step of 2 fs was performed to gradually heat the system from 0 K to 298 K (0–100 K: 100 ps; 100–298 K: 200 ps; hold 298 K: 100 ps), and a 100-ps NPT simulation with a time step of 2 fs was performed to equilibrate the density of the system. During heating and density equilibration, we constrained the antigen-antibody structure with a restraint value of 10 kcal·mol⁻¹·Å⁻².” We also added the following sentence to the Methods section of the revised manuscript: “The first 50 ns restrains the non-hydrogen atoms of the antigen-antibody complex, and the last 50 ns restrains the non-hydrogen atoms of the antigen, with a constraint value of 10 kcal·mol⁻¹·Å⁻².”
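      As a quick arithmetic cross-check of the revised schedule, each segment's duration can be converted into an integration step count, which is what MD input files typically specify (for example, `nstlim` in an Amber mdin file). The sketch below uses only the durations and the 2 fs time step quoted above; the segment labels are illustrative, and it also converts the original step-based protocol back into simulated time for comparison.

```python
# Cross-check of the revised equilibration schedule (values taken from the
# quoted Methods text; segment labels are illustrative only).

DT_FS = 2.0  # time step in femtoseconds, as stated in the revised protocol

# Revised protocol: segment -> duration in picoseconds
revised_schedule_ps = {
    "NVT heat 0-100 K": 100.0,
    "NVT heat 100-298 K": 200.0,
    "NVT hold 298 K": 100.0,
    "NPT density equilibration": 100.0,
}


def steps_for(duration_ps: float, dt_fs: float = DT_FS) -> int:
    """Return the number of integration steps covering duration_ps at dt_fs."""
    steps = duration_ps * 1000.0 / dt_fs  # 1 ps = 1000 fs
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("duration is not a whole number of time steps")
    return int(round(steps))


def duration_ps(n_steps: int, dt_fs: float) -> float:
    """Return the simulated time in picoseconds for n_steps at dt_fs."""
    return n_steps * dt_fs / 1000.0


step_counts = {name: steps_for(ps) for name, ps in revised_schedule_ps.items()}
nvt_total_steps = sum(n for name, n in step_counts.items() if name.startswith("NVT"))

# Original protocol, converted back to simulated time for comparison.
old_nvt_ps = duration_ps(25000, 1.0)   # 25000 steps at 1 fs
old_npt_ps = duration_ps(250000, 2.0)  # 250000 steps at 2 fs

if __name__ == "__main__":
    for name, n in step_counts.items():
        print(f"{name}: {n} steps")
    print(f"Revised NVT total: {nvt_total_steps} steps")
    print(f"Original NVT: {old_nvt_ps} ps, original NPT: {old_npt_ps} ps")
```

      With these numbers, the revised NVT stage totals 200,000 steps (400 ps) and the NPT stage 50,000 steps (100 ps), compared with 25 ps of NVT heating and 500 ps of NPT in the original protocol.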

      In addition, we have corrected the calculation of mean deltas to use absolute values and have shown that the average affinities of structures predicted by H3-OPT were closer to those of experimentally determined structures than the values obtained with AF2. These results have been updated in the revised manuscript. However, significant differences between the estimations from H3-OPT models and those derived from experimental structures still exist in a few cases. We found that antibodies moved away from antigens in both AF2- and H3-OPT-predicted complexes during simulations, resulting in RMSDbackbone (RMSD of the antibody backbone) values exceeding 20 Å. These deviations led to significant structural changes in the complexes and consequently to notable differences in the affinity calculations. We therefore removed three samples (PDBID: 4qhu, 6flc, 6plk) from the benchmark, because their predicted structures moved away from the antigen during MD simulations, resulting in large energy differences from the native structures.

      Author response table 1.

      Thank you for the reminder; we have calculated RMSDbackbone for all production runs (SI Fig. 5).

      Author response image 1.

      Reviewer #3 (Public Review):

      Weaknesses:

      The proposed method lacks a confidence score or a warning to help guide users in moderate to challenging cases.

      We apologize for this omission. We have updated our GitHub code and added the following text to the Methods section to clarify how the confidence score module is trained: “Confidence score prediction module

      We apply an MSE loss for confidence prediction; the label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100. The dropout rate of H3-OPT was set to 0.25. The learning rate and weight decay of the Adam optimizer were set to 1 × 10⁻⁵ and 1 × 10⁻⁴, respectively.”
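      A minimal sketch of how such a per-residue label and loss could be computed. It assumes a Kabsch (SVD) superposition for the "alignment" step and a model that outputs one predicted error per residue. The Methods text does not state how per-residue errors are mapped onto the 0–100 confidence score, so that step is omitted. All function names are hypothetical, and this is not the H3-OPT code.

```python
import numpy as np


def kabsch_superimpose(mobile, target):
    """Optimally rotate and translate mobile (N x 3) onto target (N x 3)."""
    mobile_c = mobile - mobile.mean(axis=0)
    target_c = target - target.mean(axis=0)
    h = mobile_c.T @ target_c                  # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))     # guard against improper rotations
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mobile_c @ rotation.T + target.mean(axis=0)


def per_residue_ca_error(pred_ca, true_ca):
    """Label: per-residue C-alpha deviation after superposition (input units)."""
    aligned = kabsch_superimpose(pred_ca, true_ca)
    return np.linalg.norm(aligned - true_ca, axis=1)


def mse_loss(predicted_error, label_error):
    """Mean squared error between predicted and label per-residue errors."""
    diff = np.asarray(predicted_error) - np.asarray(label_error)
    return float(np.mean(diff ** 2))


# Usage with hypothetical coordinates for a 12-residue loop.
rng = np.random.default_rng(0)
true_ca = rng.normal(size=(12, 3))
noisy_pred = true_ca + rng.normal(scale=0.5, size=true_ca.shape)
labels = per_residue_ca_error(noisy_pred, true_ca)
print(labels.round(2))
print(mse_loss(np.full_like(labels, labels.mean()), labels))
```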

      Reviewer #2 (Recommendations For The Authors):

      I would strongly suggest that the authors deepen their discussion of the affinity prediction based on molecular dynamics. In particular, why do the authors think that some structures exhibit huge differences between the predictions from the experimental structure and those from H3-OPT? Also, please compute the mean deltas using absolute values rather than signed values; the latter can be extremely misleading and can hide very large differences in opposite directions that compensate each other when averaged.

      I would also advise including graphical results of the MD trajectories, at least as Supplementary Material.

      We thank you for your feedback and fully understand your concerns. We identified the source of these large differences and addressed it by changing the MD simulation protocol. We then recalculated all affinities and computed the mean deltas from absolute values. RMSDbackbone values were also measured during the production runs (SI Fig. 5) to support accurate affinity predictions. Large differences between the estimates from H3-OPT models and those from experimental structures remain in some cases. We found that the antibodies moved away from the antigens in both AF2- and H3-OPT-predicted complexes during the simulations, with RMSDbackbone exceeding 20 Å. These deviations caused large structural changes in the complexes and, consequently, large differences in the calculated affinities. We therefore removed three samples (PDBID: 4qhu, 6flc, 6plk) from the benchmark.

      Thanks again for your professional advice.

      Reviewer #3 (Recommendations For The Authors):

      (1) I am pleased with most of the answers provided by the authors to the first review. In my humble opinion, the new manuscript has greatly improved. However, I think some answers to the reviewers are worth including in the main text or supporting information for the benefit of general readers. In particular, the requested statistics (i.e. p-values for Cα-RMSD values across the modeling approaches, p-values and error bars in Fig 5a and 5b, etc.) should be introduced in the manuscript.

      We sincerely appreciate your advice. We have added these statistics to Fig. 4 and Fig. 5 of our manuscript.

      Author response image 2.

      Author response image 3.

      (2) Similarly, the authors state in their answers that "we have trained a separate module to predict the confidence score of the optimized CDR-H3 loops". That sounds like a great improvement to H3-OPT! However, I couldn't find any reference to that new module in the revised version of the manuscript, nor in the available GitHub code. That is why I maintain the weakness "The proposed method lacks of a confidence score".

      We are very sorry for this oversight, and thank you for the reminder. We have updated our GitHub code and added the following text to the Methods section to clarify how the confidence score module is trained:

      “Confidence score prediction module

      We apply an MSE loss for confidence prediction; the label error was calculated as the Cα deviation of each residue after alignment. The inputs of this module are the same as those used for H3-OPT, and it generates a confidence score ranging from 0 to 100. The dropout rate of H3-OPT was set to 0.25. The learning rate and weight decay of the Adam optimizer were set to 1 × 10⁻⁵ and 1 × 10⁻⁴, respectively.”

      (3) I acknowledge all the efforts made to solve new mutant/designed nanobody structures. Judging from the solved structures, mutants Y95F and Q118N seem critical to either crystallographic or dimerization contacts stabilizing the CDR-H3 loop, hence preventing the formation of crystals. Clearly, solving a molecular structure is a challenge, hence including the following comment in the manuscript is relevant for readers to correctly assess the magnitude of the validation: "The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, comparing with the best template. The CDR-H3 lengths of these nanobodies are both 17. According to our classification strategy, these nanobodies belong to Sub1. The confidence scores of these AlphaFold2 predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM."

      We appreciate your kind recommendations and have revised “Although Mut1 (E45A) and Mut2 (Q14N) shared the same CDR-H3 sequences as WT, only minor variations were observed in the CDR-H3. H3-OPT generated accurate predictions with Cα-RMSDs of 1.510 Å, 1.541 Å and 1.411 Å for the WT, Mut1, and Mut2, respectively.” to “Although Mut1 (E45A) and Mut2 (Q14N) shared the same CDR-H3 sequences as WT (LengthCDR-H3 = 17), only minor variations were observed in the CDR-H3. H3-OPT generated accurate predictions with Cα-RMSDs of 1.510 Å, 1.541 Å and 1.411 Å for the WT, Mut1, and Mut2, respectively (the confidence scores of these AlphaFold2-predicted loops were all higher than 0.8, and these loops were accepted as the outputs of H3-OPT by CBM).” In addition, we have added the following sentence to the legend of Figure 4 so that readers can appropriately evaluate the significance and reliability of our validation: “The sequence identities of the VH domain and H3 loop are 0.816 and 0.647, respectively, compared with the best template.”

      (4) As pointed out in the first review, I think the work https://doi.org/10.1021/acs.jctc.1c00341 is worth acknowledging in section "2.2 Molecular dynamics (MD) simulations could not provide accurate CDR-H3 loop conformations" of the supplementary material, as it constitutes a clear reference (and probably one of the few) for the kind of MD simulations the authors perform. Similarly, the work https://doi.org/10.3390/molecules28103991 introduces an earlier benchmark of AI algorithms for predicting antibody and nanobody structures that readers may find of interest to contrast with the present work. Indeed, the latter reference is used by the authors to answer a reviewer comment.

      Thank you for your valuable comments. We have added these references at the appropriate places in our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for their encouraging comments. Reviewer 1 raises an important question regarding the translation of biomarker-derived data into dietary recommendations, taking the high variability in food composition into consideration. Unfortunately, there is no straightforward answer, as the high variability in food composition means that the number of cups of tea needed for 200 mg of flavan-3-ols will depend on the flavanol content of the tea. A probabilistic modelling approach, such as the one we used to investigate the impact of food content variability on estimated associations with health outcomes, would be a possible solution. This could provide food-based recommendations that meet a defined intake with a given probability. However, developing and exploring such models is beyond the scope of this manuscript, and we have therefore decided not to include this in our response. We have stated in the manuscript that such a method needs to be developed.
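      One way to frame the probabilistic approach described above is a Monte Carlo estimate of the probability that a given number of cups reaches the 200 mg flavan-3-ol target. The sketch assumes a lognormal per-cup content with a hypothetical median of 60 mg and a log-scale standard deviation of 0.6, with independent variation between cups. These parameters are placeholders, not values from the manuscript or any food composition table.

```python
import math
import random

MEDIAN_MG_PER_CUP = 60.0  # hypothetical median flavan-3-ol content per cup
LOG_SD = 0.6              # hypothetical spread on the log scale
TARGET_MG = 200.0


def prob_meets_target(n_cups, target_mg=TARGET_MG, n_sim=20000, seed=1):
    """Monte Carlo probability that n_cups together provide at least target_mg."""
    rng = random.Random(seed)
    mu = math.log(MEDIAN_MG_PER_CUP)  # the median of a lognormal is exp(mu)
    hits = 0
    for _ in range(n_sim):
        # Each cup's content is drawn independently (an assumption).
        total = sum(rng.lognormvariate(mu, LOG_SD) for _ in range(n_cups))
        if total >= target_mg:
            hits += 1
    return hits / n_sim


def cups_needed(probability, max_cups=20):
    """Smallest number of cups meeting the target with at least the given probability."""
    for n in range(1, max_cups + 1):
        if prob_meets_target(n) >= probability:
            return n
    return None


for n in (1, 3, 5):
    print(n, prob_meets_target(n))
print("cups for 90% probability:", cups_needed(0.9))
```

      Under these placeholder assumptions, a recommendation would take the form "n cups reach 200 mg with probability p". In practice, the content distribution would have to be estimated from food composition data.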

      We have addressed the typographical errors and the other comments as follows:

      •   Line 126 - this is the first mention of DR-FCT and as such it needs to be defined. This was a typo and it was corrected throughout the manuscript. The actual abbreviation is DD-FCT and it is defined in line 78.

      •   Figure 4 - what exactly is this figure trying to convey to the reader? A better explanation about this figure is needed. The figure legend was updated and extended to improve clarity.

      •   Figure 5 - Why are the graphs presented differently, meaning why are the data for the flavan-3-ols and epicatechin differentiated for men and women and not nitrate. The sample size for nitrate was too small to stratify in the same way as for flavan-3-ols.

      •   Line 365 - more information is needed, I am assuming the authors are stating “The tableone package for R ...”. As requested by the reviewer, additional details are now included.

      We have also revised the abstract, the conclusion and the discussion of limitations of the biomarker approach to improve the readability of the manuscript.

    1. Author response:

      We are thankful to the expert reviewers and the editorial team for their assessment of our manuscript and valuable comments, which will help us to improve our manuscript. While Reviewer #1 appreciated the comprehensive assessment using advanced methods, Reviewer #2 asked for an extension of traditional neuropathological and neuroradiological assessments. Both reviewers identified limitations of the study like the inability to provide direct histopathological evidence for meningitis due to missing meninges tissue, resulting in the conclusions being based on indirect evidence. The reviewers raised concerns about potential post mortem penetration of bacteria into the brain parenchyma. Reviewer #1 also questioned the evidence for cortical siderosis based on the intensity of histological stains.

      We agree with both reviewers and the editorial comment that a traditional neuropathological assessment of meningeal status would have strongly boosted the study's conclusions. Please note that the opportunistic sampling approach after a wild animal’s “natural” death, which is the only ethical method to study infection biology in great apes, is intrinsically accompanied by some limitations such as the lack of standardized post mortem intervals or incomplete sampling. In the revised version of the manuscript, we will complement the advanced MRI and histology already presented by extended traditional neuroradiological and neuropathological assessments as recommended by Reviewer #2, including a report on the status of other organs. However, it is important to note that the interpretation of post mortem MRI of brain material collected in the field differs substantially from conventional in vivo MRI and requires tailored analysis and interpretation. Below we comment on three aspects addressed by reviewers:

      *Missing meninges*: The meninges and associated vessels had to be removed to reduce blood-related artifacts in previously performed MRI measurements. We are aware that this is a major limitation of this study, and we therefore rely on the evidence derived from the material at hand. The neuropathological assessment agrees with the reviewer's comments that no overt acute bacterial meningitis (e.g. turbid appearance, purulent exudates or frank hemorrhages) is apparent on macroscopic inspection of the presented material. However, the macroscopic changes should be evaluated in light of the brief time interval between bacterial colonization and death. Meningeal bacterial invasion was visualized on a few meningeal residues we found in case 1, demonstrating invasion of the subarachnoid space. Based on the reviewers' suggestions, the microscopic neuropathological evaluation will be expanded to identify further regions with meningeal residues, in order to 1) reduce potential sampling bias and 2) better characterize the leptomeningeal infiltrates, focusing on early inflammatory markers. However, a comprehensive assessment of the histopathological inflammatory status must await future studies on specimens with preserved meninges.

      *Putrefaction/Post mortem bacterial proliferation*: The reviewers raised important points by remarking that the tissue alterations could be due to putrefaction or other post mortem effects. Classical bacterial putrefaction is unlikely, since no mixed flora of opportunistic bacteria was detected, suggesting that the time before fixation was short enough to prevent secondary bacterial invasion in the presented specimens. Moreover, it has been shown that for post mortem intervals of <24 hours, bacterial invasion of the brain is rare even at higher temperatures (Ith et al. 2011, https://doi.org/10.1002/nbm.1623). The possibility of post mortem propagation of Bcbva in tissue must be considered, since experimental data on the pathogen's growth after host death are lacking, as discussed in the "Limitations" section of the original manuscript. Although it seems plausible that post mortem multiplication in the brain occurs to a certain extent, several observations suggest that this is not the only mechanism at play in the presented cases. We observed early microglial activation and astrogliosis, indicating an incipient inflammatory reaction in the brain parenchyma. Taken together, the data suggest a short time interval between bacterial colonization and death. On this premise, further analyses for the revised manuscript will investigate pathological in vivo tissue alterations more closely.

      *Siderosis*: Signs of cortical siderosis were evident in the MRI images of all adult cases (1, 3, and 4), appearing as a hyperintense rim in quantitative R2* maps and indicating substantially elevated iron levels on the brain surface. These findings were confirmed by Perls’s stain for iron. Such rims in R2* maps are a typical sign of cortical iron deposition due to siderosis, as observed in conditions such as angiopathies. Meningeal bleeding is the most probable source of the elevated iron levels in the cortex. Importantly, such signs were never observed in the post mortem brains of chimpanzees not infected with anthrax (about 30 cases analyzed so far). Reviewer #1 noted that the intensity of the Perls’s stain seemed too low for siderosis. However, this intensity can vary depending on the staining procedure and may be lower for the acute and short disease course of Bcbva-induced anthrax than for the chronic human cases Reviewer #1 may be referring to. Taken together, we believe that the evidence of cortical siderosis is compelling and speaks in favor of pre mortem meningeal hemorrhage.

      In summary, in the revised version of the manuscript, we plan to: (1) add a traditional neuroradiological assessment of all scans; (2) present an extended traditional neuropathological assessment of all cases; (3) report results on the status of early inflammatory markers; and (4) discuss the limitations of the study in more detail.

    1. Author response:

      Public Reviews:

      We thank the reviewers for their overall positive assessments and constructive feedback.

      Reviewer #1 (Public Review):

      Summary:

      The study explored the biomechanics of kangaroo hopping across both speed and animal size to try and explain the unique and remarkable energetics of kangaroo locomotion.

      Strengths:

      The study brings kangaroo locomotion biomechanics into the 21st century. It is a remarkably difficult project to accomplish. There is excellent attention to detail, supported by clear writing and figures.

      Weaknesses:

      The authors oversell their findings, but the mystery still persists.

      The manuscript lacks a big-picture summary with pointers to how one might resolve the big question.

      General Comments

      This is a very impressive tour de force by an all-star collaborative team of researchers. The study represents a tremendous leap forward (pun intended) in terms of our understanding of kangaroo locomotion. Some might wonder why such an unusual species is of much interest. But, in my opinion, the classic study by Dawson and Taylor in 1973 of kangaroos launched the modern era of running biomechanics/energetics and applies to varying degrees to all animals that use bouncing gaits (running, trotting, galloping and of course hopping). The puzzling metabolic energetics findings of Dawson & Taylor (little if any increase in metabolic power despite increasing forward speed) remain a giant unsolved problem in comparative locomotor biomechanics and energetics. It is our "dark matter problem".

      Thank you for the kind words.

      This study is certainly a hop towards solving the problem. But, the title of the paper overpromises and the authors present little attempt to provide an overview of the remaining big issues.

      We will modify the title to reflect this comment.  

      The study clearly shows that the ankle and to a lesser extent the mtp joint are where the action is. They clearly show in great detail by how much and by what means the ankle joint tendons experience increased stress at faster forward speeds.

      Since these were zoo animals, direct measures were not feasible, but the conclusion that the tendons are storing and returning more elastic energy per hop at faster speeds is solid.

      The conclusion that net muscle work per hop changes little from slow to fast forward speeds is also solid.

      Doing less muscle work can only be good if one is trying to minimize metabolic energy consumption. However, to achieve greater tendon stresses, there must be greater muscle forces. Unless one is willing to reject the premise of the cost of generating force hypothesis, that is an important issue to confront.

      Further, the present data support the Kram & Dawson finding of decreased contact times at faster forward speeds. Kram & Taylor and subsequent applications of (and challenges to) their approach support the idea that shorter contact times (tc) require recruiting more expensive muscle fibers and hence greater metabolic costs. Therefore, I think that it is incumbent on the present authors to clarify that this study has still not tied up the metabolic energetics across speed problems and placed a bow atop the package.

      Fortunately, I am confident that the impressive collective brain power that comprises this author list can craft a paragraph or two that summarizes these ideas and points out how the group is now uniquely and enviably poised to explore the problem more using a dynamic SIMM model that incorporates muscle energetics (perhaps à la Umberger et al.). Or perhaps they have other ideas about how they can really solve the problem.

      You have raised important points, thank you for this feedback. We will add a paragraph discussing the limitations of our study and ensure the revised manuscript makes it clear which mysteries remain. We intend to address muscle forces, contact time, and energetics in future work when we have implemented all hindlimb muscles within the musculoskeletal model.  

      I have a few issues with the other half of this study (i.e. animal size effects). I would enjoy reading a new paragraph by these authors in the Discussion that considers the evolutionary origins and implications of such small safety factors. Surely, it would need to be speculative, but that's OK.

      We will integrate this into the discussion.

      Reviewer #2 (Public Review):

      Summary

      This is a fascinating topic that has intrigued scientists for decades. I applaud the authors for trying to tackle this enigma. In this manuscript, the authors primarily measured hopping biomechanics data from kangaroos and performed inverse dynamics.

      While these biomechanical analyses were thorough and impressively incorporated collected anatomical data and an Opensim model, I'm afraid that they did not satisfactorily address how kangaroos can hop faster and not consume more metabolic energy, unique from other animals.

      Noticeably, the authors did not collect metabolic data nor did they model metabolic rates using their modelling framework. Instead, they performed a somewhat traditional inverse dynamics analysis from multiple animals hopping at a self-selected speed.

      We aimed to provide a joint-level explanation, but we will address the limitations of not modelling the energy consumers themselves (the skeletal muscles) in the revised manuscript. We plan to expand upon muscle level energetics in the future with a more detailed MSK model.

      Within these analyses, the authors largely focused on ankle EMA, discussing its potential importance (because it affects tendon stress, which affects tendon strain energy, which affects muscle mechanics) on the metabolic cost of hopping. However, EMA was roughly estimated (CoP was fixed to the foot, not measured)…

      As noted in our methods, EMA was not calculated from a fixed centre of pressure (CoP). We did fix the medial-lateral position, owing to the fact that both feet contacted the force plate together, but the anteroposterior movement of the CoP was recorded by the force plate and thus allowed to move. We report the movement (or lack of movement) in our results. The anterior-posterior axis is the most relevant to lengthening or shortening the distance of the ‘out-lever’ R, and thereby EMA.

      It is necessary to assume fixed medial-lateral position because a single force trace and CoP is recorded when two feet land on the force plate. The medial-lateral forces on each foot cancel out so there is no overall medial-lateral movement if the forces are symmetrical (e.g. if the kangaroo is hopping in a straight path and one foot is not in front of the other). We only used symmetrical trials so that the anterior-posterior movement of the CoP would be reliable.
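      Why a single combined CoP is recorded can be made concrete. For two contacts on one plate, the plate reports the vertical-force-weighted average of the two feet's CoP positions. The sketch below uses hypothetical forces and foot positions. It shows that with symmetrical loading the medial-lateral CoP sits midway between the feet, while unequal loading shifts it toward the more heavily loaded foot.

```python
def combined_cop(vertical_forces_n, cop_positions_m):
    """Force-weighted CoP (m) for several contacts on a single force plate."""
    total = sum(vertical_forces_n)
    if total <= 0:
        raise ValueError("total vertical force must be positive")
    return sum(f * x for f, x in zip(vertical_forces_n, cop_positions_m)) / total


# Hypothetical medial-lateral foot positions: left at -0.10 m, right at +0.10 m.
feet_ml = [-0.10, 0.10]
print(combined_cop([400.0, 400.0], feet_ml))  # symmetrical loading: 0.0
print(combined_cop([500.0, 300.0], feet_ml))  # shifted toward the left foot
```

      If one foot lands ahead of the other, the anterior-posterior CoP is likewise a blend of two positions, so restricting the analysis to symmetrical trials makes the recorded anterior-posterior CoP easier to interpret.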

      and did not detectably associate with hopping speed (see results).

      Yet, the authors interpret their EMA findings as though it systematically related with speed to explain their theory on how metabolic cost is unique in kangaroos vs. other animals.

      Indeed, the relationship between R and speed (and therefore EMA and speed) was not significant. However, the significant change in ankle height with speed, combined with no systematic change in CoP at midstance, demonstrates that R would get longer at faster speeds. If we consider the nonsignificant relationship between R and speed to indicate that there is no change in R, then these two results conflict. We could not find a flaw in our methods, so instead concluded that the nonsignificant relationship between R and speed may be due to a small change in R being undetectable in our data. Taking both results into account, we think it is more likely that there is a non-detectable change in R, rather than no change in R with speed, but we presented both results for transparency.
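      The geometric argument can be sketched with a simplified planar model. Assume three things: the GRF is vertical at midstance; the CoP stays at a fixed straight-line distance d from the ankle joint centre (consistent with no systematic CoP change); and the ankle sits at height h above the ground. Then the out-lever is the horizontal distance R = √(d² − h²), and EMA = r/R for a fixed tendon moment arm r. All dimensions below are hypothetical, and the model ignores GRF inclination and foot deformation.

```python
import math


def out_lever(ankle_height_m, ankle_to_cop_m):
    """Horizontal ankle-to-CoP distance (m), i.e. the moment arm R of a vertical GRF."""
    if not 0.0 <= ankle_height_m <= ankle_to_cop_m:
        raise ValueError("ankle height must be between 0 and the ankle-to-CoP distance")
    return math.sqrt(ankle_to_cop_m ** 2 - ankle_height_m ** 2)


def effective_mechanical_advantage(in_lever_m, out_lever_m):
    """EMA = r / R."""
    return in_lever_m / out_lever_m


D_M = 0.20   # hypothetical ankle-to-CoP distance
R_IN = 0.04  # hypothetical tendon moment arm r
for h in (0.14, 0.12, 0.10):  # lower ankle height, as in a more flexed posture
    R = out_lever(h, D_M)
    print(f"h={h:.2f} m  R={R:.3f} m  EMA={effective_mechanical_advantage(R_IN, R):.3f}")
```

      In this toy geometry, lowering the ankle lengthens R and lowers EMA even though the CoP does not move along the foot.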

      These speed vs. biomechanics relationships were limited by comparisons across different animals hopping at different speeds and could have been strengthened using repeated measures design.

      There is significant variation in speed within individuals, not just between individuals. The preferred speed of kangaroos is 2–4.5 m/s, but most individuals show a wide range within this. Eight of our 16 kangaroos had a maximum speed 1–2 m/s faster than their slowest trial. Repeated measures of these eight individuals comprise 78 of the 100 trials.

      It would be ideal to collect data across the full range of speeds for all individuals, but it is not feasible in this type of experimental setting. Interference such as chasing is dangerous to kangaroos as they are prone to strong adverse reactions to stress.

      There are also multiple inconsistencies between the authors' theory on how mechanics affect energetics and the cited literature, which leaves me somewhat confused and wanting more clarification and information on how mechanics and energetics relate.

      We will ensure that this is clearer in the revised manuscript.

      My apologies for the less-than-favorable review, I think that this is a neat biomechanics study - but am unsure if it adds much to the literature on the topic of kangaroo hopping energetics in its current form.

      Reviewer #3 (Public Review):

      Summary:

      The goal of this study is to understand how, unlike other mammals, kangaroos are able to increase hopping speed without a concomitant increase in metabolic cost. They use a biomechanical analysis of kangaroo hopping data across a range of speeds to investigate how posture, effective mechanical advantage, and tendon stress vary with speed and mass. The main finding is that a change in posture leads to decreasing effective mechanical advantage with speed, which ultimately increases tendon elastic energy storage and return via greater tendon strain. Thus kangaroos may be able to conserve energy with increasing speed by flexing more, which increases tendon strain.

      Strengths:

      The approach and effort invested into collecting this valuable dataset of kangaroo locomotion is impressive. The dataset alone is a valuable contribution.

      Thank you!

      Weaknesses:

      Despite these strengths, I have concerns regarding the strength of the results and the overall clarity of the paper and methods used (which likely influences how convincingly the main results come across).

      (1) The paper seems to hinge on the finding that EMA decreases with increasing speed and that this contributes significantly to greater tendon strain estimated with increasing speed. It is very difficult to be convinced by this result for a number of reasons:

      • It appears that kangaroos hopped at their preferred speed. Thus the variability observed is across individuals not within. Is this large enough of a range (either within or across subjects) to make conclusions about the effect of speed, without results being susceptible to differences between subjects?

      Apologies, this was not clear in the manuscript. Because kangaroos hopped at their preferred speed, we did not chase or startle them into high speeds, in order to comply with ethics and enclosure limitations. Thus we did not record the full range of speeds kangaroos are capable of (up to 12 m/s), but within the range we did measure (~2–4.5 m/s), there is variation in hopping speed within each individual kangaroo. Out of 16 individuals, eight had a difference of 1–2 m/s between their slowest and fastest trials, and these kangaroos accounted for 78 of the 100 trials. Of the remainder, six individuals had three or fewer trials each, and two individuals had highly repeatable speeds (3 of 4 and 6 of 7 trials, respectively, were within 0.5 m/s). We will ensure this is clear in the revised manuscript.

      In the literature cited, what was the range of speeds measured, and was it within or between subjects?

      For other literature, to our knowledge the highest speed measured is ~9.5 m/s (see Supplementary Fig. 1b), and there were multiple measures for several individuals (see the methods of Kram & Dawson 1998).

      • Assuming that there is a compelling relationship between EMA and velocity, how reasonable is it to extrapolate to the conclusion that this increases tendon strain and ultimately saves metabolic cost?

      They correlate EMA with tendon strain, but this would still not suggest a causal relationship (incidentally the p-value for the correlation is not reported).

      We will add supporting literature on the relationship between metabolic cost and tendon stress (or strain), to elaborate on why the correlation between EMA and stress is important.

      Tendon strain could be increasing with ground reaction force, independent of EMA.

      Even if there is a correlation between strain and EMA, is it not a mathematical necessity in their model that, all else being equal, tendon stress will increase as EMA decreases? I may be missing something, but nonetheless, it would be helpful for the authors to clarify the strength of the evidence supporting their conclusions.

      Yes, GRF also contributes to the increase in tendon stress in the mechanism we propose. We have illustrated this in Fig 6, however we will make this clearer in the revised discussion.
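      A quasi-static moment balance about the ankle makes both contributions explicit. If the tendon force F_t balances the GRF moment, then F_t·r = GRF·R, so F_t = GRF/EMA and tendon stress σ = GRF/(EMA·A) for tendon cross-sectional area A. In this simplified sketch, which ignores segment inertia, other muscle-tendon units and co-contraction, stress rises both when GRF increases and when EMA decreases. The GRF, EMA and area values are hypothetical, not measurements from the study.

```python
def tendon_force_n(grf_n, ema):
    """Tendon force from the ankle moment balance F_t * r = GRF * R, i.e. GRF / EMA."""
    if ema <= 0:
        raise ValueError("EMA must be positive")
    return grf_n / ema


def tendon_stress_mpa(grf_n, ema, area_mm2):
    """Tendon stress in MPa (N/mm^2)."""
    return tendon_force_n(grf_n, ema) / area_mm2


AREA_MM2 = 40.0  # hypothetical tendon cross-sectional area
print(tendon_stress_mpa(600.0, 0.25, AREA_MM2))  # 60.0
print(tendon_stress_mpa(700.0, 0.25, AREA_MM2))  # 70.0: higher GRF, same EMA
print(tendon_stress_mpa(600.0, 0.20, AREA_MM2))  # higher again: same GRF, lower EMA
```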

      • The statistical approach is not well-described. It is not clear what the form of the statistical model used was and whether the analysis treated each trial individually or grouped trials by the kangaroo. There is also no mention of how many trials per kangaroo, or the range of speeds (or masses) tested.

      The methods include the statistical model with the variables that we used, as well as the kangaroo masses (13.7 to 26.6 kg, mean: 20.9 ± 3.4 kg). We will move the range of speeds from the supplementary material to the results or figure captions. We will add information on the number of trials per kangaroo to the methods.

      We did not group the data e.g. by using an average speed per individual for all their trials, or by comparing fast to slow groups (this was for display purposes in our figures, which we will make clearer in the methods).

      Related to this, there is no mention of how different speeds were obtained. It seems that kangaroos hopped at a self-selected pace, thus it appears that not much variation was observed. I appreciate the difficulty of conducting these experiments in a controlled manner, but this doesn't exempt the authors from providing the details of their approach.

      • Some figures (Figure 2 for example) present means for one of three speeds, yet the speeds are not reported (except in the legend) nor how these bins were determined, nor how many trials or kangaroos fit in each bin. A similar comment applies to the mass categories. It would be more convincing if the authors plotted the main metrics vs. speed to illustrate the significant trends they are reporting.

      Thank you for this comment. The bins are used only for display purposes and not within the analysis. In the revised manuscript, we will ensure this is clear.

      (2) The significance of the effects of mass is not clear. The introduction and abstract suggest that the paper is focused on the effect of speed, yet the effects of mass are reported throughout as well, without a clear understanding of the significance. This weakness is further exaggerated by the fact that the details of the subject masses are not reported.

      Indeed, the primary aim of our study was to explore the influence of speed, given the uncoupling of energy from hopping speed in kangaroos. We included mass to ensure that the effects of speed were not driven by body mass (i.e.: that larger kangaroos hopped faster).  

      (3) The paper needs to be significantly re-written to better incorporate the methods into the results section. Since the results come before the methods, some of the methods must necessarily be described such that the study can be understood at some level without turning to the dedicated methods section. As written, it is very difficult to understand the basis of the approach, analysis, and metrics without turning to the methods.

      We agree, and in the revised manuscript will incorporate some of the methodological details within the results.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Major findings or outcomes include a genome for the wasp, characterization of the venom constituents and teratocyte and ovipositor expression profiles, as well as information about Trichopria ecology and parasitism strategies. It was found that Trichopria cannot discriminate among hosts by age, but can identify previously parasitized hosts. The authors also investigated whether superparasitism by Trichopria wasps improved parasitism outcomes (it did), presumably by increasing venom and teratocyte concentrations/densities. Elegant use of Drosophila ectopic expression tools allowed for functional characterization of venom components (Timps), and showed that these proteins are responsible for parasitoid-induced delays in host development. After finding that teratocytes produce a large number of proteases, experiments showed that these contribute to digestion of host tissues for parasite consumption.

      The discussion ties these elements together by suggesting that genes used for aiding in parasitism via different parts of the parasitism arsenal arise from gene duplication and shifts in tissue of expression (to venom glands or teratocytes).

      Strengths:

      The strength of this manuscript is that it describes the parasitism strategies used by Trichopria wasps at a molecular and behavioral level with broad strokes. It represents a large amount of work that in previous decades might have been published in several different papers. Including all of these data in a manuscript together makes for a comprehensive and interesting study.

      Weaknesses:

      The weakness is that the breadth of the study results in fairly shallow mechanistic or functional results for any given facet of Trichopria's biology. Although none of the findings are especially novel given results from other parasitoid species in previous publications, integrating results together provides significant information about Trichopria biology.

      We thank the reviewer for appreciating the importance of our study.

      Reviewer #2 (Public Review):

      Summary:

      Key findings of this research include the sequencing of the wasp's genome, identification of venom constituents and teratocytes, and examination of Trichopria drosophilae (Td)'s ecology and parasitic strategies. It was observed that Td doesn't distinguish between hosts based on age but can recognize previously parasitized hosts. The study also explored whether multiple parasitisms by Td improved outcomes, which indeed it did, possibly by increasing venom and teratocyte levels. Utilizing Drosophila ectopic expression tools, the authors functionally characterized venom components, specifically tissue inhibitors of metalloproteinases (Timps), which were found to cause delays in host development. Additionally, experiments revealed that teratocytes produce numerous proteases, aiding in the digestion of host tissues for parasite consumption. The discussion suggests that genes involved in different aspects of parasitism may arise from gene duplication and shifts in tissue expression to venom glands or teratocytes.

      Strengths:

      This manuscript provides an in-depth and detailed depiction of the parasitic strategies employed by Td wasps, spanning both molecular and behavioral aspects. It consolidates a significant amount of research that, in the past, might have been distributed across multiple papers. By presenting all this data in a single manuscript, it delivers a comprehensive and engaging study that could help future developments in the field of biological control against a major insect pest.

      Weaknesses:

      While none of the findings are particularly groundbreaking, as similar results have been reported for other parasitoid species in prior research, the integration of these results into one comprehensive overview offers valuable biological insights into an interesting new potential biocontrol species.

      We thank the reviewer for appreciating the importance of our study and for the suggestions on how to improve it.

      Reviewer #1 (Recommendations For The Authors):

      No additional comments

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      Line 68: it would be better to spell out the genus name at the first mention of the species.

      It has been corrected as suggested.

      Lines 90-92: This statement does not coincide with the figure. Could you please explain this better?

      We have carefully checked the statement and the corresponding figure panels but could not find a discrepancy between them. Perhaps the similar, adjacent labels for Dsuz and Dsan caused confusion about the emergence rates. To avoid this, we have modified Fig. 1b and 1c to highlight the focal host, Dsuz.

      Line 124: could you explain why the mention of these genes (Piwi) is important in this context, particularly for readers who are not experts in this field?

      A previous study revealed a relationship between piwi expansion and large genome size, whereas we meant to report a different pattern in our focal genome. We understand that your confusion might have been caused by the inserted statement about repeats, which separated the two. We have therefore moved the citation of the previous finding to immediately precede the conclusion.

      Line 233: "...composition remains largely unknown..." for Td or in general? Not clear.

      Thank you. To make it clear, we have modified this sentence as “Although teratocytes have been reported in several other parasitoids, their molecular composition remains largely unknown in general”.

      Line 286: "at a certain time".. confusing, please rephrase.

      We have rephrased it as “After a certain time (2 or 4 hours for oviposition choice)”.

      Line 293-294: I find this sentence quite hard to follow. Could you please rephrase it and/or expand this concept to make it clearer?

      We have modified this sentence as “The parasitic success of Td largely relies on locating a young host; however, Td does not have the ability to discriminate between young and old hosts. Whether Td has evolved any adaptive strategies to compensate for this disadvantage?”

      Line 314: "it would be interesting".. this is too weak of an argument. Please corroborate your motivation more soundly.

      We have changed this statement as “Because Td allows conditional intraspecific competition, the next compelling question would be whether Td allows interspecific competition with larval parasitoids.”

      Line 391: "Divergent evolution" is too strong a term in this context. I would tone it down to something like "Studying ecological niche differentiation".

      Thank you. It has been corrected as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      (1) Figure 1: Histomorphological analysis using immunostaining for type I, IIA, IIX, and IIB should be performed and quantified across different muscle groups and also in the soleus. Fiber type switch measured based on qPCR and Westerns does not sufficiently indicate the extent of fiber type switch. Better images for Fig. 1c should be provided.

      Thanks for your suggestion. In fact, we attempted immunofluorescent staining for Slow MyHC and Fast MyHC in GAS muscle. However, for the majority of our results, we only observed positive expression of Slow MyHC in a small portion of the muscle sections (as shown in the figure below), so we did not present this result.

      In addition, because of size limits on image files uploaded to bioRxiv, we had to compress the images, which lowered their resolution. We have attempted to submit clearer images in Fig. 1C.

      Author response image 1.

      Green: Slow MyHC; Red: Fast MyHC

      (2) Figure 2: Histomorphological analysis for SDH and NADH-TR should be performed and quantified in different muscle groups. Seahorse or Oroboros respirometry experiments should be performed to determine the actual increase in mitochondrial respiratory capacity, either in isolated mitochondria or in single fibers from vehicle- and eugenol-treated mice. Electron microscopy (EM) of mitochondria should be added to determine the extent of mitochondrial remodeling. The current data are insufficient to indicate the extent of mitochondrial or oxidative remodeling.

      That's a good suggestion. However, we regret to inform you that we are unable to present these results due to a lack of relevant experimental equipment and samples.

      (3) Figure 2: Gene expression analysis is limited to a few transcriptional factors. A thorough analysis of gene expression through RNA-seq should be performed to get an unbiased effect of Eugenol on muscle transcriptome. This is especially important because eugenol is proposed to work through CaN/NFAT signaling, major transcriptional regulators of muscle phenotype.

      Thanks for your suggestion. We believe that, in terms of reliability and accuracy, RNA-seq is not as good as RT-qPCR. The advantage of RNA-seq lies in its high throughput, which makes it suitable for screening unknown transcriptional regulatory mechanisms. In this study, the signaling pathways known to regulate myokines and muscle fiber type transformation are limited, namely the CaN/NFATc1 and AMPK pathways. Since eugenol acts mainly through the Ca2+ pathway, we focused primarily on the CaN/NFATc1 signaling pathway.

      (4) I suggest the inclusion of additional exercise or performance testing including treadmill running, wheel running, and tensiometry. Quantification with a swimming test and measurement of the exact intensity of exercise, etc. is limited.

      That's a good suggestion. We apologize for being unable to detect this indicator due to a lack of relevant experimental equipment.

      (5) In addition to muscle performance, whole-body metabolic/energy homeostatic effects should also be measured to determine a potential increase in aerobic metabolism over anaerobic metabolism.

      That's a good suggestion. We apologize for being unable to detect this indicator due to a lack of relevant experimental equipment.

      (6) For the swimming test and other measurements, only 4 weeks of vehicle vs. Eugenol treatment was used. For this type of pharmacological study, a time course should be performed to determine the saturation point of the effect. Does exercise tolerance progressively increase with time?

      Thanks for your suggestion. Due to the potential damage that exhaustive swimming tests inflict on mice, the tested mice are subsequently eliminated to avoid potential interference with the experiment. Therefore, this experiment is only suitable for conducting tests at individual time points.

      (7) The authors should also consider measuring adaptation to exercise training with or without Eugenol.

      Thanks for your suggestion. The purpose of this study is to investigate whether eugenol mimics exercise under standard dietary conditions. In our future research, we will consider exploring the effects of eugenol under HFD and exercise conditions.

      (8) Histomorphological analysis of WAT is also lacking. EchoMRI would give a better picture of lean and fat mass.

      That's a good suggestion. However, we did not collect WAT tissue sections, so we are unable to add this result, and we apologize for this. In addition, we are unable to measure lean and fat mass because we lack EchoMRI equipment.

      (9) The experiments performed to demonstrate that eugenol functions through TRPV1 are mostly correlational. Some experiments are needed with TRPV1 KO or KD instead of an inhibitor. Similarly, KD of other TRPV channels should be tested (at least channels 1-4, which seem to be expressed in muscle). Triple KO or TRPV-null cells should be considered to demonstrate that eugenol does not have another biological target.

      Thanks for your professional suggestion. AMG-517 is a specific inhibitor of TRPV1, with a much greater inhibitory effect on TRPV1 compared to other TRP channels. AMG-517 inhibits capsaicin (500 nM), acid (pH 5.0), or heat (45°C) induced Ca2+ influx in cells expressing human TRPV1, with IC50 values of 0.76 nM, 0.62 nM, and 1.3 nM, respectively. However, the IC50 values of AMG-517 for recombinant TRPV2, TRPV3, TRPV4, TRPA1, and TRPM8 cells are >20 μM (Gavva, 2008). Therefore, we believe that using AMG-517 instead of TRPV1 KO cells is sufficient to demonstrate the involvement of TRPV1 in the function of eugenol.
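      The selectivity argument above can be made quantitative with the IC50 values quoted from Gavva et al. (2008). Dividing the lower bound for the off-target channels by the least favorable (largest) TRPV1 IC50 gives a conservative fold-selectivity. This is a back-of-the-envelope ratio from recombinant-channel assays, not a selectivity measured in C2C12 cells, and it covers only the channels listed.

```latex
\[
\frac{\mathrm{IC}_{50}^{\text{TRPV2, TRPV3, TRPV4, TRPA1, TRPM8}}}{\mathrm{IC}_{50}^{\text{TRPV1}}}
> \frac{20\ \mu\mathrm{M}}{1.3\ \mathrm{nM}}
= \frac{20\,000\ \mathrm{nM}}{1.3\ \mathrm{nM}}
\approx 1.5 \times 10^{4}
\]
```

      Using the most potent TRPV1 value (0.76 nM) instead gives a ratio above roughly 2.6 × 10^4.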

      Although this study did not exclude the involvement of other TRP channels, our focus on TRPV1 was based on the observation that eugenol does not promote mRNA expression of the other TRP channels, as shown in Fig. 4A-C. Moreover, to our knowledge, the effects of TRP channels other than TRPV1 on myofiber type transformation remain unknown. This is an aspect that we plan to investigate in the future.

      Reference

      Gavva NR, Treanor JJ, Garami A, et al. Pharmacological blockade of the vanilloid receptor TRPV1 elicits marked hyperthermia in humans. Pain. 2008;136(1-2):202-210.

      (10) The eugenol + TRPV1 inhibition studies are performed in C2C12 cells and only look at myofiber gene expression. This is incomplete. Some studies of mitochondrial and OXPHOS genes should be done.

      Thanks for your suggestion. In the inhibition experiment, we additionally examined the expression of mitochondrial complex proteins, as shown in Figure 5C. The relevant description has been added in lines 178-183 and 764-765.

      (11) The experiments linking eugenol to Ca2+ handling and calcineurin/NFAT activation are all performed in C2C12 cells. There seems to be a link between eugenol, CaN/NFAT activation, and fiber type regulation in cells; however, this needs to be tested in mice at the functional level using some of the parameters measured in aims 1 and 2.

      Thank you for your professional suggestion. We will attempt to continue these experiments in future studies.

      (12) The myokine studies are incomplete. The authors show a link between eugenol treatment and myokine/IL-15 induction. However, this is purely correlational, as no experiments were performed to show whether IL-15 mediates any of the effects of eugenol in mice.

      Indeed, previous studies have adequately demonstrated the regulation of skeletal muscle oxidative metabolism by IL-15. The initial aim of this experiment was to investigate the mechanism by which eugenol promotes IL-15 expression. Through inhibition assays, EMSA, and dual luciferase reporter gene experiments, we have thoroughly demonstrated that eugenol promotes IL-15 expression via the CaN/NFATc1 signaling pathway, thus establishing a novel link between CaN/NFATc1 signaling and the myokine IL-15 expression. In the subsequent experiments, we plan to knock out IL-15 in eugenol-treated C2C12 cells to explore whether IL-15 mediates the effects of eugenol. This will be another aspect of our investigation.

      (13) An additional major concern is that it cannot be ruled out that eugenol mediates its effects through targets other than TRPV1. Ideally, muscle-specific TRPV1 knockout mice should be used in some experiments with eugenol to confirm that this ion channel is involved in its physiological effects.

      We agree that muscle-specific TRPV1 knockout mice should be used to conduct some experiments with eugenol. Because we did not validate our in vivo findings in skeletal muscle-specific TRPV1 knockout mice, we cannot rule out that eugenol also mediates its effects through targets other than TRPV1. We acknowledge this as a limitation of our study. However, due to limitations in research funding and time, we are currently unable to add these experiments. Nevertheless, we believe that our in vitro results using a TRPV1 inhibitor (which selectively inhibits TRPV1) provide evidence that eugenol acts through TRPV1.

      Reviewer #2 (Public Review):

      Weaknesses:

      (1) Apart from Fig.2A and 2B, they mostly utilised protein expression changes as an index of tissue functional changes. Most of the data supporting the conclusions are thus rather indirect. More direct functional evidence would be more compelling. For example, a lipolysis assay could be used to measure the metabolic function of adipocytes after eugenol treatment in Fig.3. Functional activation of NFAT can be demonstrated by examining the nuclear translocation of NFAT.

      Thank you for your professional suggestion. Indeed, as shown in Figure 4G-I, we detected the expression of NFATc1 in the nucleus to illustrate its nuclear translocation.

      (2) To further demonstrate the role of TRPV1 channels in the effects of eugenol, TRPV1-deficient mice and tissues could also be used. Will the improved swimming test in Fig. 2B and increased CaN, NFAT, and IL-15 triggered by eugenol be all prevented in TRPV1-lacking mice and tissues?

      Thank you for your professional suggestion. We agree that muscle-specific TRPV1 mice should be used to conduct some experiments with eugenol. However, due to limitations in research funding and time, we are currently unable to supplement these experiments.

      (3) Direct evidence of eugenol activation of TRPV1 channels in skeletal muscles is also lacking. The flow cytometry assay was used to measure Ca2+ changes in the C2C12 cell line in Fig. 5A. But this assay is rather indirect. It would be more convincing to monitor real-time activation of TRPV1 channels in skeletal muscles not in cell lines using Ca2+ imaging or electrophysiology.

      Thank you for your professional suggestion. As you suggested, we initially planned to use patch-clamp technique to detect membrane potential changes in skeletal muscle cells under eugenol treatment. However, due to experimental technical limitations, this experiment was not successfully conducted. Therefore, we were compelled to rely solely on flow cytometry to detect Ca2+ levels.

      Reviewer #2 (Recommendations For The Authors):

      (1) Most of the mRNA and protein data are consistent with each other. However, some of them are not obvious. For example, PGC1a mRNA was increased by eugenol in Fig. 2C but not seen in protein in Fig. 2D. Similarly, Complex I and V mRNA was increased in Fig. 2C but not obvious at protein levels in Fig. 2D, even though they claimed that Complex I and V were both upregulated by eugenol (see: line 123). Another example: IL-15 mRNA was increased by EUG100 but not by EUG50 in the GAS muscle in Fig. 8A. However, EUG50 increased IL-15 protein expression in Fig. 8B. Similar conflict was also seen in IL-15 expression in the TA muscle in Fig. 8A and 8C.

      Thanks for your question. As shown in the table below, by standardizing with β-Actin, our statistical data indeed indicate that eugenol promotes the expression of Complex I and V proteins (although the upregulation is minimal). Additionally, protein and mRNA expression do not always correlate, which may be due to potential post-transcriptional and post-translational regulation.

      Author response table 1.

      (2) Line 115: Figure 2A should be Figure 2B; Line 119: Figure 2B should be Figure 2A. Alternatively, swap Fig2A with Fig. 2B.

      Thanks for your correction. We have revised the relevant content in lines 111-113 and 724-725.

      (3) Abbreviations of ADF and ADG in Fig. 3A should be defined.

      Thank you for your suggestion. We have defined these abbreviations in lines 123-125.

      (4) Line 154: TRPV1 mRNA expression was promoted by 25 and 50 μM eugenol, not by 12.5 μM.

      Thank you for your correction. We have revised it in line 150.

      (5) Line 173: Increased expression of NFAT suggests that NFAT is activated. This is a rather weak statement. It is more convincing to show the nuclear translocation of NFAT by eugenol treatment.

      Thank you for your correction. We have revised the description in line 166.

      (6) Line 185: The data showing EUG increased slow MyHC fluorescence intensity in Fig. 5D are not clear at all. Quantification is required.

      Thank you for your suggestion. We have attempted to submit clearer images in Figure 5E, and the quantification has been provided.

      (7) Line 235: IL-15 expression is positively correlated with MyHC IIa, suggesting IL-15 is a slow muscle myokine (See line 2398). However, MyHC IIa is a marker of fast muscle fibres (see line 50).

      Thank you for your correction. As you pointed out, MyHC IIa is a marker of fast-twitch oxidative muscle fibers. We have replaced ‘slow’ with ‘oxidative’ in line 235.

      (8) Fig.9C and 9D show that inhibition of TRPV1 and CaN attenuated the upregulation of IL-15 mRNA and protein by eugenol in C2C12 cell line. This result is important in demonstrating the link of TRPV1 and CaN to IL-15. It will be more interesting and physiologically relevant to perform this experiment in primary skeletal muscle cells isolated from mice.

      Thank you for your suggestion. This is indeed an interesting idea. We will attempt to continue our experiments in mice and primary porcine muscle cells in future studies.

      (9) It is concerning that 4-week-old male mice were used for the study. The 4-week-old mice are immature. Adult mice over 8 weeks should be used. It is thus unknown whether the findings are broadly applicable to adult age.

      Thanks for your professional question. Age does affect muscle fiber type in mammals. Based on previously observed patterns of age-related muscle fiber changes in various mammals (Katsumata et al., 2017; Pandorf et al., 2012; Hill et al., 2020), we believe that muscle fiber type changes occur more frequently in juvenile mammals, mainly as a sharp increase in fast muscle fibers. Interventions during the juvenile stage might therefore be more effective in promoting the transformation of fast to slow muscle fibers. For this reason, most of our group's studies using nutritional interventions to regulate muscle fiber type start interventions in mice at 4 weeks of age. We speculate that an intervention begun at 8 weeks would be less effective than one begun at 4 weeks. The patterns of muscle fiber changes with age in several mammalian models are summarized below for reference:

      (1) Changes in muscle fiber types with age in pigs:

      As shown in the following figure, there is a dramatic change in the muscle fiber types 12 days post birth in pigs, especially with a sharp increase in fast muscle fibers, which continues until day 45. After 45 days of age, the changes in muscle fiber types become relatively gradual.

      Author response table 2.

      Developmental changes in the proportions of muscle fiber types in the longissimus dorsi muscle, determined by histochemical analysis of myosin adenosine triphosphatase activity (%)

      Least squares means and pooled standard errors (n = 3). MHC, myosin heavy chain; ND, not detected. *P < 0.10, **P < 0.01. Least squares means followed by different letters in the same row are significantly different (P < 0.05).

      Reference:

      Katsumata, M., Yamaguchi, T., Ishida, A., & Ashihara, A. (2017). Changes in muscle fiber type and expression of mRNA of myosin heavy chain isoforms in porcine muscle during pre- and postnatal development. Animal Science Journal, 88(2), 364–371.

      (2) Changes in muscle fiber types with age in rats:

      As illustrated in the subsequent figure, the muscle fiber types in rats undergo significant changes before 20 days of age (3-week-old), notably with a pronounced increase in type IIb fast-twitch fibers. After reaching 20 days of age, the changes in type IIb muscle fibers tend to stabilize and become more gradual.

      Author response image 2.

      Reference:

      Pandorf, C. E., Jiang, W., Qin, A. X., Bodell, P. W., Baldwin, K. M., & Haddad, F. (2012). Regulation of an antisense RNA with the transition of neonatal to IIb myosin heavy chain during postnatal development and hypothyroidism in rat skeletal muscle. American Journal of Physiology. Regulatory, Integrative and Comparative Physiology, 302(7), R854–R867.

      (3) Changes in muscle fiber types with age in mice:

      As depicted in the following figure, when comparing 10-week-old mice to 78-week-old aged mice, there are no significant changes in muscle fiber types.

      Author response image 3.

      Reference:

      Hill, C., James, R. S., Cox, V. M., Seebacher, F., & Tallis, J. (2020). Age-related changes in isolated mouse skeletal muscle function are dependent on sex, muscle, and contractility mode. American Journal of Physiology. Regulatory, Integrative and Comparative Physiology, 319(3), R296–R314.

    1. Author response:

      The following is the authors’ response to the original reviews. 

      eLife assessment

      This important manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. Compelling evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      eLife assessment, Significance of findings

      This valuable manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. 

      According to the eLife criteria for assessing significance, the “valuable” assessment indicates “findings that have theoretical or practical implications for a subfield.” We have revised the manuscript to emphasize the “theoretical and practical implications beyond a single subfield” which “substantially advance our understanding of major research questions”, with “profound implications” and the potential for “widespread influence,” the eLife criteria for a designation of “landmark” significance.   

      The most immediate implications of our results are for two major neuroscience subfields: cerebellar research and autism research. However, as recognized by Reviewer 2, the implications are much broader than that: “the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.” We have substantially revised the Discussion section of the manuscript to lay out more explicitly how the central idea of our manuscript, that the capacity for learning at any given moment is powerfully influenced by dynamic, activity- and plasticity-dependent changes in the threshold for synaptic plasticity over short timescales of tens of minutes to hours, has implications for scientific thinking and experiments on plasticity and learning throughout the brain, as well as for clinical practice for a wide array of brain disorders associated with altered plasticity and learning impairment.

      To emphasize the broad conceptual implications of our research, we have reframed our conclusions in terms of metaplasticity rather than saturation of plasticity throughout the revised manuscript. In our previous submission, we had used the “saturation” terminology for continuity with our previous NguyenVu et al 2017 eLife paper, and mentioned the related idea of threshold metaplasticity in a single sentence: “Similarly, the aberrant recruitment of LTD before training may lead, not to its saturation per se, but to some other kind of reduced availability, such as an increased threshold for its induction (Bienenstock, Cooper, and Munro, 1982; Leet, Bear, and Gaier, 2022).” However, we now appreciate that metaplasticity is a more general conceptual framework for our findings, and therefore emphasize this concept in the revised manuscript, while still making the conceptual link with the “saturation” idea presented in NguyenVu et al 2017 (lines 236-238).

      The concept of a sliding threshold for synaptic plasticity (threshold metaplasticity) was proposed four decades ago by Bienenstock, Cooper and Munro (1982) as a mechanism for countering an instability inherent in Hebbian plasticity whereby correlated pre- and post-synaptic activity strengthens a synapse, which leads to an increase in correlated activity, which in turn leads to further strengthening. To counter this, BCM proposed a sliding threshold whereby increases in neural activity increase the threshold for LTP and decreases in activity decrease the threshold for LTP, thereby providing a mechanism for stabilizing firing rates and synaptic weights. This BCM sliding threshold model has been highly influential in theoretical and computational neuroscience, but experimental evidence for whether and how such a mechanism functions in vivo has been quite limited.  
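      For readers less familiar with the BCM model, the sliding-threshold idea described above can be illustrated with a minimal single-synapse sketch in Python. It assumes a commonly used textbook form of the rule, not necessarily the exact formulation of Bienenstock, Cooper and Munro (1982) and not our cerebellar model: a linear neuron with output y = w·x, a weight change proportional to x·y·(y − θ) (LTP when y exceeds θ, LTD when it falls below), and a threshold θ that relaxes toward y². The function name `simulate_bcm` and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def simulate_bcm(x, eta=0.01, tau_theta=5.0, w0=0.5, theta0=0.25):
    """Simulate one synapse under a textbook BCM rule (illustrative sketch).

    x: sequence of presynaptic rates, one per time step (hypothetical input)
    eta: learning rate for the weight
    tau_theta: time constant, in steps, of the sliding threshold
    Returns arrays holding the weight and the threshold after each step.
    """
    w, theta = w0, theta0
    ws = np.empty(len(x))
    thetas = np.empty(len(x))
    for t, xt in enumerate(x):
        y = w * xt                             # postsynaptic rate of a linear neuron
        w += eta * xt * y * (y - theta)        # LTP if y > theta, LTD if y < theta
        theta += (y ** 2 - theta) / tau_theta  # threshold slides toward recent y**2
        ws[t], thetas[t] = w, theta
    return ws, thetas


if __name__ == "__main__":
    for rate in (1.0, 2.0):
        ws, thetas = simulate_bcm(np.full(5000, rate))
        y_final = ws[-1] * rate
        print(f"input={rate}: w={ws[-1]:.3f}, theta={thetas[-1]:.3f}, y={y_final:.3f}")
```

      Under constant input, the output rate settles near y = 1 whether the input is 1.0 or 2.0, because the weight converges to w ≈ 1/x; this is the stabilization of firing rates and synaptic weights that the sliding threshold was proposed to provide. The threshold is deliberately set to move faster than the weight (small `tau_theta` relative to 1/`eta`); in this form of the rule, a threshold that slides too slowly can let the weight oscillate or run away.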

      Our work extends the previous, limited experimental evidence for a BCM-like sliding threshold in vivo in several significant ways, which we now discuss in the revised manuscript:

      First, we analyze threshold metaplasticity at synapses where the plasticity is not Hebbian and lacks the inherent instability that inspired the BCM model. The synapses onto cerebellar Purkinje cells have been described as “anti-Hebbian” because the associative form of plasticity is synaptic LTD of excitatory inputs. This anti-Hebbian associative plasticity lacks the instability inherent in Hebbian plasticity. Moreover, a BCM-like sliding threshold that increases the threshold for associative LTD with increased firing rates and decreases the threshold for LTD with decreased firing rates would tend to oppose rather than support the stability of firing rates; nevertheless, we find evidence for such a threshold in our experimental results. Thus, for cerebellar LTD, the central function of the sliding threshold may not be the stabilization of firing rates, but rather to limit plasticity in order to suppress the overwriting of new memories or to allocate different memories to the synapses of different Purkinje cells.

      Second, we analyze the influence of a BCM-like sliding threshold for plasticity on behavioral learning. Most previous evidence for the BCM model in vivo has derived from studies of the effects of sensory deprivation (e.g., monocular occlusion) on the functional connectivity of sensory circuits (Kirkwood et al., 1996; Desai et al. 2002; Fong et al., 2021) rather than on learning per se.  

      Third, our results provide evidence for major changes in the threshold for plasticity over short time scales and with more subtle manipulations of neural activity than used in previous studies, with practical implications for clinical application. Previously, metaplasticity has been demonstrated with sensory deprivation over multiple days (Kirkwood et al., 1996; Desai et al. 2002) or with drastic changes in neural activity, such as with TTX in the retina (Fong et al, 2021), TMS (Hamada et al 2008), or high frequency electrical stimulation in vitro (Holland & Wagner 1998; Montgomery & Madison 2002) or in vivo (Abraham et al 2001). In contrast, we provide evidence for metaplasticity induced by 30 min of behavioral manipulation (pre-training) and by the relatively subtle pharmacological manipulation of activity with systemic administration of diazepam, a drug approved for humans. Thus, our work contributes not only conceptually to understanding the function of threshold metaplasticity in vivo, but also offers practical observations that could pave the way for novel therapeutic interventions.  

      Fourth, whereas efforts to enhance plasticity and learning have largely focused on increasing the excitability of neurons during learning to help cross the threshold for plasticity (e.g., Albergaria et al., 2018; Yamaguchi et al., 2020; Le Friec et al., 2017), we take the opposite, somewhat counterintuitive approach of inhibiting the excitability of neurons during a period before learning to reset the threshold for plasticity to a state compatible with new learning. To our knowledge, the only other application of such an approach in an animal model of a brain disorder has been the inhibition of peripheral (retinal) activity with TTX for treatment of amblyopia (Fong et al., 2021). Our findings from CNS inhibition with a single systemic dose of diazepam greatly expand the potential applications, which could readily be tested in other mouse models of human disorders and other learning deficits. Even in cases where the specific synaptic impairments and circuitry are less fully understood, the impact of suppressing neural activity during a period before training to reduce the threshold for plasticity could be empirically tested.
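      The sliding-threshold logic behind this approach can be illustrated with a toy model. The Python sketch below is a minimal illustration under stated assumptions, not the model used in this study: it uses the classic BCM form in which the threshold relaxes toward the square of recent activity, and every parameter (`tau`, the activity levels, `TRAINING_DRIVE`) and helper name is hypothetical, in arbitrary units. It shows only the qualitative point that a period of reduced activity before training lowers the threshold, so that a training signal that fails to cross the threshold at baseline can cross it afterward. For simplicity it tracks the history of activity, although, as noted in the Sixth point, the history of plasticity may be the relevant variable at these synapses.

```python
# Toy sketch of a BCM-like sliding plasticity threshold.
# All parameter values and names are hypothetical, in arbitrary units.


def update_threshold(theta, rate, dt=1.0, tau=50.0):
    """One Euler step: theta relaxes toward rate**2 with time constant tau,
    the classic BCM form in which the threshold tracks recent squared activity."""
    return theta + (dt / tau) * (rate ** 2 - theta)


def threshold_after(rates, theta0=1.0):
    """Threshold reached after a history of activity given as a sequence of rates."""
    theta = theta0
    for rate in rates:
        theta = update_threshold(theta, rate)
    return theta


def crosses_threshold(drive, theta):
    """Hypothetical readout: plasticity is induced only if the training drive exceeds theta."""
    return drive > theta


# 300 steps of pre-training history at three activity levels (hypothetical values).
histories = {
    "baseline": [1.0] * 300,
    "suppressed": [0.5] * 300,  # e.g., activity reduced before training
    "elevated": [1.5] * 300,
}

TRAINING_DRIVE = 0.8  # hypothetical strength of the training signal

thresholds = {name: threshold_after(rates) for name, rates in histories.items()}
for name, theta in thresholds.items():
    print(f"{name:>10}: theta = {theta:.3f}, plasticity induced: "
          f"{crosses_threshold(TRAINING_DRIVE, theta)}")
```

      In this toy form the threshold moves in the same direction as recent activity, as described in the First point for cerebellar LTD; `tau` only sets how quickly it follows.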

      Fifth, our work extends the consideration of a BCM-like sliding threshold for plasticity to the cerebellum, whereas previous work has focused on models and experimental studies of forebrain circuits. There is currently a surge of interest in the contribution of the cerebellum to functions and brain disorders previously ascribed to the forebrain; hence, we anticipate broad interest in this work.

      Sixth, our results suggest that the history of plasticity rather than the history of firing rates may be the homeostat controlling the threshold for plasticity, at least at the synapses under consideration. Diazepam pre-treatment enhanced learning only in the L7-Fmr1 KO mice, which have a low “baseline” threshold for plasticity as measured in vitro, and not in WT mice. This suggests that it is not neural activity per se that drives the change in threshold for plasticity, but the interaction of activity with the plasticity mechanism.

      In the revised Discussion, we make all of the above points, to make the implications clearer to readers.

      The broad interest in this topic is illustrated by two concrete examples. First, an abstract of this work was honored with selection for oral presentation at the November 2023 Symposium of the Molecular and Cellular Cognition Society, a conceptually wide-ranging organization with thousands of members worldwide. Second, the most closely related published work on activity-dependent metaplasticity in vivo, the Fong et al., 2021 eLife paper demonstrating reversal of amblyopia by suppression of activity in the retina by TTX, attracted such broad interest, not just from professional scientists but also from the general public, that it was reported on National Public Radio’s All Things Considered, with an audience of 11.9 million people worldwide.

      In considering the potential of this work for widespread influence, it is important to note that activity-driven changes in the threshold for plasticity could very well be a general property of most if not all synapses, yet very little is known about their function in vivo, especially during learning. Therefore, the seminal conceptual and practical advances described above have the potential for profound implications throughout neuroscience, psychiatry, neurology and computer science/AI, the eLife criterion for designation as “landmark” in significance. We respectfully request that the reviewers and editor reassess the significance of our findings in light of our much-improved discussion of the broad significance of the work.

      eLife assessment, Strength of support

      Convincing evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      The designation of “Convincing” indicates “methodology in line with current state-of-the-art.” In the revised Discussion, we more clearly highlight that our evidence is “more rigorous than current state-of-the-art” in several respects, thereby meeting the eLife criterion for “Compelling”:

      (1) Comparison of learning deficits and effects of behavioral and pharmacological pretreatment across five closely related oculomotor learning tasks, which all depend on the same region of the cerebellum (the flocculus), but which previous work has found to vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. 

      The “state-of-the-art” behavioral standard in the field of learning is assessment of a single learning task that depends on a given brain area, with the implicit or explicit assumption that the task chosen is representative of “cerebellum-dependent learning” or hippocampus-, amygdala-, basal ganglia-, cortex-dependent learning, etc. Sometimes there is a no-learning behavioral control.

      Our study exceeds this standard by comparing across many different closely related learning tasks, which all depend on the cerebellar flocculus and other shared vestibular, visual, and oculomotor circuitry, but vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. In the original submission, we reported results for high-frequency VOR-increase learning that were dramatically different than for three other VOR learning tasks for which there is less evidence for a role of LTD. Reviewer 2 noted, “the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable.” In the revised manuscript, we provide new data for a second oculomotor learning task in which LTD has been implicated, OKR adaptation, with very similar results as for high-frequency VOR-increase learning. The remarkable specificity of both the learning deficits and the effects of pre-training manipulations, in two different lines of mice, for the two specific learning tasks in which LTD has been most strongly implicated, and not the other three oculomotor learning tasks, substantially strengthens the evidence for the conclusion that the learning deficits and effects of pre-training are related specifically to the lower threshold for LTD, rather than the result of some other effect of the gene KO or pre-treatment on the cerebellar or oculomotor circuitry (discussed on lines 270-290 of revised manuscript).

      (2) Replication of findings in more than one line of mice, targeting distinct signaling pathways, with a common impact of enhancing LTD at the cerebellar PF-Purkinje cell synapses.  

      State-of-the-art is to report the effects of one specific molecular signaling pathway on behavior. 

      In the first part of this Research Advance, we replicate the findings of Nguyen-Vu et al., 2017 for a completely different line of mice with enhanced LTD at the parallel fiber-to-Purkinje cell synapses. Like the comparison across LTD-dependent and LTD-independent oculomotor learning tasks, the comparison across completely different lines of mice with enhanced LTD strengthens the evidence that the shared behavioral phenotypes are a reflection of the state of LTD rather than other “off-target” effects of each mutation (discussed on lines 291-309 of revised manuscript).

      (3) Reversal of learning impairments with more than one type of treatment. 

      State-of-the-art is to be able to reverse a learning deficit or other functional impairment in an animal model of a brain disorder with a single treatment; indeed, success in this respect is viewed as wildly exciting, as evidenced by the reception by the scientific and lay communities of the Fong et al., 2021 eLife report of reversal of amblyopia by TTX treatment of the retina.

      In the current work, we demonstrate reversal of learning deficits with two different types of treatment during the period before training, one behavioral and one pharmacological. The current diazepam pretreatment results provide a fundamentally new type of evidence for the hypothesis that the threshold for LTD and LTD-dependent learning varies with the recent history of activity in the circuit, complementing the evidence from behavioral and optogenetic pre-training approaches used previously in Nguyen-Vu et al., 2017 (discussed on lines 151-158 and 246-255 of revised manuscript).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Shakhawat et al. investigated how enhancement of plasticity and impairment could result in the same behavioral phenotype. The authors tested the hypothesis that learning impairments result from saturation of plasticity mechanisms and had previously tested this hypothesis using mice lacking two class I major histocompatibility molecules. The current study extends this work by testing the saturation hypothesis in Purkinje cell (L7)-specific Fmr1 knockout mice, which have enhanced parallel fiber-Purkinje cell LTD. The authors found that L7-Fmr1 knockout mice are impaired on an oculomotor learning task and that both pre-training, to reverse LTD, and diazepam, to suppress neural activity, eliminated the deficit when compared to controls.

      Strengths:

      This study tests the "saturation hypothesis" to understand plasticity in learning using a well-known behavior task, VOR, and an additional genetic mouse line with a cerebellar cell-specific target, L7-Fmr1 KO. This hypothesis is of interest to the community as it prompts a novel inquiry into LTD that has not been examined previously.

      Utilizing a cell-specific mouse line that has been previously used as a genetic model to study Fragile X syndrome is a unique way to study the role of Purkinje cells and the Fmr1 gene. This increases the understanding in the field with regard to Fragile X syndrome and LTD.

      The VOR task is a classic behavior task that is well understood, therefore using this metric is very reliable for testing new animal models and treatment strategies. The effects of pretraining are clearly robust and this analysis technique could be applied across different behavior data sets.

      The rescue shown using diazepam is very interesting as this is a therapeutic that could be used in clinical populations as it is already approved.

      There was a proper use of controls and all animal information was described. The statistical analysis and figures are clear and well describe the results.

      We thank the reviewer for summarizing the main strengths of our original submission. We have further strengthened the revised submission by 

      (1) more fully discussing the broad conceptual implications, as outlined above; 

      (2) adding new data (Fig. 5) showing that another LTD-dependent oculomotor learning task, optokinetic reflex (OKR) adaptation, is impaired in the L7-Fmr1 KO mice and rescued by pre-treatment with diazepam, as we had already shown for high-frequency VOR-increase learning; and

      (3) responding to the specific points raised by the reviewers, as detailed below.

      Weaknesses:

      While the proposed hypothesis is tested using genetic animal models and the VOR task, LTD itself is not measured. This study would have benefited from a direct analysis of LTD in the cerebellar cortex in the proposed circuits.

      Our current experiments were motivated by the direct analysis of cerebellar LTD in Fmr1 knockout mice that was already published (Koekkoek et al., 2005). In that previous work, LTD was analyzed in both Purkinje cell-selective L7-Fmr1 KO mice (Koekkoek et al., 2005; Fig. 4D), as used in our study, and global Fmr1 knockout mice (Koekkoek et al., 2005; Fig. 4B). Both lines were found to have enhanced LTD, as cited in the Introduction of our manuscript (lines 48-51, 63-64). The goal of our current study was to build on this previous work by analyzing the behavioral correlates of the findings from this previous, direct analysis of LTD.

      Diazepam was shown to rescue learning in L7-Fmr1 KO mice, but this drug is a benzodiazepine and can cause a physical dependence. While the concentrations used in this study were quite low and animals were dosed acutely, potential side-effects of the drug were not examined, including any possible withdrawal. 

      In humans, diazepam (Valium) is one of the most frequently prescribed drugs in the world, and its side effects and withdrawal symptoms have been extensively studied and documented.¹ Withdrawal symptoms are generally not observed with treatments of less than 2 weeks (Brett and Murnion, 2015). After long-term treatment, tapering of the dosage is recommended to mitigate withdrawal (Brett and Murnion, 2015 and https://americanaddictioncenters.org/valium-treatment/withdrawal-duration). The extensive data on the safety of diazepam in humans lower the barrier to potential clinical translation of our basic science findings, although we emphasize that our own expertise is scientific, and translation to Fragile X patients or other patient groups will require additional development of the research by clinicians.

      Given the extensive history of research on this drug, we focused on looking for side effects that would reflect an adverse effect of diazepam on the function of the same oculomotor neural circuitry whose ability to support certain oculomotor learning tasks was improved after diazepam. In other words, we assessed whether the pharmacological manipulation was enhancing certain functions of a given circuit at the expense of others. As we note (line 164), “The acute effect of diazepam administration [measured 2 hours after administration] was to impair learning” in both WT and L7-Fmr1 KO mice. One could consider this a side effect. More importantly, we also tested extensively for oculomotor side effects during the therapeutic period when learning impairments were eliminated in the L7-Fmr1 KOs, 18-24 hours post-administration, and have a full section of the Results describing our findings about this, titled “Specificity of pre-training effects on learning.” As described in the Results and Discussion (lines 184-195, 312-318; Fig. 3; Fig. 3-figure supplement 1; Fig. 4B; Fig. 5-figure supplement 1), we found no such adverse side effects, which is again encouraging with respect to the translational potential of our findings.

      This drug is not specific to Purkinje cells or cerebellar circuits, so the action of the drug on cerebellar circuitry is not well understood for the study presented.

      The effects of diazepam are indeed not specific to Purkinje cells, but rather are known to be widespread. Diazepam is a positive allosteric modulator of GABA-A receptors, which are found throughout the brain, including the cerebellum. When delivered systemically, as we did in our experiments, diazepam will suppress neural activity throughout the brain by facilitating inhibition, as documented by decades of previous research with this and related benzodiazepines, including dozens of studies of the effects of diazepam in the cerebellum.

      To our knowledge, there is currently no drug that can specifically inhibit Purkinje cells, especially one that can be given systemically to cross the blood-brain barrier. Moreover, if such a drug did exist, we would not predict it to have the same effect as diazepam in reversing the learning deficits of the L7-Fmr1 KO mice, because the latter presumably depends on suppression of activity in the cerebellar granule cells and neurons of the inferior olive, whose axons form the parallel fibers and climbing fibers, and whose correlated activity controls LTD at the parallel fiber-Purkinje cell synapses.  

      We have revised the text to clarify the key point that despite its widespread action on the brain, the effects of diazepam on cerebellum-dependent learning were remarkably specific (lines 184-195, 210-228, 312-318). During the period 18-24 hours after a single dose of diazepam, the learning deficits of L7-Fmr1 KO mice on two LTD-dependent oculomotor learning tasks were completely reversed, with no effects on the same tasks in WT mice, no effects (“side effects”) in L7-Fmr1 KO mice or WT mice on other, LTD-independent oculomotor learning tasks that depend on the same region of the cerebellum, and no effects on baseline performance of visually or vestibularly driven eye movements.

      As described in the revised Discussion (lines 318-323), the non-specific mild suppression of neural activity throughout the brain by diazepam makes it a potentially generalizable approach for inducing BCM-like shifts in the threshold for associative plasticity to facilitate subsequent learning. More specifically, diazepam-mediated reduction of activity throughout the brain has the potential to lower any aberrantly high thresholds for associative plasticity at synapses throughout the brain, and thereby reverse any learning deficits associated with such aberrantly high plasticity thresholds. This approach might even be useful in cases where the neural circuitry supporting a given behavior is not well characterized and the specific synapses responsible for the learning deficit are unknown. On lines 323-327 we compare this generalizable approach with the challenges of designing task- and circuit-specific approaches to reset the threshold for plasticity, particularly in circuits that are less well characterized than the oculomotor circuit.

      It was not mentioned if L7-Fmr1 KO mice have behavior impairments that worsen with age or if Purkinje cells and the cerebellar microcircuit are intact throughout the lifespan. 

      At the adult ages used in our study (8-22 weeks), the oculomotor circuitry, including the Fmr1-deficient Purkinje cells, appears to be functionally intact because all of the oculomotor performance and learning tasks we tested were either normal, or could be restored to normal with brief behavioral and/or pharmacological pre-treatment.  

      Any degeneration of the Fmr1-deficient Purkinje cells or cerebellar microcircuit or additional behavioral impairments at older ages, if they should exist, would not alter our interpretation of the results from 8-22 week old adults regarding history- and activity-dependent changes in the capacity for LTD-dependent learning. Therefore, we leave the question of changes throughout the lifespan to investigators with an interest and expertise in development and/or aging. 

      Only a small handful of the scores of previous studies of the Fmr1 KO mouse model have investigated age-dependent effects; the reviewer may be interested in papers such as Tang et al., 2015 (doi: 10.1073/pnas.1502258112) or Martin et al., 2016 (doi: 10.1093/cercor/bhv031). 

      Connections between Purkinje cells and interneurons could also influence the behavior results found.

      This comment is repeated below in a more general form (Reviewer 1, second to last comment)—please see our response there and lines 270-309 of the revised manuscript for a discussion of how concerns about “off-target” effects are mitigated by the high degree of specificity of the learning deficits and effects of pre-training for the specific learning tasks in which LTD has been previously implicated, and the very similar findings in two different lines of mice with enhanced LTD.

      While males and females were both used for the current study, only 7 of each sex were analyzed, which could be underpowered. While it might be justified to combine sexes for this particular study, it would be worth understanding this model in more detail.

      We performed additional analyses to address the question of whether there might be sex differences that were not detected because of the sample size.

      (1) In a new figure, Fig. 1-figure supplement 1, we break out the results for male and female mice in separate plots, and show that all of the effects of both the KO of Fmr1 from the Purkinje cells and of pretreatment with diazepam that are observed in the full cohort are also statistically significant in just the subset of male mice, and just the subset of female mice (see Fig. 1-figure supplement 1 legend for statistics). In other words, qualitatively, there are no sex differences, and all of the conclusions of our manuscript are statistically valid in both male and female mice. This strengthens the justification for combining sexes for the specific scientific purposes of our study.  

      (2) We performed a power analysis to determine how many mice would be needed to determine whether the very small quantitative differences between male and female mice are significant. The analysis indicates that this would require upwards of 70 mice of each sex for WT mice (Cohen’s d, 0.6162; power, 0.95) and upwards of 2500 mice of each sex for L7-Fmr1 KO mice (Cohen’s d, 0.0989; power, 0.95). Since the very small quantitative sex differences observed in our cohorts would not alter our scientific conclusions or the possibility of clinical application to patients of both sexes, even if they turned out to be significant, the very large number of animals needed did not seem warranted for the current scientific purposes. Researchers focused on sex differences may find a motivation to pursue this issue further.
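      The order of magnitude of these sample sizes can be checked with the standard normal approximation for a two-sided, two-sample comparison of means with equal group sizes. The Python sketch below is illustrative only: the significance level (alpha = 0.05) and the choice of test are assumptions, since they are not stated above, and the function name `n_per_group` is hypothetical. The normal approximation comes out slightly below an exact t-test calculation, so dedicated power-analysis software should be preferred for a formal estimate.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.95, alpha=0.05):
    """Approximate animals per group for a two-sided, two-sample comparison of
    means with standardized effect size d (Cohen's d), using the normal
    approximation n = 2 * (z_{1 - alpha/2} + z_{power})**2 / d**2."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Effect sizes reported above for the male vs. female comparison.
for label, d in [("WT", 0.6162), ("L7-Fmr1 KO", 0.0989)]:
    print(f"{label}: about {n_per_group(d)} mice per sex")
```

      With d = 0.6162 and d = 0.0989, this approximation lands in the same range as the figures reported above (about 70 and over 2500 per sex, respectively), and it makes explicit why the required sample size grows with 1/d².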

      Training was only shown up to 30 minutes and learning did not seem to plateau in most cases. What would happen if training continued beyond the 30 minutes? Would L7-Fmr1 KO mice catch up to WT littermates? Nguyen-Vu

      (1) For VOR learning, we used a 30 min training time because in our past (e.g., Boyden et al., 2003; Kimpo and Raymond, 2007; Nguyen-Vu et al., 2013; Nguyen-Vu et al., 2017) and current results, we find that VOR learning plateaus quite rapidly, with little or no additional adaptive change in the VOR observed between the tests of learning after 30 min vs. 20 min of VOR-increase training, in WT or L7-Fmr1 KO mice (Fig. 1A; WT, p = 0.917; L7-Fmr1 KO, p = 0.861; 20 vs. 30 min; Tukey). In the L7-Fmr1 KO mice, there is no significant high-frequency VOR-increase learning after 30 min training, and the mean VOR gain is even slightly lower on average (not significant) than before training (Fig. 1A, red). Therefore, we have no reason to expect that the L7-Fmr1 KO mice would catch up to WT after additional VOR-increase training.

      (2) We have added new data on OKR adaptation, induced with 60 min of training (Fig. 5). The L7-Fmr1 KO mice exhibited impaired OKR adaptation, even with 60 min of training (p = 1.27 × 10^-4, Tukey). In our experience, restraint for longer than 60 min produces a behavioral state that is not conducive to learning, as also reported by Katoh and Yamagiwa (2018); therefore, longer training times were not attempted.

      The pathway discussed as the main focus for VOR in this learning paradigm was connections between parallel fibers (PF) and Purkinje cells, but the possibility of other local or downstream circuitry being involved was not discussed. PF-Purkinje cell circuits were not directly analyzed, which makes this claim difficult to assess.

      In the revised manuscript (lines 299-309), we have expanded our discussion of the possibility that loss of expression of Fmr1 from Purkinje cells in the Purkinje cell-specific L7-Fmr1 KO mice might influence other synapses or intrinsic properties of the Purkinje cells (including synapses from interneurons, as raised in this reviewer’s comment above), in addition to enhancing associative LTD at the parallel fiber-Purkinje cell synapses.

      It is a very general limitation of all perturbation studies, even cell-type specific perturbation studies as in the current case, that it is never possible to completely rule out “off-target” effects of the manipulation. Because of this, causality cannot be definitively concluded from correlations (e.g., between the effects of a perturbation observed at the cellular and behavioral level), and therefore we make no such claim in our manuscript. Rather, we conclude that our results “provide evidence for,” “support,” “predict,” or “are consistent with” the hypothesis of a history- and activity-dependent change in the threshold for associative LTD at the parallel fiber-Purkinje cells.

      That said, perturbation is still one of the major tools in the experimental toolbox, and there are approaches for mitigating concern about off-target effects. We highlight three aspects of our experimental design that accomplish this (lines 184-228, 256-309). First, we show nearly identical learning impairments and effects of behavioral pretreatment in lines of mice with two completely different molecular manipulations that have the common effect of enhancing PF-Purkinje cell LTD, but are likely to have different off-target cellular effects on the Purkinje cells and their synapses. Second, we show that the learning impairments were highly specific to oculomotor learning tasks in which PF-Purkinje cell LTD was previously implicated, with no such effects on three other oculomotor learning tasks that depend on the same region of the cerebellum and oculomotor circuitry. In the original submission, we provided data for one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript we provide new data for a second LTD-dependent oculomotor learning task, optokinetic reflex adaptation, with nearly identical results (Fig. 5). Third, we show that the effects of diazepam pre-treatment were highly specific to the same two LTD-dependent oculomotor learning tasks and also highly specific to the L7-Fmr1 KO mice with enhanced LTD and not WT mice. These three features of the experimental design are not common in studies of learning, especially in combination. On lines 256-309, we provide an expanded discussion of how together, these three features of the design strengthen the evidence that the learning impairments and effects of diazepam pre-treatment on learning are related to LTD at the PF-Pk synapses, while acknowledging the possibility of other effects on the circuit.

      The authors mostly achieved their aim and the results support their conclusion and proposed hypothesis. This work will be impactful on the field as it uses a new Purkinje-cell specific mouse model to study a classic cerebellar task. The use of diazepam could be further analyzed in other genetic models of neurodevelopmental disorders to understand if effects on LTD can rescue other pathways and behavior outcomes.

      We agree that the present findings are potentially relevant for a very wide array of behavioral tasks, disease models, and brain areas beyond the specific ones in our study, and we make this point on lines 310-338 of the revised manuscript. 

      Reviewer #2 (Public Review):

      This manuscript explores the seemingly paradoxical observation that enhanced synaptic plasticity impairs (rather than enhances) certain forms of learning and memory. The central hypothesis is that such impairments arise due to saturation of synaptic plasticity, such that the synaptic plasticity required for learning can no longer be induced. A prior study provided evidence for this hypothesis using transgenic mice that lack major histocompatibility class 1 molecules and show enhanced long-term depression (LTD) at synapses between granule cells and Purkinje cells of the cerebellum. The study found that a form of LTD-dependent motor learning (increasing the gain of the vestibulo-ocular reflex, VOR) is impaired in these mice and can be rescued by manipulations designed to "unsaturate" LTD. The present study extends this line of investigation to another transgenic mouse line with enhanced LTD, namely, mice with the Fragile X gene knocked out. The main findings are that VOR gain-increase learning is selectively impaired in these mice but can be rescued by specific manipulations of visuomotor experience known to reverse cerebellar LTD. Additionally, the authors show that a transient global enhancement of neuronal inhibition also selectively rescues gain-increase learning. This latter finding has potential clinical relevance since the drug used to boost inhibition, diazepam, is FDA-approved and commonly used in the clinic. The evidence provided for the saturation is somewhat indirect because directly measuring synaptic strength in vivo is technically difficult. Nevertheless, the experimental results are solid. In particular, the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable.
The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this exceptionally clear and concise assessment of the findings and strengths of the manuscript.

      We agree that one of the most “remarkable” aspects of our findings is the specificity of the effects for oculomotor learning tasks for which there is the strongest previous evidence for a role of PF-Purkinje cell LTD. In the original manuscript, we tested just one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript, we strengthen the case for LTD-dependent task specificity by adding new data (Fig. 5) showing the same effects for OKR adaptation, an additional LTD-dependent oculomotor learning task.

      The reviewer’s suggestion to include discussion of “untested assumptions”, “including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation” prompted us to more deeply consider the broader implications of our results, and extensively revise the Discussion accordingly. We clarify that we consider history-dependent changes in the threshold for LTD to be a prediction of the behavioral and pharmacological findings (lines 339-347, 356) rather than an assumption. In addition, we highlight the broader implications of the results by putting them in the context of work in other brain areas on history-dependent changes in the threshold for plasticity, i.e., metaplasticity, going back to the seminal Bienenstock-Cooper-Munro (BCM; year) theory (lines 348-378).

      Reviewer #1 (Recommendations for The Authors):

      The text and figures are very clear to read, but there are a couple of questions that remain:

      The concentrations chosen for diazepam are not well described and it is unclear why the concentrations jump from 2.5 mg/kg to 0.5 mg/kg. Please add an explanation for these concentrations and if any additional behavior outcomes were observed.

      Our choice of diazepam concentrations was guided by the concentrations reported in the literature to be effective in mice, which suggest that a higher dose (2 mg/kg) can have additional effects not observed with a lower effective dose (0.5 mg/kg) (Pádua-Reis et al., 2021). Since we did not know how much enhancement of inhibition/suppression of activity might be necessary to substantially reduce the induction of PF-Purkinje cell LTD, we did pilot experiments to test concentrations at the low and high ends of the doses typically used in mice. These pilot experiments revealed that a lower dose of 0.4 or 0.5 mg/kg was comparable to the higher dose of 2.5 mg/kg in suppressing VOR-increase learning 2 hours after administration (Fig. 3-figure supplement 2). Anecdotally, we observed higher levels of locomotor activity and other abnormal cage behavior during the period immediately after administration of the higher compared to the lower dose. To limit these side effects and any possibility of dependence, we used only the lower dose in all subsequent experiments. We clarify this rationale for using a lower dose in the legend of Fig. 3-figure supplement 2.

      Figure 4 describes low-frequency VOR, but the paragraph discussing these results (line 191) mentions high-frequency VOR-increase learning. It is unclear where the results are for the high-frequency data. Please include or rephrase for clearer understanding.

      In the revised manuscript, we clarify that the 1 Hz frequency of the vestibular and visual stimuli used in Figs. 1-3 is the “high” frequency, which yields different results from the “low” frequency of 0.5 Hz (Fig. 4), as also observed in Boyden et al., 2006, and Nguyen-Vu et al., 2017.

      Reviewer #2 (Recommendations For The Authors):

      The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this comment, which, along with your public comments, inspired us to thoroughly reconsider and revise our Discussion. We think this has greatly improved the manuscript, and will substantially increase its appeal to a broad segment of the neuroscience research community, including computational neuroscientists as well as those interested in synaptic physiology, learning and memory, or plasticity-related brain disorders including autism. 

      Note that we consider the idea that “LTD depends not only on pre- and postsynaptic activity but also on the prior history of synaptic activation” to be the central prediction of the threshold metaplasticity hypothesis rather than an assumption, and in the revised manuscript we explicitly refer to it as a prediction (lines 339, 356). We also added a discussion of multiple known cellular phenomena in Purkinje cells and their synapses that can regulate LTD and thus represent candidate mechanisms for LTD threshold metaplasticity (lines 339-347). Again, sincere thanks for prompting us to write a vastly improved Discussion section.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting, including exact p-values wherever possible, alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported in the main text for all key questions and not only when the p-value is less than 0.05.

      We have added exact p-values throughout the manuscript.  

      References

      Albergaria C, Silva NT, Pritchett DL, Carey MR. (2018). Locomotor activity modulates associative learning in mouse cerebellum. Nat Neurosci. 21:725-735. doi: 10.1038/s41593-018-0129-x.

      Abraham WC, Mason-Parker SE, Bear MF, Tate WT. (2001). Heterosynaptic metaplasticity in the hippocampus in vivo: A BCM-like modifiable threshold for LTP. Proc Natl Acad Sci USA. 98:10924-10929.

      Bienenstock E, Cooper L, Munro P. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J Neurosci. 2:32-48. https://doi.org/10.1523/JNEUROSCI.02-01-00032.1982

      Brett J, Murnion B. (2015). Management of benzodiazepine misuse and dependence. Aust Prescr. 38:152-155. doi: 10.18773/austprescr.055.

      Boyden ES, Raymond JL. (2003). Active Reversal of Motor Memories Reveals Rules Governing Memory Encoding. Neuron. 39:1031-1042. https://doi.org/10.1016/S0896-6273(03)00562-2

      Boyden ES, Katoh A, Pyle JL, Chatila TA, Tsien RW, Raymond JL. (2006). Selective engagement of plasticity mechanisms for motor memory storage. Neuron. 51:823-834. https://doi.org/10.1016/j.neuron.2006.08.026

      Desai NS, Cudmore RH, Nelson SB, Turrigiano GG. (2002). Critical periods for experience-dependent synaptic scaling in visual cortex. Nat Neurosci. 5:783-789. doi: 10.1038/nn878.

      Fong M, Duffy KR, Leet MP, Candler CT, Bear MF. (2021). Correction of amblyopia in cats and mice after the critical period. eLife. 10:e70023. https://doi.org/10.7554/eLife.70023

      Hamada M, Terao Y, Hanajima R, Shirota Y, Nakatani-Enomoto S, Furubayashi T, Matsumoto H, Ugawa Y. (2008). Bidirectional long-term motor cortical plasticity and metaplasticity induced by quadripulse transcranial magnetic stimulation. J Physiol. 586:3927-3947. doi: 10.1113/jphysiol.2008.152793.

      Katoh A, Yamagiwa A. (2018). Inhibition of PVN neurons influences stress-induced changes of motor learning in the VOR. Society for Neuroscience. Online Program No. 067.14.

      Kimpo RR, Raymond JL. (2007). Impaired motor learning in the vestibulo-ocular reflex in mice with multiple climbing fiber input to cerebellar Purkinje cells. J Neurosci. 27:5672-5682. doi: 10.1523/JNEUROSCI.0801-07.2007.

      Kirkwood A, Rioult MG, Bear MF. (1996). Experience-dependent modification of synaptic plasticity in visual cortex. Nature. 381:526–528. https://doi.org/10.1038/381526a0

      Koekkoek SK, Yamaguchi K, Milojkovic BA, Dortland BR, Ruigrok TJ, Maex R, De Graaf W, Smit AE, VanderWerf F, Bakker CE, Willemsen R, Ikeda T, Kakizawa S, Onodera K, Nelson DL, Mientjes E, Joosten M, De Schutter E, Oostra BA, Ito M, De Zeeuw CI. (2005). Deletion of FMR1 in Purkinje Cells Enhances Parallel Fiber LTD, Enlarges Spines, and Attenuates Cerebellar Eyelid Conditioning in Fragile X Syndrome. Neuron. 47:339–352. https://doi.org/10.1016/j.neuron.2005.07.005

      Le Friec A, Salabert AS, Davoust C, Demain B, Vieu C, Vaysse L, Payoux P, Loubinoux I. (2017). Enhancing Plasticity of the Central Nervous System: Drugs, Stem Cell Therapy, and Neuro-Implants. Neural Plast. 2017:2545736. doi: 10.1155/2017/2545736.

      Leet MP, Bear MF, Gaier ED. (2022). Metaplasticity: a key to visual recovery from amblyopia in adulthood? Curr Opin Ophthalmol. 33:512–518. https://doi.org/10.1097/ICU.0000000000000901

      Martin HGS, Lassalle O, Brown JT, Manzoni OJ. (2016). Age-Dependent Long-Term Potentiation Deficits in the Prefrontal Cortex of the Fmr1 Knockout Mouse Model of Fragile X Syndrome. Cereb Cortex. 26:2084–2092. doi: 10.1093/cercor/bhv031.

      Montgomery JM, Madison DV. (2002). State-dependent heterogeneity in synaptic depression between pyramidal cell pairs. Neuron. 33:765-777. doi: 10.1016/s0896-6273(02)00606-2.

      Nguyen-Vu TDB, Kimpo RR, Rinaldi JM, Kohli A, Zeng H, Deisseroth K, Raymond JL. (2013). Cerebellar Purkinje cell activity drives motor learning. Nat Neurosci. 16:1734-1736. doi: 10.1038/nn.3576.

      Nguyen-Vu TB, Zhao GQ, Lahiri S, Kimpo RR, Lee H, Ganguli S, Shatz CJ, Raymond JL. (2017). A saturation hypothesis to explain both enhanced and impaired learning with enhanced plasticity. eLife. 6:e20147. https://doi.org/10.7554/eLife.20147

      Pádua-Reis M, Nôga DA, Tort ABL, Blunder M. (2021). Diazepam causes sedative rather than anxiolytic effects in C57BL/6J mice. Sci Rep. 11:9335.

      Singh A, Nagpal R, Mittal SK, Bahuguna C, Kumar P. (2017). Pharmacological therapy for amblyopia. Taiwan J Ophthalmol. 7:62-69. doi: 10.4103/tjo.tjo_8_17.

      Tang B, Wang T, Wan H, Han L, Qin X, Zhang Y, Wang J, Yu C, Berton F, Francesconi W, Yates JR 3rd, Vanderklish PW, Liao L. (2015). Fmr1 deficiency promotes age-dependent alterations in the cortical synaptic proteome. Proc Natl Acad Sci USA. 112:E4697-E4706. doi: 10.1073/pnas.1502258112.

      Yamaguchi T, Moriya K, Tanabe S, Kondo K, Otaka Y, Tanaka S. (2020). Transcranial direct-current stimulation combined with attention increases cortical excitability and improves motor learning in healthy volunteers. J Neuroeng Rehabil. 17:23. doi: 10.1186/s12984-020-00665-7.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable work performed fMRI experiments in a rodent model of absence seizures. The results provide new information regarding the brain's responsiveness to environmental stimuli during absence seizures. The authors suggest reduced responsiveness occurs during this type of seizure, and the evidence leading to the conclusion is solid, although reviewers had divergent opinions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GAERS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs are often called seizures and, because there is non-responsiveness at this time, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, the authors also report on lines 396-8: "When comparing statistical responses between both states, significant changes (p<0.05, cluster-) were noticed in somatosensory auditory frontal..., with these regions being less activated in interictal state (see also Figure 4)." That statement is at odds with their conclusion. I do not see that this issue was addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      They also conclude that stimulation slows the pathways activated by the stimulus. I do not see any data proving this. It would require repeated assessments of the pathways in time. This issue was not addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data. This is still an issue. No conclusions appear to be possible to make.

      See comments below starting with “We acknowledge the reviewer…”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The authors did not add any validation of their model.

      See comments below starting with “We acknowledge the reviewer…”.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      Several aspects of the Methods and Results were improved, but some are still unclear.

      We acknowledge the reviewer's concern that we did not address the comments above. However, we emphasize that most of these comments were addressed in the previously submitted “Response to Review Comments” and in the updated manuscript. Here we repeat those responses and provide additional clarifications for some of the comments.

      We thank the reviewer for noting the discrepancy in the statement “less activated in interictal state”. The statement should have read the other way around. We also note that the direction of activation change between groups can be misinterpreted from the statistical maps alone (Figure 3), which show only where changes are significant and not the polarity of the response (shown in Figure 4). Therefore, we have made the following changes to section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases compared with the interictal state, and these polarity changes can be seen in the hemodynamic response functions (Figure 4).”

      We agree with the reviewer that there are no data showing slowing of the pathways in response to the stimulus. However, we are unsure which part of the Conclusion section this comment refers to. We did not intentionally claim in the manuscript that stimulation slows the activated pathways.

      The reviewer is right that strong claims cannot be made from the HRF alone. Therefore, we have avoided such phrasing throughout the manuscript. In the Conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”. The observed HRF decreases (rather than increases) in the cortex when stimulation was applied during SWDs were discussed in section 4.4, where we speculated that neuronal suppression caused by SWDs (possibly apparent as negative HRFs) can prevent responsiveness. The Conclusion now states the following: “Moreover, the detected decreases in the cortical HRF when sensory stimulation was applied during spike-and-wave discharges could play a role in decreased sensory perception. Further studies are required to evaluate whether this HRF change is a cause or a consequence of the reduced neuronal response.”

      We point out that the main validation of the model, and its details, were provided in our previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived from a detailed statistical description of the activity of a spiking neuronal population of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the mean-field model is validated by directly comparing the dynamics obtained from the mean-field model with those obtained from the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript, where the transition studied in the paper between interictal (asynchronous irregular activity) and ictal (SWD dynamics) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics and the transition between them is the main component of the model on which the responsiveness analysis in the paper is built, and this component has been validated.

      Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. But the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with potential to yield important insights.

      Use of an awake, habituated model is a valid and potentially powerful approach.

      Weaknesses:

      The major difficulty with interpreting the results of this study is that the duration of the visual and tactile stimuli were 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore the HRF model looking at fMRI responses to visual or auditory stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or auditory) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or auditory stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. But the attempts to localize these differences in space or time will be contaminated by the seizure related signals.

      In their response to this comment the authors state that some seizures had longer than average duration, and that they attempted to model the effects of both seizures and sensory stimulation. However these factors do not mitigate the concern because the mean duration of seizures and sensory stimulation remain nearly identical, and the models used therefore will not be able to effectively separate signals related to seizures and related to sensory stimulation.

      Regressors for seizures were formed by including periods of seizures without any stimulation present. In theory, if seizures were perfectly modeled by the regressor, the remaining variance would be completely orthogonal to the main effect of the stimulus. Furthermore, only the cases where the seizures are longer than the stimulus are used to calculate the responsiveness to the stimulus (while the cases where the seizures are shorter than the stimulus are used as nuisance regressors to account for error variance). However, we agree with the reviewer that in practice the effects of the seizure cannot be completely removed from the effect of the stimulus. We have addressed this concern in the “physiologic and methodology consideration” section: “We note a caution that presented maps and time courses showing fMRI changes from visual or whisker stimulation during seizures may contain a mixture of both sensory stimulation-related signals and seizure-related signals. To minimize this contamination in the linear model used, we considered both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the stimulation should be separated as much as possible from the effects caused by the seizure itself.”

      The claims that differences were observed for example between visual cortex and superior colliculus signals with visual stim during seizures vs interictal remain unconvincing due to above.

      Maps shown in Figure 3 do not show clear changes in the areas claimed to be involved.

      In their response the authors enlarged the cross sections. However there are still discrepancies between the images and the way they are described in the text. For example, in the Results text the authors say that comparing the interictal and ictal states revealed less activation in the somatosensory cortex during the ictal than during the interictal state, yet Figure 3 bottom row left shows greater activation in somatosensory cortex in this contrast.

      We note that the direction of activation change between groups can be misinterpreted from the statistical maps alone (Figure 3), which show only where changes are significant and not the polarity of the response (shown in Figure 4). Therefore, we have made the following changes to section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases compared with the interictal state, and these polarity changes can be seen in the hemodynamic response functions (Figure 4).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have revised this paper in considerable detail. The paper can be accepted for publication in this version.

      Reviewer #2 (Recommendations For The Authors):

      Reviewer #1

      (1) The analysis in this paper does not directly answer the scientific question posed by the authors, which is to explore the mechanisms of the reduced brain responsiveness to external stimuli during absence seizures (in terms of altered information processing), but merely characterizes the spatial involvement of such reduced responsiveness. The same holds for the use of mean-field modeling, which merely reproduces experimental results without explaining them mechanistically, as the authors claim at the beginning of the paper.

      We agree with the reviewer that the manuscript does not specifically address the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states. The sentence in the abstract that could lead to misinterpretation, "The mechanism underlying the reduced responsiveness to external stimulus remains unknown.", was therefore modified to the following: "The whole-brain spatial and temporal characteristics of reduced responsiveness to external stimulus remains unknown".

      This change did not address the issue. The problem is that there is no experimentation to address the underlying mechanisms of the results. I also think the changed language in the abstract is less clear than the original.

      We fully agree that this manuscript does not answer, or claim to answer, questions about the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimuli between ictal and interictal states, by means of hemodynamics and mean-field simulation.

      We have changed the language of the abstract to the following:

      “In patients suffering from absence epilepsy, recurring seizures can significantly decrease their quality of life and lead to yet untreatable comorbidities. Absence seizures are characterized by spike-and-wave discharges on the electroencephalogram associated with a transient alteration of consciousness. However, it is still unknown how the brain responds to external stimuli during and outside of seizures.

      This study aimed to investigate responsiveness to visual and somatosensory stimulation in GAERS, a well-established rat model for absence epilepsy. Animals were maintained in a non-curarized awake state allowing for naturally occurring seizures to be produced inside the magnet. They were imaged continuously using a quiet zero-echo-time functional magnetic resonance imaging (fMRI) sequence. Sensory stimulations were applied during interictal and ictal periods. Whole brain responsiveness and hemodynamic responses were compared between these two states. Additionally, a mean-field simulation model was used to mechanistically explain the changes of neural responsiveness to visual stimulation between interictal and ictal states.

      Results showed that, during a seizure, whole-brain responses to both sensory stimulations were suppressed and spatially hindered. In several cortical regions, hemodynamic responses were negatively polarized during seizures, despite the application of a stimulus. The simulation experiments also showed restricted propagation of spontaneous activity due to stimulation and so agreed well with fMRI findings. These results suggest that sensory processing observed during an interictal state is hindered or even suppressed by the occurrence of an absence seizure, potentially contributing to decreased responsiveness during this absence epileptic process.”

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data.

      The response of the authors did not clarify this issue. Instead, they explained why they examined HRF and that they can only speculate what the data means.

      The reviewer is right that strong claims cannot be made from the HRF alone. Therefore, we have avoided such phrasing throughout the manuscript. In the Conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The conclusion is that the modeling supports the conclusions of the study, which is useful.

      Details about the model were added.

      This is not entirely satisfactory because there is still no validation of the model.

      We point out that the main validation of the model, and its details, were provided in our previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived from a detailed statistical description of the activity of a spiking neuronal population of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the mean-field model is validated by directly comparing the dynamics obtained from the mean-field model with those obtained from the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript, where the transition studied in the paper between interictal (asynchronous irregular activity) and ictal (SWD dynamics) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics and the transition between them is the main component of the model on which the responsiveness analysis in the paper is built, and this component has been validated.

      How is ROI defined in this paper? What type of atlas is used?

      Anatomical ROIs were drawn based on the Paxinos and Watson rat brain atlas (7th edition). A region was selected if statistically significant activations were detected within it, based on the activation maps. We clarified the definition of ROI as follows: "Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps."

      This is helpful, but the unstained brain does not show the borders of the areas. Therefore just saying an atlas was used is not enough. How in an unstained brain can the areas be accurately outlined?

      Areas of the brain were differentiated by co-registering the functional MRI images with a T1-weighted anatomical reference brain that was created on site from the same data set used for the manuscript. This avoids potential co-registration inaccuracies that would arise from using a reference brain acquired at a different site, with a different sequence, or in a different rat strain. T1-weighted images provide sufficient contrast to differentiate the main brain areas, but for more accurate border definition (e.g., to differentiate thalamic nuclei), the atlas coordinate system, together with known coordinates in the anatomical reference brain, was used to pinpoint the exact borders of the brain areas.

      Reviewer #2

      The following also is not precise:

      "Although seizures are initially triggered by hyperactive somatosensory cortical neurons, the majority of neuronal populations are deactivated rather than activated during the seizure, resulting in an overall decrease in neuronal activity during SWD (McCafferty et al. 2023)."

      What neuronal populations? Cortex? Which neurons in the cortex? Those projecting to the thalamus? What about thalamocortical relay cells? Thalamic gabaergic neurons?

      Please check that these issues were corrected.

      The issues were addressed as follows:

      “Although SWDs are initially triggered by hyperactive somatosensory cortical neurons, neuronal firing rates, especially in the majority of frontoparietal cortical and thalamocortical relay neurons, are decreased rather than increased during SWD, resulting in an overall decrease in activity in these neuronal populations (McCafferty et al., 2023). Previous fMRI studies have demonstrated blood volume or BOLD signal decreases in several cortical regions including parietal and occipital cortex, but also, quite surprisingly, increases in subcortical regions such as thalamus, medulla and pons (David et al., 2008; McCafferty et al., 2023).”

      Results

      After removing problematic animals and sessions, was there sufficient power? There probably wasn't enough to determine sex differences.

      After removing problematic sessions, we found statistically significant (multiple-comparison corrected) results in both the activation maps and the hemodynamic responses. To determine sex differences, there were not enough animals to obtain statistically significant findings (p>0.05).

      This is not the question. The question is whether there was sufficient power.

      A simple power calculation was performed as follows: assuming a t-test, alpha = 0.05, power = 0.8, and matched pairs (seizure/control), we can detect an effect size of 0.37 with our 4 animals, taking repeated measurements into account (4 sessions/animal × 11 seizure/control pairs per session). This is now mentioned in the manuscript.
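      The arithmetic behind this kind of calculation can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a two-sided paired test and uses the normal approximation to the t-distribution (which slightly understates the detectable effect for small samples), and the function name `min_detectable_effect` and the two pooling scenarios are hypothetical. The response does not state how the 4 × 4 × 11 = 176 seizure/control pairs were weighted to reach 0.37, so the sketch only shows how the smallest detectable standardized paired effect (Cohen's d_z) depends on the effective number of independent pairs.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_pairs: int, alpha: float = 0.05, power: float = 0.8) -> float:
    """Smallest standardized paired difference (Cohen's d_z) detectable by a
    two-sided paired test, using the normal approximation to the t-test.
    Hypothetical helper for illustration only."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value (about 1.96)
    z_power = z.inv_cdf(power)          # quantile for the target power (about 0.84)
    return (z_alpha + z_power) / sqrt(n_pairs)


# Hypothetical pooling scenarios; the response does not specify which was used.
total_pairs = 4 * 4 * 11  # animals x sessions per animal x pairs per session = 176
scenarios = [
    ("all pairs treated as independent", total_pairs),
    ("one averaged pair per session", 4 * 4),
]
for label, n in scenarios:
    print(f"{label}: n = {n}, detectable d_z ~ {min_detectable_effect(n):.2f}")
```

      Treating all 176 pairs as independent gives a detectable d_z of about 0.21, while one averaged pair per session (16 pairs) gives about 0.70. The stated value of 0.37 lies between these bounds, so the exact figure depends on how strongly the repeated measurements within animals and sessions are discounted.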

      Table 1 has no statistical comparisons.

      Table 1 is purely an illustration of stimulation and seizure occurrence. There is no specific interest in comparing stimulation types (i.e., the seizure state in which they occurred), as such a comparison would not provide any meaningful inferences for the study.

      Table 1 could be improved by statistics. More could be said and there would be justification to include it.

      We thank the reviewer for the suggestion, but as it remains unclear which statistical comparison would be feasible, we opt to leave it out.

      Statistical activation maps - it is not clear how this was done.

      The creation of statistical maps is explained in section 2.5.3.

      This section is not clear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.

      Fig 3 "F-contrast maps." Please explain.

      The creation of statistical maps is explained in section 2.5.3.

      This section is unclear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.

      Reviewer #3 (Recommendations For The Authors):

      Aside from the concerns listed as weaknesses above which were not addressed, most of the more minor comments were addressed by the authors in the resubmission. However, the comment below was not addressed because it is impossible to see any firing rate changes elicited by sensory stimuli (if they are present) due to the scale during seizures. The seizure signals should be removed or accounted for by the model so that any possible sensory stimulus-related signals could be seen, and displayed on the same scale as firing rates without seizures. Prior comment (unaddressed) is repeated below:

      Figure 6-figure supplement 1, the scales are very different for many of the plots so they are hard to compare. Especially in the ictal periods (D, E, F) it is hard to see if any changes are happening during ictal stimulation similar to interictal stimulation due to very different scales. The activity related to SWD is so large that it overshadows the rest, and perhaps should be subtracted out.

      These two comments were addressed and replied to in the previous round of reviews. Regarding the different scales of the plots in Figure 6-figure supplement 1, we point out that all the plots on the same scale are already presented in Figure 6 of the main text. Regarding the activity related to SWD and sensory stimulation, we note that the effect of the stimulation should be (and was) evaluated with respect to the ongoing activity. All the results concerning neuronal responsiveness presented in the paper evaluate the statistical significance of the changes in activity produced by the stimulation with respect to the ongoing activity (during ictal and interictal states, respectively). For this reason, all the plots containing the time series of neuronal activity in the simulations include the ongoing activity (with SWD dynamics when present) for proper comparison and relevant analysis.
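      The logic of evaluating a stimulus-evoked change against ongoing activity, rather than against zero, can be illustrated with a minimal sketch. It assumes a simple z-score of the post-stimulus mean firing rate against the distribution of mean rates in pre-stimulus windows drawn from the same state; the statistical procedure actually used is described in the manuscript and may differ, and `response_zscore` is a hypothetical helper.

```python
from statistics import mean, stdev


def response_zscore(baseline_windows, post_stim_rates):
    """z-score of the post-stimulus mean rate relative to ongoing activity.

    baseline_windows: list of lists of firing rates (Hz), one list per
        pre-stimulus window taken from the same state (ictal or interictal)
        as the stimulus.
    post_stim_rates: firing rates (Hz) in the window after the stimulus.
    """
    baseline_means = [mean(w) for w in baseline_windows]
    mu = mean(baseline_means)
    sd = stdev(baseline_means)  # requires at least two baseline windows
    return (mean(post_stim_rates) - mu) / sd
```

      Because the baseline is drawn from the same state, large SWD-related fluctuations widen the baseline spread, so a response during seizures has to exceed that ongoing variability before it counts as a change.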

      Additional changes:

      In section 3.2, the sentence “In addition, responses were observed in the somatosensory cortex during a seizure state.” was removed for clarity, as deactivation rather than activation was observed in this brain area during a seizure state.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The biological significance of the inhibitory effect of human Abeta42 on gamma-secretase activity is not clear. As the authors mention in the Discussion, it is plausible that Abeta42 may concentrate up to the microM level in endosomes. However, subsets of FAD mutations in APP and presenilin 1 and 2 increase the Abeta42/Abeta40 ratio and lead to Abeta42 deposition in the brain. APP knock-in NLF and NLGF mice also develop Abeta42 deposition in an age-dependent manner, although they produce more human Abeta42 than human Abeta40.

      If Abeta42 inhibited gamma-secretase, the production of Abeta42 would be attenuated, resulting in less Abeta42 deposition in the brain. It is therefore unlikely that human Abeta42 interferes with gamma-secretase activity under physiological conditions. This reviewer has the impression that inhibition of gamma-secretase by human Abeta42 is an interesting artifact of high Abeta42 concentrations. If the authors disagree with this comment, the manuscript needs more discussion of this point.

      We thank the Reviewer for raising this key conceptual point; we acknowledge that it was insufficiently discussed in the original manuscript. In response to this point, we introduced the following paragraph in the discussion section of the revised manuscript:

      “From a mechanistic standpoint, the competitive nature of the Aβ42-mediated inhibition implies that it is partial, reversible, and regulated by the relative concentrations of the Aβ42 peptide (inhibitor) and the endogenous substrates (Figure 10C and 10D). The model that we put forward is that cellular uptake, as well as endosomal production of Aβ, results in increased intracellular concentration of Aβ42, facilitating γ-secretase inhibition and leading to the buildup of APP-CTFs (and γ-secretase substrates in general). As Aβ42 levels fall, the augmented concentration of substrates shifts the equilibrium towards their processing and subsequent Aβ production. As Aβ42 levels rise again, the equilibrium is shifted back towards inhibition. This cyclic inhibitory mechanism will translate into pulses of (partial) γ-secretase inhibition, which will alter γ-secretase-mediated signaling (arising from increased CTF levels at the membrane or decreased release of soluble intracellular domains from substrates). These alterations may affect the dynamics of systems oscillating in the brain, such as NOTCH signaling, implicated in memory formation, and potentially others (related to, e.g., cadherins, p75 or neuregulins). It is worth noting that oscillations in γ-secretase activity induced by treatment with the γ-secretase inhibitor semagacestat have been proposed to have contributed to the cognitive alterations observed in semagacestat-treated patients in the failed Phase-3 IDENTITY clinical trial (7), and that semagacestat, like Aβ42, acts as a high-affinity competitor of substrates (85).

      The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90). Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis.”

      We have also added figures 10C and 10D, presented here for convenience.

      Author response image 1.
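      The textbook Michaelis-Menten expression for competitive inhibition, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]), illustrates why such inhibition is partial and depends on the relative inhibitor and substrate concentrations. This is a generic sketch, not a fit to the data in Figures 10C and 10D; the function names and the Km and Ki values used below are hypothetical placeholders.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Michaelis-Menten velocity in the presence of a competitive inhibitor.

    v = vmax * s / (km * (1 + i / ki) + s)
    s: substrate concentration; i: inhibitor concentration
    (same units as km and ki).
    """
    return vmax * s / (km * (1 + i / ki) + s)


def fractional_activity(s, i, km, ki):
    """Activity with inhibitor relative to activity without it (vmax cancels)."""
    return competitive_rate(s, i, 1.0, km, ki) / competitive_rate(s, 0.0, 1.0, km, ki)
```

      With Km = Ki = 1 (arbitrary units) and [I] = 1, residual activity is about 67% at [S] = 1 but about 92% at [S] = 10. Substrate buildup therefore shifts the equilibrium back towards processing, which is the cyclic behaviour described in the paragraph above.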

      (2) It is not clear whether the FRET-based assay in living cells really reflects gamma-secretase activity. This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta.

      We have established a novel, HiBiT-tag-based assay reporting on the global γ-secretase activity in cells, using as a proxy the total levels of secreted HiBiT-tagged Aβ peptides. The assay and findings are presented in the revised manuscript as follows:

      In the results section, in the subsection “Aβ42 treatment leads to the accumulation of APP C-terminal fragments in neuronal cell lines and human neurons”:

      “The increments in the APP-CTF/FL ratio suggested that Aβ42 (partially) inhibits the global γ-secretase activity. To further investigate this, we measured the direct products of the γ-secretase-mediated proteolysis of APP. Since the detection of the endogenous Aβ products via standard ELISA methods was precluded by the presence of exogenous human Aβ42 (treatment), we used an N-terminally tagged version of APP C99 and quantified the amount of total secreted Aβ, which is a proxy for the global γ-secretase activity. Briefly, we overexpressed human APP C99 N-terminally tagged with a short, 11-amino-acid HiBiT tag in human embryonic kidney (HEK) cells, treated these cultures with human Aβ42 or p3 17-42 peptides at 1 μM or DAPT (GSI) at 10 µM, and determined total HiBiT-Aβ levels in conditioned media (CM). DAPT was considered to result in full γ-secretase inhibition, and hence the values recorded in DAPT-treated conditions were used for background subtraction. We found a ~50% reduction in luminescence signal, directly linked to HiBiT-Aβ levels, in CM of cells treated with human Aβ42 and no effect of p3 peptide treatment, relative to the DMSO control (Figure 3D). The observed reduction in total Aβ products is consistent with the partial inhibition of γ-secretase by Aβ42.”
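      The background subtraction described above amounts to a short normalisation. This sketch assumes that DAPT-treated wells define the zero-activity baseline and that the DMSO vehicle defines 100% activity; the function name and the luminescence values in the usage example are hypothetical, not the authors' data.

```python
def percent_of_control(treated, vehicle, full_inhibition):
    """Background-subtracted signal as a percentage of the vehicle control.

    treated, vehicle, full_inhibition: mean luminescence (RLU) for the
    treatment, the DMSO vehicle, and the DAPT (full-inhibition) condition.
    The DAPT signal is treated as background and removed from both terms.
    """
    background = full_inhibition
    return 100.0 * (treated - background) / (vehicle - background)
```

      For example, with hypothetical readings of 1000 RLU (DMSO), 100 RLU (DAPT) and 550 RLU (treatment), `percent_of_control(550, 1000, 100)` gives 50.0, i.e. a ~50% reduction in γ-secretase-derived signal.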

      In Methods:

      “Analysis of γ-secretase substrate proteolysis in cultured cells using secreted HiBiT-Aβ or -Aβ-like peptide levels as a proxy for the global γ-secretase endopeptidase activity

      HEK293 cells stably expressing APP-CTF (C99) or a NOTCH1-based substrate (similar in size to APP-C99), both N-terminally tagged with the HiBiT tag, were plated at a density of 10000 cells per well of a 96-well plate and, 24 h after plating, treated with Aβ or p3 peptides diluted in OPTIMEM (Thermo Fisher Scientific) supplemented with 5% FBS (Gibco). Conditioned media were collected and analyzed using the Nano-Glo® HiBiT Extracellular Detection System (Promega). Briefly, 50 µl of the medium was mixed with 50 µl of the reaction mixture containing LgBiT Protein (1:100) and Nano-Glo HiBiT Extracellular Substrate (1:50) in Nano-Glo HiBiT Extracellular Buffer, and the reaction was incubated for 10 minutes at room temperature. The luminescence signal corresponding to the amount of extracellular HiBiT-Aβ or -Aβ-like peptides was measured using a Victor plate reader with default luminescence measurement settings.”
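      The reagent arithmetic in this protocol can be made explicit for a master mix. This sketch assumes that the 1:100 and 1:50 ratios are volume fractions of the final 50 µl reaction mixture, and it adds a hypothetical 10% pipetting overage; the helper name is illustrative, and the kit manual remains the authority on preparation.

```python
def hibit_master_mix(n_wells, per_well_ul=50.0, lgbit_ratio=100,
                     substrate_ratio=50, overage=0.10):
    """Volumes (µl) of each component of the detection reagent for n_wells.

    Assumes LgBiT and substrate are diluted 1:lgbit_ratio and
    1:substrate_ratio (v/v) in the final mix, with buffer making up the rest.
    """
    total = n_wells * per_well_ul * (1 + overage)
    lgbit = total / lgbit_ratio
    substrate = total / substrate_ratio
    buffer_ul = total - lgbit - substrate
    return {"total": total, "LgBiT": lgbit,
            "substrate": substrate, "buffer": buffer_ul}
```

      Without overage, ten wells need 500 µl of mix: 5 µl LgBiT Protein, 10 µl substrate and 485 µl buffer.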

      As the direct substrate of γ-secretase was used in this analysis, the observed reduction (~50%) in the levels of N-terminally-tagged (HiBiT) Aβ peptides in the presence of 1 µM Aβ42, relative to control conditions, demonstrates a selective inhibition of γ-secretase by Aβ42 (not by the p3). These data complement the FRET-based findings presented in Figure 5.

      (3) Processing of APP-CTF in living cells is not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of Abeta in Figures 4, 5 and 7.

      We tried to measure the levels of Aβ peptides secreted by cells into the culture medium directly by ELISA (using different protocols) or MS (using established methods, as reported in Koch et al, 2023), but exogenous Aβ42 (treatment) present at relatively high levels interfered with the readout and rendered the analysis inconclusive. 

      However, we were successful in the determination of total secreted (HiBiT-tagged) Aβ peptides from the HiBiT-tagged APP-C99 substrate, as indicated in the previous point. The quantification of the levels of these peptides showed that Aβ42 treatment resulted in a ~50% reduction in the γ-secretase-mediated processing of the tagged substrate.

      In addition, we would like to highlight that our analysis of the contribution of other APP-CTF degradation pathways, using cycloheximide-based assays in the constant presence of γ-secretase inhibitor, failed to reveal significant differences between Aβ42-treated cells and controls (Figure 6B & C). The lack of a significant impact of Aβ42 on the half-life of APP-CTFs under the conditions of γ-secretase inhibition maintained by inhibitor treatment is consistent with the proposed Aβ42-mediated inhibitory mechanism.
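      A half-life from a cycloheximide chase is commonly estimated by assuming first-order decay, fitting a straight line to the logarithm of the remaining fraction over time, and converting the rate constant k into t½ = ln 2 / k. The sketch below illustrates that calculation under this assumption; the quantification used for Figure 6B & C may differ, and the helper name and the synthetic time course in the test are hypothetical.

```python
import math


def half_life(times_h, fraction_remaining):
    """Half-life (h) from a least-squares fit of ln(fraction) = -k * t + c.

    times_h: chase time points in hours.
    fraction_remaining: protein level at each time point relative to t = 0.
    """
    logs = [math.log(f) for f in fraction_remaining]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    k = -sxy / sxx  # first-order decay rate constant (1/h)
    return math.log(2) / k
```

      Comparing half-lives estimated this way between Aβ42-treated and control cells, with γ-secretase kept inhibited, isolates γ-secretase-independent degradation of APP-CTFs.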

      (4) Similar to comment #3. Processing of Pancad-CTF and p75 in living cells may be not only the cleavage by gamma-secretase. This reviewer thinks that the authors need at least biochemical data, such as levels of ICDs in Figures 6C and E. 

      To address this comment, we have now performed additional experiments in which we measured N-terminal Aβ-like peptides derived from a NOTCH1-based substrate using the HiBiT-based assay. These experiments showed a reduction in the aforementioned peptides in cells treated with Aβ42 relative to the vehicle control, and hence further confirmed the inhibitory action of Aβ42. These new data have been included as Figure 8D in the revised manuscript and are described as follows:

      “Finally, we measured the direct N-terminal products generated by γ-secretase proteolysis from a HiBiT-tagged NOTCH1-based substrate, an estimate of the global γ-secretase activity. We quantified the Aβ-like peptides secreted by HEK293 cells stably expressing this HiBiT-tagged substrate upon treatment with 1 µM Aβ1-42, p3 17-42 peptide or DAPT (GSI) (Figure 8D). DAPT treatment was considered to result in complete γ-secretase inhibition, and hence the values recorded in the DAPT condition were used for background subtraction. A significant ~20% reduction in the amount of secreted N-terminal HiBiT-tagged peptides derived from the NOTCH1-based substrate in cells treated with Aβ1-42 supports the inhibitory action of Aβ1-42 on γ-secretase-mediated proteolysis.”

      Minor concerns:

      (1) Murine Abeta42 may be converted to murine Abeta38 more easily than human Abeta42 is converted to human Abeta38. This may be why murine Abeta42 exhibits no inhibitory effect on gamma-secretase activity.

      In order to address this question, we performed additional experiments where we assessed the processing of murine Aβ42 into Aβ38. Analogous to human Aβ42, the murine Aβ42 peptide was not processed to Aβ38 in the assay conditions. These new data have been integrated in the manuscript and added as a Supplementary figure 1B.

      (2) It would be interesting to know the levels of C99 and C83 in cells in Supplementary Figure 3.

      The conditions used in these assays were analogous to those used in Figure 3 (i.e., treatment with Aβ peptides at 1 µM concentrations). Such conditions were associated with profound and consistent APP-CTF accumulation in this model system.

      Reviewer #2 (Recommendations For The Authors):

      In the current study, the authors show that Aβs, which have low affinity for γ-secretase, can compete with the longer, higher-affinity APP C99 substrate for binding and processing when present at relatively high concentrations. They also performed kinetic analyses demonstrating that human Aβ1-42 inhibits γ-secretase-mediated processing of APP C99 and other substrates. Interestingly, neither murine Aβ1-42 nor human p3 (amino acids 17-42 of Aβ) peptides exerted inhibition under similar conditions. The authors also show that human Aβ1-42-mediated inhibition of γ-secretase activity results in the accumulation of unprocessed substrates, which leads to p75-dependent activation of caspase 3 in basal forebrain cholinergic neurons (BFCNs) and PC12 cells.

      These analyses demonstrate that, as seen for γ-secretase inhibitors, Aβ1-42 potentiates this marker of apoptosis. However, there are no in vivo data to support the physiological significance of the current findings. The authors should show in APP KO mice whether gamma-secretase enzymatic activity is elevated, and whether adding back Aβ42 peptide abolishes these in vivo effects.

      The findings presented in this manuscript form the basis for further in vitro and in vivo research to investigate the mechanisms of inhibition and its contribution to brain pathophysiology. Here, we used well-controlled model systems to investigate a novel mechanism of Aβ42 toxicity. Multiple mechanisms regulate the local concentration of Aβ42 in vivo, making the dissection of the biochemical mechanisms of the inhibition more complex. Although such experiments are beyond the scope of this report, we consider these very reasonable comments a motivation for further research.

      The experimental concentrations for Aβ42 peptide in the assay are too high, which are far beyond the physiological concentrations or pathological levels. The artificial observations are not supported by any in vivo experimental evidence.

      It is correct that in the majority of the experiments we used low μM concentrations of Aβ42. However, we note that we have also performed experiments in which conditioned medium collected from human APP.Swe-expressing neurons was used as a source of Aβ. In these experiments, the total Aβ concentration was in the low nM range (0.5-1 nM) (Figure 7). Treatment with this conditioned medium led to increased APP-CTF levels, supporting the view that low nM concentrations of Aβ are sufficient for partial inhibition of γ-secretase.

      In addition, we highlight that analyses of the brains of AD-affected individuals have shown that APP-CTFs accumulate in both sporadic and genetic forms of the disease (Pera et al. 2013, Vaillant-Beuchot et al. 2021), and Ferrer-Raventós et al. (2023) have recently revealed a correlation between APP-CTF and Aβ levels at the synapse. We therefore assessed the concentration of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals. Our findings and conclusions are included in the revised version as follows:

      In the results section:

      “We next investigated the levels of Aβ42 in synaptosomes derived from frontal cortices of post-mortem AD and age-matched non-demented (ND) control individuals (Figure 10B). Towards this, we prepared synaptosomes from frozen brain tissues using a Percoll gradient procedure (62, 63). Intact synaptosomes were spun to obtain a pellet, which was resuspended in a minimal volume of PBS, allowing us to estimate the volume containing the resuspended synaptosome sample. This is likely an overestimate of the actual synaptosome volume. Finally, synaptosomes were lysed in RIPA buffer and Aβ peptide concentrations were measured by ELISA (MSD). We observed that the concentration of Aβ42 in synaptosomes from (end-stage) AD tissues was significantly higher (10.7 nM) than in those isolated from non-demented tissues (0.7 nM), p<0.0005***. These data provide evidence for the accumulation of endogenous Aβ42 at nM concentrations in synaptosomes in end-stage AD brains. Given that we measured Aβ42 concentration in synaptosomes, we speculate that even higher concentrations of this peptide may be present in the endolysosomal vesicle system, where it may inhibit the endogenous processing of APP-CTF at the synapse. Of note, treatment of PC12 cells with conditioned medium containing even lower amounts of Aβ (low nanomolar range, 0.5-1 nM) resulted in the accumulation of APP-CTFs.”
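      Converting an ELISA mass readout into a molar concentration within the estimated synaptosome volume is simple arithmetic. The sketch assumes the assay yields the total Aβ42 mass (pg) in the lysate and uses an approximate molar mass of 4514 g/mol for human Aβ1-42; the helper name is hypothetical, and the inputs in the test are illustrative rather than the authors' measurements. Because the resuspension volume likely overestimates the true synaptosome volume, a concentration computed this way likely underestimates the true one.

```python
ABETA42_MOLAR_MASS_G_PER_MOL = 4514.0  # approximate molar mass of human Aβ1-42


def nanomolar(mass_pg, volume_ul, molar_mass=ABETA42_MOLAR_MASS_G_PER_MOL):
    """Molar concentration (nM) of mass_pg of peptide contained in volume_ul."""
    grams = mass_pg * 1e-12
    litres = volume_ul * 1e-6
    return grams / molar_mass / litres * 1e9
```

      As a rule of thumb under this molar mass, 1 pg/µl (1 ng/ml) of Aβ42 corresponds to roughly 0.22 nM.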

      In the discussion: 

      “The convergence of Aβ42 and tau at the synapse has been proposed to underlie synaptic dysfunction in AD (86-89), and recent assessment of APP-CTF levels in synaptosome-enriched fractions from healthy control, SAD and FAD brains (temporal cortices) has shown that APP fragments concentrate at higher levels in the synapse in AD-affected than in control individuals (90). Our analysis adds that endogenous Aβ42 concentrates in synaptosomes derived from end-stage AD brains to reach ~10 nM, a concentration that in CM from human neurons inhibits γ-secretase in PC12 cells (Figure 7). Furthermore, the restricted localization of Aβ in endolysosomal vesicles, within synaptosomes, likely increases the local peptide concentration to the levels that inhibit γ-secretase-mediated processing of substrates in this compartment. In addition, we argue that the deposition of Aβ42 in plaques may be preceded by a critical increase in the levels of Aβ present in endosomes and the cyclical inhibition of γ-secretase activity that we propose. Under this view, reductions in γ-secretase activity may be a (transient) downstream consequence of increases in Aβ due to failed clearance, as represented by plaque deposition, contributing to AD pathogenesis.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study explores infants' attention patterns in real-world settings using advanced protocols and cutting-edge methods. The presented evidence for the role of EEG theta power in infants' attention is currently incomplete. The study will be of interest to researchers working on the development and control of attention.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper investigates the physiological and neural processes that relate to infants' attention allocation in a naturalistic setting. Contrary to experimental paradigms that are usually employed in developmental research, this study investigates attention processes while letting the infants be free to play with three toys in the vicinity of their caregiver, which is closer to a common, everyday life context. The paper focuses on infants at 5 and 10 months of age and finds differences in what predicts attention allocation. At 5 months, attention episodes are shorter and their duration is predicted by autonomic arousal. At 10 months, attention episodes are longer, and their duration can be predicted by theta power. Moreover, theta power predicted the proportion of looking at the toys, as well as a decrease in arousal (heart rate). Overall, the authors conclude that attentional systems change across development, becoming more driven by cortical processes.

      Strengths:

      I enjoyed reading the paper, I am impressed with the level of detail of the analyses, and I am strongly in favour of the overall approach, which tries to move beyond in-lab settings. The collection of multiple sources of data (EEG, heart rate, looking behaviour) at two different ages (5 and 10 months) is a key strength of this paper. The original analyses, which build onto robust EEG preprocessing, are an additional feat that improves the overall value of the paper. The careful consideration of how theta power might change before, during, and in the prediction of attention episodes is especially remarkable. However, I have a few major concerns that I would like the authors to address, especially on the methodological side.

      Points of improvement

      (1) Noise

      The first concern is the level of noise across age groups, periods of attention allocation, and metrics. Starting with EEG, I appreciate the analysis of noise reported in the supplementary materials. The analysis focuses on a broad level (average noise in 5-month-olds vs 10-month-olds), but variations might be more fine-grained (for example, noise in 5-month-olds might be due to fussiness and crying, while at 10 months it might be due to increased movements). More importantly, noise might even be the same across age groups but correlated with other aspects of their behaviour (head or eye movements) that are directly related to the measures of interest. Is it possible that noise might co-vary with some of the behaviours of interest, thus leading to either spurious effects or false negatives? One way to address this issue would be, for example, to check whether noise in the signal can predict attention episodes. If this is the case, noise should be added as a covariate in many of the analyses of this paper.

      We thank the reviewer for this comment. We certainly have evidence that even the most state-of-the-art cleaning procedures (such as machine-learning trained ICA decompositions, as we applied here) are unable to remove eye movement artifact entirely from EEG data (Haresign et al., 2021; Phillips et al., 2023). (This applies to our data but also to others’ where confounding effects of eye movements are generally not considered.) Importantly, however, our analyses have been designed very carefully with this explicit challenge in mind. All of our analyses compare changes in the relationship between brain activity and attention as a function of age, and there is no evidence to suggest that different sources of noise (e.g. crying vs. movement) would associate differently with attention durations nor change their interactions with attention over developmental time. And figures 5 and 7, for example, both look at the relationship of EEG data at one moment in time to a child’s attention patterns hundreds or thousands of milliseconds before and after that moment, for which there is no possibility that head or eye movement artifact can have systematically influenced the results.

      Moving onto the video coding, I see that inter-rater reliability was not very high. Is this due to the fine-grained nature of the coding (20ms)? Is it driven by differences in expertise among the two coders? Or because coding this fine-grained behaviour from video data is simply too difficult? The main dependent variable (looking duration) is extracted from the video coding, and I think the authors should be confident they are maximising measurement accuracy.

      We appreciate the concern. To calculate IRR we used this function (Cardillo G. (2007) Cohen's kappa: compute the Cohen's kappa ratio on a square matrix. http://www.mathworks.com/matlabcentral/fileexchange/15365). Our observed agreement was 0.7 (SD = 0.15). However, we decided to report the Cohen's kappa coefficient, which is generally thought to be a more robust measure as it takes into account agreement occurring by chance. We conducted the training meticulously (refer to response to Q6, R3), and we have confidence that our coders performed to the best of their abilities.
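      The difference between observed agreement and Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), can be made concrete. The sketch below computes both from a square confusion matrix of two coders' frame-by-frame labels. It is an illustrative re-implementation of the standard formula, not the MATLAB function cited above, and the counts in the test are made up.

```python
def agreement_and_kappa(matrix):
    """Observed agreement and Cohen's kappa from a square confusion matrix.

    matrix[i][j] = number of frames coder A labelled i and coder B labelled j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from each coder's marginal label frequencies
    p_chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa
```

      In the made-up matrix [[45, 15], [25, 15]], the coders agree on 60% of frames, yet kappa is only about 0.13, because most of that agreement is expected by chance from the marginal frequencies. This is why kappa is typically lower than raw agreement.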

      (2) Cross-correlation analyses

      I would like to raise two issues here. The first is the potential problem of using auto-correlated variables as input for cross-correlations. I am not sure whether theta power was significantly autocorrelated. If it is, could it explain the cross-correlation result? The fact that the cross-correlation plots in Figure 6 peak at zero, and are significant (but lower) around zero, makes me think that it could be a consequence of periods around zero being autocorrelated. Relatedly: how does the fact that the significant lag includes zero, and a bit before, affect the interpretation of this effect? 

      To clarify this analysis: we did include a plot showing the autocorrelation of theta activity in the original submission (Figs 7A and 7B in the revised paper). These plots indicate that theta shows little to no autocorrelation, and we can see no way in which autocorrelation might have influenced our results. From their comments, the reviewer seems instead to be considering phasic changes in the autocorrelation, that is, whether greater stability in theta during the period around looks might have caused the cross-correlation result shown in Fig 7E. Again, though, we can see no way in which this might be true, as the cross-correlation indicates that greater theta power is associated with a greater likelihood of looking, and this would not have been affected by changes in the autocorrelation.
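      One common way to check whether autocorrelation could inflate a cross-correlation is to compute the autocorrelation function of each series alongside the lagged cross-correlation between them. The sketch below is a plain Pearson-correlation implementation at integer lags. It assumes two equally sampled series (e.g., theta power and looking scores per time bin); it is not the authors' analysis code, and the lag sign convention is our own choice.

```python
import numpy as np


def lagged_corr(x, y, max_lag):
    """Pearson correlation between x[t] and y[t + lag] for lag in [-max_lag, max_lag].

    Positive lags pair earlier x with later y (x leads y).
    Returns a dict mapping lag -> correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: len(y) + lag]
        out[lag] = float(np.corrcoef(a, b)[0, 1])
    return out


def autocorr(x, max_lag):
    """Autocorrelation of x at lags 0..max_lag (lag 0 equals 1)."""
    full = lagged_corr(x, x, max_lag)
    return {lag: r for lag, r in full.items() if lag >= 0}
```

      If one series were strongly autocorrelated, a single brief association would spread into neighbouring lags of the cross-correlation; inspecting `autocorr` for each series (or prewhitening the series before cross-correlating) helps separate the two.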

      A second issue with the cross-correlation analyses is the coding of the looking behaviour. If I understand correctly, if an infant looked at the same object for a full second, they would get a maximum score (e.g., 1), while if they looked at the object for 500 ms and away from it for 500 ms, they would receive a score of, e.g., 0.5. However, if they looked at one object for 500 ms and another object for 500 ms, they would receive a maximum score (e.g., 1). The reason seems unclear to me, because these are different attention episodes, but they would be treated as one. In addition, the authors also show that theta power changes within an attentional episode (for 10-month-olds). What is the reason behind this scoring system? Wouldn't it be better to adjust by the number of attention switches, e.g., with the formula looking-time/(1+N_switches), so that if infants looked for a full second but made 1 switch from one object to the other, the score would be 0.5, thus reflecting that attention was terminated within that episode?

      We appreciate this suggestion. This is something we did not consider, and we thank the reviewer for raising it. In response to their comment, we have now rerun the analyses using the new measure (looking-time/(1+N_switches)), and we are reassured to find that the results remain highly consistent. Please see Author response image 1 below, which shows the original results in orange and the new measure in blue at 5 and 10 months.
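      The switch-adjusted score can be sketched for one time window. This assumes looking is coded as per-frame labels (a toy identifier, or None when the infant is not looking at a toy) with a fixed frame duration; the 20 ms frame, the label format, and the rule that a switch is any change of looked-at toy (even across a look away) are assumptions for illustration, not the authors' coding scheme.

```python
def switch_adjusted_score(frames, frame_s=0.02, window_s=1.0):
    """Looking score for one window, penalised by toy-to-toy switches.

    frames: per-frame labels, e.g. "toyA", "toyB", or None (not looking at a toy).
    Returns looking_fraction / (1 + n_switches).
    """
    looking_fraction = sum(f is not None for f in frames) * frame_s / window_s
    n_switches = 0
    last_target = None
    for f in frames:
        if f is None:
            continue  # looks away reduce looking time but are not switches
        if last_target is not None and f != last_target:
            n_switches += 1
        last_target = f
    return looking_fraction / (1 + n_switches)
```

      This reproduces the reviewer's examples: a full second on one toy scores 1, half a second on a toy followed by half a second away scores 0.5, and half a second on each of two toys also scores 0.5.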

      Author response image 1.

      (3) Clearer definitions of variables, constructs, and visualisations

      The second issue is the overall clarity and systematicity of the paper. The concept of attention appears with many different names. Only in the abstract, it is described as attention control, attentional behaviours, attentiveness, attention durations, attention shifts and attention episode. More names are used elsewhere in the paper. Although some of them are indeed meant to describe different aspects, others are overlapping. As a consequence, the main results also become more difficult to grasp. For example, it is stated that autonomic arousal predicts attention, but it's harder to understand what specific aspect (duration of looking, disengagement, etc.) it is predictive of. Relatedly, the cognitive process under investigation (e.g., attention) and its operationalization (e.g., duration of consecutive looking toward a toy) are used interchangeably. I would want to see more demarcation between different concepts and between concepts and measurements.

      We appreciate the comment and we have clarified the concepts and their operationalisation throughout the revised manuscript.

      General Remarks

      In general, the authors achieved their aim in that they successfully showed the relationship between looking behaviour (as a proxy of attention), autonomic arousal, and electrophysiology. Two aspects are especially interesting. First, the fact that at 5 months, autonomic arousal predicts the duration of subsequent attention episodes, but at 10 months this effect is not present. Conversely, at 10 months, theta power predicts the duration of looking episodes, but this effect is not present in 5-month-old infants. This pattern of results suggests that younger infants have less control over their attention, which mostly depends on their current state of arousal, but older infants have gained cortical control of their attention, which in turn impacts their looking behaviour and arousal.

      We thank the reviewer for the close attention that they have paid to our manuscript, and for their insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores infants' attention patterns in real-world settings and their relationship with autonomic arousal and EEG oscillations in the theta frequency band. The study included 5- and 10-month-old infants during free play. The results showed that in the 5-month-old group, a decline in HR forward-predicted attentional behaviors, while the 10-month-old group exhibited increased theta power following shifts in gaze, indicating the start of a new attention episode. Additionally, this increase in theta power predicted the duration of infants' looking behavior.

      Strengths:

      The study's strengths lie in its utilization of advanced protocols and cutting-edge techniques to assess infants' neural activity and autonomic arousal associated with their attention patterns, as well as the extensive data coding and processing. Overall, the findings have important theoretical implications for the development of infant attention.

      Weaknesses:

      Certain methodological procedures require further clarification, e.g., details on EEG data processing. Additionally, it would be beneficial to eliminate possible confounding factors and consider alternative interpretations, e.g., whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during the free play.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #3 (Public Review):

      Summary:

      Much of the literature on attention has focused on static, non-contingent stimuli that can be easily controlled and replicated--a mismatch with the actual day-to-day deployment of attention. The same limitation is evident in the developmental literature, which is further hampered by infants' limited behavioral repertoires and the general difficulty in collecting robust and reliable data in the first year of life. The current study engages young infants as they play with age-appropriate toys, capturing visual attention, cardiac measures of arousal, and EEG-based metrics of cognitive processing. The authors find that the temporal relations between measures are different at age 5 months vs. age 10 months. In particular, at 5 months of age, cardiac arousal appears to precede attention, while at 10 months of age attention processes lead to shifts in neural markers of engagement, as captured in theta activity.

      Strengths:

      The study brings to the forefront sophisticated analytical and methodological techniques to bring greater validity to the work typically done in the research lab. By using measures in the moment, they can more closely link biological measures to actual behaviors and cognitive stages. Often, we are forced to capture these measures in separate contexts and then infer in-the-moment relations. The data and techniques provide insights for future research work.

      Weaknesses:

      The sample is relatively modest, although this is somewhat balanced by the sheer number of data points generated by the moment-to-moment analyses. In addition, the study is cross-sectional, so the data cannot capture true change over time. Larger samples, followed over time, will provide a stronger test for the robustness and reliability of the preliminary data noted here. Finally, while the method certainly provides for a more active and interactive infant in testing, we are a few steps removed from the complexity of daily life and social interactions.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #1 (Recommendations For The Authors):

      Here are some specific ways in which clarity can be improved:

      A. Regarding the distinction between constructs, or measures and constructs:

      i. In the results section, I would prefer to refer to looking duration and heart rate as the metrics that were measured, while in the introduction and discussion a clear one-to-one link between each construct/cognitive process and its behavioural or (neuro)psychophysical measure can be made (e.g., sustained attention is measured via looking durations; autonomic arousal is measured via heart rate). 

      The ways in which attention and arousal were operationalised are now clarified throughout the text, especially in the Results.

      ii. Relatedly, the "attention" variable is not really measuring attention directly. It is rather measuring looking time (proportion of looking time to the toys?), which is the operationalisation, which is hypothesised to be related to attention (the construct/cognitive process). I would make the distinction between the two stronger.

      This distinction between looking and paying attention is now clearer in the revised manuscript, as per R1's and R3’s suggestions. We have also added a paragraph to the Introduction clarifying it and pointing out its limitations (see pg. 5).

      B. Each analysis should be set out to address a specific hypothesis. I would rather see hypotheses in the introduction (without direct reference to the details of the models that were used), and how a specific relation between variables should follow from such hypotheses. This would also solve the issue that some analyses did not seem directly necessary to the main goal of the paper. For example:

      i. Are ACF and survival probability analyses aimed at proving different points, or are they different analyses to prove the same point? Consider either making clearer how they differ or moving one to supplementary materials.

      We clarified this in pg. 4 of the revised manuscript.

      ii. The autocorrelation results are not mentioned in the introduction. Are they aiming to show that the variables can be used for cross-correlation? Please clarify their role or remove them.

      We clarified this in pg. 4 of the revised manuscript.

      C. Clarity of cross-correlation figures. To ensure clarity when presenting a cross-correlation plot, it is important to state the lead-lag relationships and which variable is considered X and which Y. This could be done by labelling the axes more clearly (e.g., the left-hand side of the x-axis indicates that x leads y, the right-hand side that y leads x) or by adding a legend (e.g., a dashed line indicates x leading y, a solid line y leading x). Finally, the limits of the x-axis are consistent across plots, but the limits of the y-axis differ, which makes it harder to compare the plots visually. More broadly, the plots could have clearer labels, and their resolution could be improved. 

      The information on which variable precedes or follows was given in the figure captions. However, we have edited the figures as per the reviewer’s suggestion and added this information to the figures themselves. We have also uploaded all figures in higher resolution.
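      To illustrate the kind of sign convention the reviewer asks to be made explicit, here is a minimal Python sketch (an editorial illustration, not the authors' analysis code) of a lagged Pearson cross-correlation. The convention that a positive lag means x leads y is an assumption chosen for this sketch; any convention works as long as the axis labels state it.

```python
import math


def _pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def cross_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + k] for k in -max_lag..max_lag.

    Assumed convention: a peak at positive k means x leads y;
    a peak at negative k means y leads x.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if not 0 <= max_lag < n - 1:
        raise ValueError("max_lag too large for the series length")
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            xs, ys = x[: n - k], y[k:]   # pair x[t] with y[t + k]
        else:
            xs, ys = x[-k:], y[: n + k]  # pair x[t] with y[t - |k|]
        result[k] = _pearson(xs, ys)
    return result


# Hypothetical series: y is x delayed by one sample, so x leads y.
x = [0, 1, 0, 0, 2, 0, 0, 3, 0, 0]
y = [0] + x[:-1]
r = cross_correlation(x, y, max_lag=2)
print(max(r, key=r.get))  # 1: positive lag, i.e. x leads y
```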

      D. Figure 7 was extremely helpful for understanding the paper, and I would rather have it as Figure 1 in the introduction. 

      We have moved Figure 7 to Figure 1 as per this request.

      E. Statistics should always be reported, and effects should always be described. For example, results of autocorrelation are not reported, and from the plot, it is also not clear if the effects are significant (the caption states that red dots indicate significance, but there are no red dots. Does this mean there is no autocorrelation?).

      We apologise – this was hard to read in the original. We have clarified that there is no autocorrelation present in Fig 7A and 7D.

      And if so, given that theta is a wave, how is it possible that there is no autocorrelation (connected to point 1)? 

      We thank the reviewer for raising this point. Theta power reflects oscillatory activity in the EEG within the 3-6Hz window (i.e., 3 to 6 oscillations per second), whereas our autocorrelation analysis examined changes in theta power between consecutive 1-second windows. To say that there is no autocorrelation in the data means that, if there is more 3-6Hz activity within one particular 1-second window, there tends not to be significantly more 3-6Hz activity within the 1-second windows immediately before and after.
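      To make the window-level distinction concrete, the following minimal sketch (an editorial illustration, not the authors' pipeline) computes the lag-k autocorrelation of a series holding one theta-power value per consecutive 1-second window. The example values are hypothetical, and the ±1.96/√n white-noise bound is a common convention assumed here, not necessarily the significance criterion used in the paper.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive lag.

    Lagged covariance divided by the total variance, both taken around
    the mean of the full series (the usual ACF estimator).
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(series)")
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def white_noise_bound(n):
    """Approximate 95% bound for the ACF of n uncorrelated values."""
    return 1.96 / math.sqrt(n)


# Hypothetical theta power values, one per consecutive 1-second window.
theta_per_window = [0.31, 0.28, 0.35, 0.22, 0.30, 0.27, 0.33, 0.25]
r1 = autocorrelation(theta_per_window, lag=1)
bound = white_noise_bound(len(theta_per_window))
print(f"lag-1 ACF = {r1:.2f}; exceeds bound: {abs(r1) > bound}")
```

      A lag-1 value inside the bound corresponds to the situation described above: high theta power in one window does not reliably predict high theta power in the adjacent windows, even though theta is itself an oscillation within each window.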

      F. Alpha power is introduced later on, and in the discussion, it is mentioned that the effects that were found go against the authors' expectations. However, alpha power and the authors' expectations about it are not mentioned in the introduction. 

      We thank the reviewer for this comment. We have added a paragraph on alpha in the introduction (pg.4).

      Minor points:

      (1) At the end of the first page of the introduction, the authors state that: 

      “How children allocate their attention in experimenter-controlled, screen-based lab tasks differs, however, from actual real-world attention in several ways (32-34). For example, the real-world is interactive and manipulable, and so how we interact with the world determines what information we, in turn, receive from it: experiences generate behaviours (35).”

      I think there is more to this, though: lab-based studies can be made interactive too (e.g., Meyer et al., 2023; Stahl & Feigenson, 2015). What remains unexplored is how infants actively and freely initiate and self-structure their attention, rather than how they respond to experimental manipulations.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Stahl, A. E., & Feigenson, L. (2015). Observing the unexpected enhances infants' learning and exploration. Science, 348(6230), 91-94.

      We thank the reviewer for this suggestion and have added their point on pg. 4.

      (2) Regarding analysis 4:

      a. In analysis 1 you showed that the duration of attentional episodes changes with age. Is it fair to keep the same start, middle, and termination ranges across age groups? Is 3-4 seconds "middle" for 5-month-olds? 

      We appreciate the comment. There are many ways we could have run these analyses and, in fact, in other papers we have done it differently, for example by splitting each look into thirds irrespective of its duration (Phillips et al., 2023).

      However, one aspect we took into account was the observation that 5-month-old infants exhibited more short looks than older infants. We recognized that dividing each look into three parts, regardless of its duration, might have affected the results: presumably, activity during the middle and termination phases of a 1.5-second look differs from that of a look lasting over 7 seconds.

      Two additional factors gave us confidence in our approach: 1) although the definition of “middle” was somewhat arbitrary, it allowed us to keep our analyses consistent across age points; and 2) we obtained a comparable number of observations at the two time points (e.g., for the “middle” phase we had 172 events at 5 months and 194 events at 10 months).

      b. It is recommended not to interpret lower-level interactions if more complex interactions are not significant. How are the interaction effects in a simpler model in which the 3-way interaction is removed? 

      We appreciate the comment. We followed the same steps as Xie et al. (2018). Nevertheless, we have re-analysed the data with the 3-way interaction removed, and the significance of the results remained the same. Please see Author response image 2 below (first: new analyses without the 3-way interaction; second: original analyses including the 3-way interaction).

      Author response image 2.

      (3) Figure S1: there seems to be an outlier in the bottom-right panel. Do results hold excluding it? 

      We re-ran these analyses as per this suggestion and the results remained the same (see SM pg. 2).

      (4) Figure S2 should refer to 10 months instead of 12.

      We thank the reviewer for noticing this typo; we have corrected it in the revised manuscript (see SM pg. 3). 

      (5) In the 2nd paragraph of the discussion, I found this sentence unclear: "From Analysis 1 we found that infants at both ages showed a preferred modal reorientation rate". 

      We have clarified this on pg. 10 of the revised manuscript.

      (6) Discussion: many (infant) studies have examined theta in anticipation of receiving information (Begus et al., 2016), in response to surprising events (Meyer et al., 2023), and especially during exploration (Begus et al., 2015). Can you make a broader point about how these findings inform our interpretation of theta in the infant population (moving from description towards underlying mechanisms)? 

      We have expanded on this point about interpreting frequency bands on pg. 13 of the revised manuscript and thank the reviewer for raising it.

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants' preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397-12402.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Begus, K., Southgate, V., & Gliga, T. (2015). Neural mechanisms of infant learning: differences in frontal theta activity during object exploration modulate subsequent object recognition. Biology Letters, 11(5), 20150041.

      (7) 2nd page of discussion, last paragraph: "preferred modal reorientation timer" is not a neural/cognitive mechanism, just a resulting behaviour. 

      We agree with this comment and thank the reviewer for bringing it to our attention. We have clarified this on pg. 12 and pg. 13 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I have a few comments and questions that I think the authors should consider addressing in a revised version. Please see below:

      (1) During preprocessing (steps 5 and 6), it seems like the "noisy channels" were rejected using the pop_rejchan.m function and then interpolated. This procedure is common in infant EEG analysis, but a concern arises: was there no upper limit for channel interpolation? Did the authors still perform bad channel interpolation even when more than 30% or 40% of the channels were identified as "bad" at the beginning with the continuous data? 

      We did state in the original manuscript that “participants with fewer than 30% channels interpolated at 5 months and 25% at 10 months made it to the final step (ICA) and final analyses”. In the revised version we have rewritten this section to make this clearer (pg. 17).

      (2) I am also perplexed about the sequencing of the ICA pruning step. If the intention of ICA pruning is to eliminate artifactual components, would it not be more logical to perform this procedure before conventional artifact rejection (i.e., step 7), rather than after? In addition, what method did the authors use to identify the artifactual ICA components? Was it done through manual visual inspection or with specific toolboxes? 

      We agree that ICA is often run before this step. However, we chose to reject continuous data prior to ICA in order to remove the very worst sections of data (where almost all channels were affected), which can arise when infants fuss or pull at their caps. Applying this step at this point in the pipeline ensured that these sections of very bad data were not entered into the ICA. This is fairly widespread practice in cleaning infant data.

      Concerning the reviewer’s second question, of how ICA components were removed: this is described in considerable detail in the paper we cite in that section of the manuscript. This was done by training a classifier specifically designed to clean naturalistic infant EEG data (Haresign et al., 2021), an approach that has since been employed in similar studies (e.g., Georgieva et al., 2020; Phillips et al., 2023).

      (3) Please clarify how the relative power was calculated for the theta (3-6Hz) and alpha (6-9Hz) bands. Were they calculated by dividing the ratio of theta or alpha power to the power between 3 and 9Hz, or the total power between 1 (or 3) and 20 Hz? In other words, what does the term "all frequency bands" refer to in section 4.3.7? 

      We thank the reviewer for this comment; we have now clarified this on pg. 22.
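      The reviewer's question turns on the choice of denominator. The sketch below is a hypothetical helper, not the authors' pipeline; it makes that choice an explicit parameter so that the two options the reviewer mentions (3-9 Hz vs. a broad range such as 1-20 Hz) can be compared. The half-open band edges and the 1-Hz example spectrum are assumptions.

```python
def band_power(freqs, power, lo, hi):
    """Sum spectral power over frequency bins with lo <= f < hi."""
    return sum(p for f, p in zip(freqs, power) if lo <= f < hi)


def relative_power(freqs, power, band, reference):
    """Power in `band` as a fraction of power in the `reference` range.

    Both ranges are (lo, hi) tuples in Hz, treated as half-open
    intervals so that adjacent bands (3-6 and 6-9 Hz) share no bin.
    """
    ref = band_power(freqs, power, *reference)
    if ref == 0:
        raise ValueError("no power in the reference range")
    return band_power(freqs, power, *band) / ref


# Hypothetical 1-Hz-resolution spectrum from 1 to 20 Hz with a 1/f shape.
freqs = list(range(1, 21))
power = [1.0 / f for f in freqs]
theta, alpha = (3, 6), (6, 9)
print(relative_power(freqs, power, theta, reference=(3, 9)))   # denominator 3-9 Hz
print(relative_power(freqs, power, theta, reference=(1, 21)))  # denominator 1-20 Hz inclusive
```

      With the narrower denominator, theta and alpha relative power sum to one by construction, so the two measures are not independent; with a broad denominator they are not constrained in this way.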

      (4) One of the key discoveries presented in this paper is the observation that attention shifts are accompanied by a subsequent enhancement in theta band power shortly after the shifts occur. Is it possible that this effect or alteration might be linked to infants' saccades, which are used as indicators of attention shifts? Would it be feasible to analyze the disparities in amplitude between the left and right frontal electrodes (e.g., Fp1 and Fp2, which could be viewed as virtual horizontal EOG channels) in relation to theta band power, in order to eliminate the possibility that the augmentation of theta power was attributable to the intensity of the saccades? 

      We appreciate the concern. Average saccade duration in infants is about 40ms (Garbutt et al., 2007). Because the positive cross-correlation between theta and look duration is present not only at zero lag but also when theta forward-predicts attention 1-2 seconds later, it seems unlikely to be directly attributable to saccade-related artifact. Concerning the reviewer’s suggestion, this is something we have tried in the past. Unfortunately, our experience is that identifying saccades based on the disparity between Fp1 and Fp2 is much too unreliable to be of use in analysing data. Even when specially positioned HEOG electrodes are used, we still find saccade detection insufficiently reliable. In ongoing work we are tracking eye movements separately in order to address this point more satisfactorily.

      (5) The following question is related to my previous comment. Why is the relationship between theta power and moment-to-moment changes in attention so short-lived? If theta is indeed associated with attention and information processing, shouldn't the relationship between the two variables strengthen as the attention episode progresses? Given that the authors themselves suggest that "One possible interpretation of this is that neural activity associates with the maintenance more than the initiation of attentional behaviors," it seems contradictory that the relationship does not last longer but instead declines drastically (Figure 6). 

      We thank the reviewer for raising this excellent point. We agree that this finding, together with the low autocorrelation values for theta documented in Fig 7A and 7D, challenges many conventional ways of interpreting theta. We are continuing to investigate this question in ongoing work.

      (6) Have the authors conducted a comparison of alpha relative power and HR deceleration durations between 5 and 10-month-old infants? This analysis could provide insights into whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during free play.

      We thank the reviewer for this suggestion. This is indeed an aspect we investigated, but given that our primary emphasis was on the theta band, and considering the length of the manuscript, we decided not to include it. However, we have attached Author response image 3 below, showing that there was no significant interaction between HR and the alpha band.

      Author response image 3.

      Reviewer #3 (Recommendations For The Authors):

      (1) In reading the manuscript, the language used seems to imply longitudinal data or at the very least the ability to detect change or maturation. Given the cross-sectional nature of the data, the language should be tempered throughout. The data are illustrative but not definitive. 

      We thank the reviewer for this comment. We have now clarified on pg. 15 that “Data was analysed in a cross-sectional manner”.

      (2) The sample size is quite modest, particularly in the specific age groups. This is likely tempered by the sheer number of data points available. This latter argument is implied in the text, but not explicitly noted. (However, I may have missed this, as the text is quite dense.) I think more attention is needed to the reliability and stability of the findings given the sample. 

      We have clarified this on pg. 16.

      (3) On a related note, how was the sample size determined? Was there a power analysis to guide decision-making for both recruitment and the choice of analyses? Again, the analytic approach is quite sophisticated and the questions are of central interest to researchers, but I was left feeling that these two aspects of the study may be out-sprinting the available data. The general impression is that the sample is small, but it is not until looking at Table S7 that this is thrown into full relief. I think this should be more prominent in the main body of the study.

      We have clarified this on pg. 16.

      (4) The manuscript devotes a few sentences to the relation between looking and attention. However, this distinction is central to the design of the study and to any philosophical differences regarding which take-away points can be drawn. In my reading, this point needs to be interrogated more thoroughly. 

      This distinction between looking and paying attention is now clearer in the revised manuscript, as per R1's and R3’s suggestions. We have also added a paragraph to the Introduction clarifying it and pointing out its limitations (see pg. 5).

      (5) I would temper the real-world attention language. This study is certainly a great step forward, relative to static faces on a computer screen. However, there are still a great number of artificial constraints that have been added. That is not to say that the constraints are bad--they are necessary to carry out the work. However, it should be acknowledged that it constrains the external validity. 

      We have added a paragraph acknowledging the limitations of the setup on pg. 14.

      (6) The kappa on the coding is not strong. The authors chose to proceed nonetheless. Given that, I think more information is needed on how coders were trained, how they were standardized, and what parameters were used to decide they were ready to code independently. Again, with the sample size and the kappa presented, I think more discussion is needed regarding the robustness of the findings. 

      We appreciate the concern. As per our answer to R1, we chose to report the most stringent measure of inter-rater reliability, but other calculation methods (i.e., percent agreement) return higher scores (see response to R1).

      Regarding training, we wrote an extensively detailed coding scheme, given to our coders, describing exactly how to code each look. Throughout the initial months of training, we met with the coders weekly to discuss questions and individual frames that looked ambiguous. After each session, we revised the coding scheme to incorporate additional details, aiming to make the coding process progressively less subjective. During this period, every coder analysed the same interactions, and inter-rater reliability (IRR) was assessed weekly by comparing their evaluations with mine (Marta). Over time, the coders had fewer questions and IRR increased. At that point, we deemed them sufficiently trained and began assigning them different interactions from each other. Periodically, though, we all assessed the same interaction and met to review and discuss our coding outputs.
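      To show why a chance-corrected statistic reads lower than raw agreement on the same codes, here is a minimal sketch assuming Cohen's kappa for two coders; the response above does not name the exact statistic, and the frame labels ("toy", "away") are hypothetical.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two coders assign the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("codings must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percent_agreement(a, b)
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each coder's marginal label frequencies.
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical frame-by-frame codes from two coders.
coder_a = ["toy"] * 8 + ["away"] * 2
coder_b = ["toy"] * 7 + ["away", "toy", "away"]
print(percent_agreement(coder_a, coder_b))       # 0.8
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.375
```

      When one label dominates, as looking at the toy might during free play, chance agreement is high, so kappa can be modest even when coders agree on most frames.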

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments on our manuscript and their appreciation of the results. We provide point-by-point responses below. For your convenience, we highlight the main changes to the manuscript here.

      · More descriptive terminology for the contextual cues (Ctx.A / Ctx.noA is now referred to as LIGHT / DARK).

      · Schematic of the experiment timeline highlighting the exclusion of non-discriminators following the initial acquisition period. This explains the absence of baseline sex differences post-acquisition and clears up some misconceptions about a lack of replicability.

      · New data (time in port pre-CS) showing that a prior reward does not cause continued presence in the port.

      · Several text edits to address all the points raised by the reviewers.

      We hope that the editors and reviewers will be satisfied with this revised version and find the strength of the evidence more convincing.

      Reviewer #1 (Recommendations For The Authors):

      In relation to weaknesses points 1-4 in the public review:

      (1) With regard to the claim (page 4 of the pdf), I think I can see what the authors are getting at when they claim "Only Ctx-dep. O1 engages context-gated reward predictions", because the same reward is available in each context, and the animal must use contextual information to determine which cue will be rewarded. In other words, the context has a discriminative purpose. In Ctx-dep. O1/O2, however, although the context does not serve a discriminative purpose in the sense that one cue will always earn a unique outcome regardless of context, the fact that these cues are differentially rewarded in the different contexts means that animals may well form context-gated cue-outcome associations (e.g., CtxA-(CS1-O1), CtxnoA-(CS2-O2)). Moreover, the context is informative in this group in telling the animal which cue will be rewarded, even prior to outcome delivery, so I don't think contextual information will fade into the background of the association and attention be lost to it in the way, say, Mackintosh (1975) might predict. Therefore, I don't think this statement is correct.

      I suggest that the authors refine the statement to be more accurate.

      We agree with the reviewer: the context is absolutely relevant for rats trained in the Ctx-dep. O1/O2 task. We have edited the text in several places to make this clear. The question is how (by what mechanism) the context participates in the control of behavior in this group. The reviewer correctly points out that, just like rats trained in the Ctx-dep. O1 task, rats trained in the Ctx-dep. O1/O2 task might have formed context-gated cue-outcome associations. We now clearly acknowledge this in the text.

      However, because in this group the two outcomes are always encountered in different contexts, we argue that these rats could also have formed a direct association between the two contexts and the two outcomes. In other words, each context might directly evoke the expectation of a distinct reward outcome (prepare to drink, or prepare to eat). On a given trial, if the cue and context both tend to activate the same outcome representation, the converging cue+context excitation can add up. This would produce a context-sensitive response, but not via a hierarchical modulation process (unlike Ctx-dep. O1). Arguably, this last associative mechanism is much simpler and might explain why almost all rats in the Ctx-dep. O1/O2 group learned the discrimination, and at a much faster rate.

      Therefore, while rats trained in Ctx-dep. O1/O2 might engage a combination of associative processes to achieve context-sensitive behavior (including hierarchical associations), only rats in the Ctx-dep. O1 group critically and unambiguously rely on hierarchical associations to achieve context-sensitive behavior.

      (2) I think the results shown in Figure 1 are very interesting, and well supported by the statistics. It's so nice to see a significant interaction, as so many papers try to report these types of effects without it. However, I do wonder how specific the results are to contextual modulation. That is, should a discriminative discrete cue be used instead of each context (e.g. CS1 indicates CS2 earns O1, CS3 indicates CS4 earns O1), would female rats still be as slow to learn the discrimination?

      I am just curious as to whether the authors have thoughts on this.

      We have not tested this and are not aware of a paper that examined this question specifically.

      However, we would like to point out that in the suggested design (CS1→[CS2→O1]; CS3→[CS4→O1]) the discriminative cues (CS1 and CS3) would almost certainly also acquire substantial reward-predictive value, either through their direct association with the reward or via second-order conditioning. This would complicate the interpretation of the results in terms of hierarchical associations. Incorporating non-rewarded presentations of CS1 and CS3 alone (i.e., extinguishing those cues, as is sometimes done in occasion-setting experiments) would be one way to reduce the reward expectation evoked by those cues, but this approach has some limitations. Indeed, as noted by Rescorla (2006), “During extinction, the net associative strength of a stimulus declines to the level of [a response] threshold, but further decrement stops at that point”. So while extinguished CS1 and CS3 might no longer evoke overt behavioral responses, these cues could retain a non-negligible subthreshold excitatory connection with the US. Individually, these cues might fail to evoke responding but could nonetheless increase responding during the CS1→CS2 trials (or CS3→CS4 trials) via simple summation (Rescorla, 2006: “the compound of two [extinguished] stimuli has a strength that exceeds the threshold and so evokes responding”).

      This type of consideration is precisely why we opted for the behavioral task used in the study. In Ctx-dep. O1, the discriminative stimuli exert opposite effects on the two target cues, which rules out summation effects as a mechanism for context-sensitive behavior.
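      The summation argument quoted from Rescorla (2006) can be reduced to a few lines. This is a deliberately simplified sketch that assumes responding occurs only when summed associative strength exceeds a single fixed threshold; the threshold and strength values are hypothetical.

```python
def evokes_response(strengths, threshold):
    """True if the summed associative strength exceeds the response threshold."""
    return sum(strengths) > threshold


# Hypothetical values: extinction has driven each stimulus down to the
# threshold but not below it, so each retains subthreshold excitatory strength.
threshold = 0.5
v_a = v_b = 0.5
print(evokes_response([v_a], threshold))       # False: stimulus A alone
print(evokes_response([v_a, v_b], threshold))  # True: the compound summates
```

      The same logic is why residual strength in an extinguished discriminative cue could inflate responding on compound trials without any hierarchical (occasion-setting) process.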

      (3) On pages 8-9 of the pdf, where the biological basis of the delayed acquisition of contextual control in females is considered, I find this to be written from a place of assuming that what is observed in the males is the default behaviour. That is, although the estrous cycle and its effects on synaptic plasticity/physiology may well account for the results, is there not a similar argument to be made for androgens in males? Perhaps androgens also somehow alter synaptic plasticity/physiology, leading to faster acquisition, reduced performance stability, and increased susceptibility to stress in males.

      I would like the argument that female behaviour might be the default, and male behaviour the deviation, to be considered in the discussion in addition to those already stated.

      We regret if we gave the impression that male behavior was the default. The paper is intended to report sex differences but we don’t view either sex as the default. To correct this impression, we have added a few sentences in the discussion to highlight male-hormonal factors as well as non-gonadal genetic factors that might have contributed to the observed sex differences.

      (4) In addition, the OFC - which is the brain region found to have differential expression of c-fos in males and females in Figure 5 - is not explicitly discussed with regard to the biological mechanisms of differences, which seems odd.

      I suggest OFC be discussed with regard to biological mechanisms of differences.

      We added a few sentences to the discussion to i) highlight the parallel between our study and human fMRI studies showing greater OFC activation in females during the regulation of emotional responses, ii) suggest a potential relationship between the reported sex differences (speed of acquisition, robustness of performance, and OFC activation in context-gated reward prediction), and iii) acknowledge our ignorance of the root causes of these sex differences.

      We wish we could offer a better answer. We have attempted to offer possible proximal explanations for the observed sex differences, but ultimately our work did not address the root causes of these behavioral and neural sex differences. Therefore we feel that further attempts to explain these differences would be too speculative.

      (5) I did wonder whether the authors were aware that in the Rescorla-Wagner model, contextual stimuli are thought to summate with discrete cues to enter into the association with the outcome (i.e., the error term is λ − ΣV, with ΣV the summed associative strength of all stimuli present on a trial, including contextual stimuli). Typically, this is not considered much, because the cue itself is so salient and more consistently paired with reward (whereas the ever-present context is often paired with no reward), but it is nevertheless part of the association. I'm not sure it's wrong to say that the background circumstances under which events occur are thought to play little role (as in the second sentence of the introduction), but I was wondering whether the authors were aware of this when they wrote that.

      This sentence in the introduction was meant to introduce the distinction between eliciting stimuli and modulating contexts. Admittedly, this paints a naive picture, which we now acknowledge (we hope that the rest of the paper provides more nuance). As pointed out by this reviewer, the context is also a stimulus and, just like any other stimulus, it is eligible for direct association with an outcome. The possibility of a direct context→outcome association is precisely the rationale for the Ctx-dep. O1/O2 group.
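      The reviewer's point about the shared error term can be illustrated with a minimal Rescorla-Wagner sketch. It assumes a single combined learning rate (αβ) for all stimuli and a hypothetical schedule in which rewarded cue trials always occur in the context and the context is also experienced alone without reward. It is not a model of the Light/Dark task itself.

```python
def rescorla_wagner_trial(V, present, lam, rate):
    """Apply one Rescorla-Wagner update and return the new strengths.

    V       -- dict mapping stimulus name -> associative strength
    present -- stimuli present on the trial (a cue and/or the context)
    lam     -- asymptote supported by the outcome (0.0 if no outcome)
    rate    -- combined learning rate (alpha * beta), assumed equal
               for all stimuli in this sketch
    """
    total = sum(V.get(s, 0.0) for s in present)  # sigma V over present stimuli
    error = lam - total                           # one shared prediction error
    new_V = dict(V)
    for s in present:
        new_V[s] = V.get(s, 0.0) + rate * error
    return new_V


# Hypothetical schedule: a rewarded cue-in-context trial, then the
# context alone (e.g. the intertrial interval) without reward.
V = {}
for _ in range(50):
    V = rescorla_wagner_trial(V, ["cue", "context"], lam=1.0, rate=0.2)
    V = rescorla_wagner_trial(V, ["context"], lam=0.0, rate=0.2)
print(V)
```

      Early in training the context gains strength alongside the cue because both share the error term; unrewarded context-alone exposure then gradually hands most of the strength to the more reliable cue, which is why the context's contribution is usually overlooked.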

      (6) Context-noA - Seems a little confusing for a name, why not just call it context B? NoA appears to imply that nothing happens in A or no outcome is available, whereas this is not always the case.

      We debated which terminology to use. We felt that “Context A vs. Context B” should perhaps be reserved for situations where the global context changes (e.g., two different conditioning boxes with different odors, floor textures, etc., with proper counterbalancing procedures). We felt that “Context A vs. noA” might be more appropriate here, as we are manipulating the local context by introducing (or removing) one single stimulus (the houselight). In this revised version we have followed this reviewer’s advice and adopted a more descriptive, and hopefully less confusing, terminology: “Light vs. Dark”.

      (7) Why is it that in the text the Ctx-dep O1/O2 is explained before simple and no discrimination, but in the Figure Ctx-dep O1/O2 is shown last? These should be consistent.

      Thanks for pointing that out. We have switched the order of task description to be consistent with the figures.

      (8) Page 6 (of pdf) - could the authors elaborate a little on why or how (or both) the delivery of reward can interfere with the expression of context-dependent discrimination? Do they just mean the performance of discrimination (e.g., animals will sit at the food port longer if there is food there because they are sitting there and eating it, which does not necessarily reflect the expectation of food based on cue presentations?), in which case it is not the discrimination itself that is being interfered with, just the measure of it. Perhaps the authors could elaborate by just inserting a sentence.

      We have added a few sentences to discuss this effect.

      The first clarification that we can make is that the reduced discrimination performance following reward is not simply due to animals’ continued presence in the reward port. We have added the time pre-cue to Fig. 3 B-F. This measure is not affected by previous reward history, showing that rats are leaving the port between trials.

      So what is driving this effect? At this stage, we are agnostic about the mechanism(s). Kuchibhotla et al. (2019), who first reported a similar effect, proposed a model in which recent rewards modify the threshold for behavioral responses (i.e., performance). In this model, a cue might evoke a weak reward prediction but a strong behavioral response if presented after a reward. Additionally, we believe that learning factors might also contribute to the effect reported here. Indeed, the behavioral response on a given trial likely reflects the balance of hierarchical (context-dependent) associations vs. direct associations (Bradfield and Balleine, 2013). Naturally, this balance is dynamic and influenced by trial history. For instance, a Light:X+ trial might increase the value of cue X and promote responding during the following Dark:X- trial. The same logic could be applied to the influence of the context (e.g., a Light:X+ trial might promote responding to a subsequent Light:Y- trial). We are currently working on a computational model that captures the dynamic interplay between hierarchical and direct associations. We hope that this model will provide some insight into the learning/performance mechanisms behind the effects reported here. However, this computational work is still in its early stages and beyond the scope of the present study.

      (9) The lack of effect in the Ctx-dep O1/O2 groups in Figure 4 could be due to a lack of power - the group sizes are a lot smaller for this group than for Ctx-dep O1 where an interaction was detected. I think this should be at least addressed in the discussion (i.e., that this lack of effect is possibly due to less power here, as the effects are in the same direction).

      Good point. We now acknowledge this limitation in the text.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please comment on the failure to replicate the sex differences across experiments. Perhaps this is due to some change in the training procedure that is briefly mentioned in the methods (a reduction in the number of rewarded trials) but it is unclear.

      The reviewer correctly observed that Figs. 3-5 do not show sex differences in the baseline condition. This is not due to a replication failure, but to the fact that non-discriminating subjects were excluded from the experiment at the end of the acquisition period (after 72 training sessions). We now clarify this in the Method and Results sections. We also added a schematic of the experimental timeline that highlights the exclusion of non-discriminators at the end of the acquisition period (Fig. 1).

      On the topic of replicability, the data for Ctx-dep O1 were collected over 3 cohorts (over the course of 2 years), and the sex-difference pattern was consistent. For instance, the proportions of discriminators vs. non-discriminators for males and females trained in Ctx-dep O1 showed similar patterns across cohorts (see below).

      Author response table 1.

      (2) The design of this experiment makes it possible to analyse whether there is a differential outcome effect (DOE). The DOE would indeed predict better discrimination in group cxt-dep O1/O2 versus cxt-dep O1, which seems to be exactly what the authors observe although between-group statistics are not reported. Inspection of Figure 1 suggests that there may be a DOE in females but not in males. I wonder if the authors might consider reanalysing the data to check this.

      Indeed, there is clearly a differential outcome effect. We now point out this DOE in relation to the latency to reach the discrimination criterion (Fig. 2 C-D). Rats in the Ctx-dep. O1/O2 group acquired discrimination (reached criterion) much faster than rats in the Ctx-dep. O1 group.

      Following the reviewer’s suggestion, we provide here the results of targeted ANOVAs (focusing exclusively on Ctx-dep. O1 and Ctx-dep. O1/O2) to investigate a potential sex-dependent DOE (i.e., Sex x Task interactions); see the figure below. A three-way ANOVA (Sex x Task x Session) conducted on the discrimination index revealed a main effect of Task (F(1, 86) = 173.560, P < 0.001), a main effect of Session (F(2.678, 230.329) = 140.479, P < 0.001), and a marginal effect of Sex (F(1, 86) = 3.929, P = 0.051), but critically no Task x Sex or Task x Sex x Session interaction (P ≥ 0.504). A two-way ANOVA (Sex x Task) conducted on the sessions to criterion revealed a main effect of both factors (Sex: F(1, 63) = 9.52, P = 0.003; Task: F(1, 62) = 184.143, P < 0.001) but, critically, no Sex x Task interaction (P = 0.233). These results indicate that the use of two different outcomes clearly facilitated the acquisition of context-dependent discrimination (a DOE), but that this effect benefited both sexes equally. We thank the reviewer for recommending this analysis.

      Author response image 1.

      Differential outcome effect (DOE) affects males and females equally. A. Discrimination ratio over the acquisition period. B. Trials to criterion. Compared to training with a single outcome (Ctx-dep. O1), introducing dissociable outcomes for the two types of rewarded trials (Ctx-dep. O1/O2) profoundly facilitated the acquisition of discriminated behavior. This effect benefited both sexes equally.

      (3) Some minor points for clarification that the authors may also wish to address:

      - Figure 3: is data presented from sessions 71-80 only or for all sessions? I didn't fully follow the explanation offered in the results section.

      That’s right. The data presented in Fig. 3 consider only sessions 71-80 in discriminator rats, when performance is globally stable. We have edited the text to make this clearer. These 10 sessions represent a total of 800 trials (10 sessions x 80 trials). The first trial of each session was not included in the analysis since it was not preceded by any trial. For the remaining 790 trials (10 sessions x 79 trials), we examined how the outcome of the previous trial (rewarded or nonrewarded) influenced responding on the next trial. This large sample (790 trials per rat) was required to ensure that enough data were collected for each possible trial-history scenario.
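      The trial-history split described above can be sketched in Python as follows. The data layout (one list of (rewarded, response) tuples per session, in trial order) and the function name are hypothetical, and the response value stands in for whichever measure is analyzed, such as time in port.

```python
from collections import defaultdict


def split_by_previous_outcome(sessions):
    """Group each trial's response by the outcome of the preceding trial.

    sessions: list of sessions; each session is a list of (rewarded, response)
    tuples in trial order (hypothetical layout). The first trial of a session
    has no preceding trial, so it contributes no entry.
    """
    groups = defaultdict(list)
    for session in sessions:
        # Pair each trial with the trial immediately before it
        for (prev_rewarded, _), (_, response) in zip(session, session[1:]):
            key = "after_reward" if prev_rewarded else "after_no_reward"
            groups[key].append(response)
    return dict(groups)


if __name__ == "__main__":
    # Synthetic data: 10 sessions x 80 trials, every 4th trial rewarded
    sessions = [[(i % 4 == 0, float(i)) for i in range(80)] for _ in range(10)]
    groups = split_by_previous_outcome(sessions)
    print({k: len(v) for k, v in groups.items()})
```

      With 10 sessions of 80 trials, the two groups together hold 790 responses, matching the count given above; a per-rat mean of each group would then summarize the influence of the previous outcome.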

      - The authors argue that females are protected from the disrupting effect of stress. It might be useful if the authors offer further explanation as to what they mean by "protected".

      By “protected”, we simply mean “less sensitive”. We have reworded this sentence accordingly. We do not claim to understand the precise mechanism for this sex-dependent effect (although our data point to a possible role of the OFC).

      - The authors state that "delivery of reward, while critical for learning, can also interfere with the expression of context-dependent discrimination". This statement should be explained in further detail. For instance, why should reward delivery specifically impair context-dependent discrimination but not other forms of discrimination?

      We have reworded this sentence to be more inclusive. Indeed, delivery of reward also interferes with other forms of discrimination, particularly when discrimination performance is not yet optimal. We have also added a paragraph to discuss the possible mechanisms by which reward might interfere with discrimination performance in our task.   

      Reviewer #3 (Recommendations For The Authors):

      I do not suggest additional experiments, but I do hope you continue the behavioral work to characterize what is being learned in the task. I think the approach is promising. I would suggest reporting the % time in port and port entries for the entire CS. There is no justification for only analyzing the response in the last 5s.

      We thank the reviewer for the encouragement.

      We opted to focus on the time in port for two main reasons:

      (1) This measure is relatively consistent across the two different reward outcomes (unlike the rate of port entries). Indeed, consistent with prior studies (Delamater et al., 2017), we observed that the type of reward (solid or liquid) influences the topography of the anticipatory magazine-directed behavior. Specifically, cues paired with pellets elicited significantly more port entries than cues paired with chocolate milk. The opposite pattern was observed for time in port: cues paired with chocolate milk elicited more sustained time in port than cues paired with pellets (see figure below). While these two measures (port entries and time in port) show opposite biases for the two possible outcomes, the size of this bias is much smaller for time in port (Cohen’s d effect size: port entries, 1.41; time in port, 0.62). As a result, the discrimination ratio calculated from time in port is consistent across the two outcomes (P = 0.078; effect size: 0.07), which is not the case for the discrimination ratio calculated from port entries (P = 0.007; effect size: 0.32; see figure below).

      (2) Unlike the rate of port entries, the time in port shows a monotonic increase during training in these tasks. Indeed, we observed here and in past work (Keiflin et al., 2019) that the rate of port entries initially increases with training but then slightly decreases, particularly for cues paired with liquid reward. In contrast, the time in port continues to increase, or remains high, with extended training. This is easy to understand if we consider the extreme case of a hypothetical rat that enters the port once upon cue presentation and remains there for the whole cue duration. This rat would have a relatively low rate of port entries (a single port entry per trial) but a high time in port.

      This is not to say that the rate of port entries is not a valid measure overall (we have used, and continue to use, this metric in other preparations). However, for the reasons explained above, we believe that the time in port is a better metric for reward anticipation in this specific study.

      Moreover, we chose to focus our analysis on the last 5s of the cue because that is when anticipatory food cup behavior is most reliably observed (in our preparation, >2/3 of the total time in port occurs during the last 5s of the cue) and least contaminated by orienting behaviors (Holland, 1977, 1980, 2000). For these reasons, analysis of the last portion of the cue is relatively common in Pavlovian anticipatory approach preparations (El-Amamy and Holland, 2007; Olshavsky et al., 2013; Esber et al., 2015; Holland, 2016a, 2016b; Schiffino and Holland, 2016; Gardner et al., 2017; Sharpe et al., 2021; Maes et al., 2020; Sharpe et al., 2020; Siemian et al., 2021; Kang et al., 2021). Reporting time in port during the same cue epoch facilitates comparisons with these studies.

      We have edited the text in the Method section to provide a brief justification for focusing our analyses on this cue epoch.

      Author response image 2.

      Outcome identity influences the topography of the conditioned response. A-B: Conditioned responding expressed as the number of port entries per trial (A) or time in port per trial (B) for rats trained in the simple discrimination task with a chocolate milk reward (n = 19) or a sucrose pellet reward (n = 16). Data show the average of the last 3 sessions. Compared to chocolate milk, pellets tend to produce more port entries. Conversely, chocolate milk tends to produce more time in port. However, the magnitude of this bias is smaller for time in port. C-D: Discrimination ratio calculated from the number of port entries (C) or the time in port (D); the latter is not affected by outcome identity. *P<0.05; **P<0.01; ***P<0.001, t-tests.

      The inconsistent use of terms is distracting throughout the paper. Is it discriminated or context-gated? Please provide a definition of your terms and then use them consistently. Is it a discriminative stimulus, a context, or an occasion setter? These all imply slightly different things and it would help the reader if you just used one term throughout the paper.

      Thanks for pointing that out. We have added a definition for “context-gated” and edited the text to keep the terminology consistent where appropriate. The words “discrimination”/“discriminated” still appear in the manuscript but without implying a mechanism (all tasks are variations of Pavlovian discrimination, in which the rats discriminate between rewarded and non-rewarded trials).

      As mentioned by this reviewer, the terms “context” and “occasion setter” are not synonymous. Therefore these terms still appear in the manuscript to refer to different concepts (e.g. in our task the visual stimulus is a context for all rats; this context acts as an occasion setter only for some rats).

      Minor:

      Intro, 2nd PP: "autism". This is abbreviated in the abstract but spelled out here. I suggest not abbreviating in the abstract and introducing abbreviations here, as you do with PTSD.

      Fixed as suggested

      Have deficits in contextual modulation been distinguished from potential deficits in binary associative learning in autism, PTSD, and substance use disorders? This is implied, but there are no citations provided.

      We provide a list of references showing deficits in contextual modulation in these disorders.

      This does not mean that these disorders are reducible to deficits in contextual modulation, and it does not exclude other forms of deficits in those disorders, including alterations in certain aspects of binary associative learning.

      "In positive occasion-setting, animals learn that a target cue (X) results in a reward outcome (+) only when that cue is accompanied by a contextual feature (A); the same cue presented in absence of this contextual feature remains without consequence (A:X+ / X-)." - there are words missing in this sentence.

      We apologize, but we fail to identify the missing word(s). Perhaps the reviewer could be more specific, and we will be happy to edit the sentence as needed.

      What is a contextual feature, is this redundant or can you provide a specific definition?

      We use the terms “feature” and “target” as these are the standard terms in the description of occasion-setting preparations (one stimulus, the “feature”, sets the occasion for responding, or not responding, to the “target” cue). By contextual feature, we meant that in this specific example the context was the feature. We have clarified this in the text. We believe that these terms are not redundant. Indeed, the context is not always a feature, and a feature is not necessarily a context (phasic cues can serve as “features”).

      Can you provide some background on studies of sex differences in simple associative learning? You imply these have been much more thoroughly studied than conditional discriminations.

      We added a few references as suggested.

      What is the rationale for studying stress?

      Stressful life events exacerbate several mental illnesses, potentially by impacting cognitive functions.

      Although the (sex-dependent) effects of stress on some cognitive functions are well established (e.g., working memory, selective attention, spatial navigation), the effect of stress on contextual modulation (a core dysfunction in certain mental illnesses), and possible sex differences in this effect, had not been formally tested. We added a few sentences in the Results section (at the beginning of the stress section) to remind the reader of why we tested the effect of stress in this task.

      Method/Results:

      Cues are not counterbalanced; the feature is visual and targets are auditory - this should be noted as a limitation in the discussion section.

      We now acknowledge this limitation in the Discussion. Moreover, we believe that the new terminology for the context (Light vs Dark, instead of A vs. noA in the original version) makes it abundantly clear that the “context” in this study was always visual.

      Summation is invoked to describe the discrimination with different outcomes, how is summation happening? This is not described. Perhaps incorporate the literature on conditional discriminations with differential outcomes (the "differential outcomes effect").

      We have edited the Results and Discussion sections to clarify how summation might contribute to discrimination with different outcomes. We have also added references for the DOE in this task.

      The stress effect is confounded with test order; comparing stress vs. baseline.

      Sorry, we do not understand this point. The “baseline” refers to the animal’s performance on the last training session before the acute stress manipulation (we have edited the text to make this clear). Animals are first trained in the task, and then we examine how stress alters their performance in this learned task. We do not see how this could induce a test-order confound.

      Throughout the results section, it would be helpful to have the number of animals reported for each analysis.

      The number of animals for each part of the experiment is now reported in the text, as well as in the figures.

      Discussion:

      "For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that hierarchically modulates the associative strength between a target cue and its outcome." This is inaccurate. Occasion setters do not change or modulate the associative strength of a target cue. They modulate whether excitation or inhibition is expressed.

      We reworded the sentence as suggested: “For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that modulates the response to a target cue”.

      "Together, these results indicate that the sex differences observed here are not attributable to simple associative, motivational, working-memory, or attentional processes, but are specific to the neurocomputational operations required for the hierarchical, contextual control of behavior." It should be noted here that the difference is one of degree, a quantitative difference, but not a difference in the qualitative features of the process.

      "Regardless of the precise mechanism, our results indicate that, compared to male rats, females ultimately achieved more stable contextual control over cued reward-seeking; their behavior remained context-regulated under stress or after recent rewards." Again this is a matter of degree.

      We absolutely agree. All the sex differences reported here are a matter of degree. In the framework of McCarthy et al. (2012), the reported effects are type 2 or type 3 sex differences, not type 1 sexual dimorphisms. We made a few edits in the Discussion to clarify this point.

      Procedure:

      Please clarify the percentage of trials that were reinforced in the No Discrimination group.

      From sessions 1-32 (acquisition period), 50% of the trials were reinforced. After this acquisition period, only 25% of the trials were reinforced, to match all the other groups. We have edited the Method section to clarify this point.

      Please provide the dimensions of the restraint tubes and the model number if available.

      This information is now included.

      References

      Bradfield LA, Balleine BW (2013) Hierarchical and binary associations compete for behavioral control during instrumental biconditional discrimination. J Exp Psychol Anim Behav Process 39:2–13.

      Delamater AR, Garr E, Lawrence S, Whitlow JW (2017) Elemental, configural, and occasion setting mechanisms in biconditional and patterning discriminations. Behav Processes 137:40–52.

      El-Amamy H, Holland PC (2007) Dissociable effects of disconnecting amygdala central nucleus from the ventral tegmental area or substantia nigra on learned orienting and incentive motivation. Eur J Neurosci 25:1557–1567.

      Esber GR, Torres-Tristani K, Holland PC (2015) Amygdalo-striatal interaction in the enhancement of stimulus salience in associative learning. Behav Neurosci 129:87–95.

      Gardner MPH, Conroy JS, Shaham MH, Styer CV, Schoenbaum G (2017) Lateral Orbitofrontal Inactivation Dissociates Devaluation-Sensitive Behavior and Economic Choice. Neuron 96:1192–1203.e4.

      Holland PC (1977) Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. J Exp Psychol Anim Behav Process 3:77–104.

      Holland PC (1980) CS-US interval as a determinant of the form of Pavlovian appetitive conditioned responses. J Exp Psychol Anim Behav Process 6:155–174.

      Holland PC (2000) Trial and intertrial durations in appetitive conditioning in rats. Anim Learn Behav 28:121–135.

      Holland PC (2016a) Enhancing second-order conditioning with lesions of the basolateral amygdala. Behav Neurosci 130:176–181.

      Holland PC (2016b) Effects of amygdala lesions on overexpectation phenomena in food cup approach and autoshaping procedures. Behav Neurosci 130:357–375.

      Kang M, Reverte I, Volz S, Kaufman K, Fevola S, Matarazzo A, Alhazmi FH, Marquez I, Iordanova MD, Esber GR (2021) Agency rescues competition for credit assignment among predictive cues from adverse learning conditions. Sci Rep 11:16187.

      Keiflin R, Pribut HJ, Shah NB, Janak PH (2019) Ventral tegmental dopamine neurons participate in reward identity predictions. Curr Biol 29:93–103.e3.

      Kuchibhotla KV, Hindmarsh Sten T, Papadoyannis ES, Elnozahy S, Fogelson KA, Kumar R, Boubenec Y, Holland PC, Ostojic S, Froemke RC (2019) Dissociating task acquisition from expression during learning reveals latent knowledge. Nat Commun 10:2151.

      Maes EJP, Sharpe MJ, Usypchuk AA, Lozzi M, Chang CY, Gardner MPH, Schoenbaum G, Iordanova MD (2020) Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat Neurosci 23:176–178.

      McCarthy MM, Arnold AP, Ball GF, Blaustein JD, De Vries GJ (2012) Sex differences in the brain: the not so inconvenient truth. J Neurosci 32:2241–2247.

      Olshavsky ME, Song BJ, Powell DJ, Jones CE, Monfils M-H, Lee HJ (2013) Updating appetitive memory during reconsolidation window: critical role of cue-directed behavior and amygdala central nucleus. Front Behav Neurosci 7:186.

      Rescorla RA (2006) Deepened extinction from compound stimulus presentation. J Exp Psychol Anim Behav Process 32:135–144.

      Schiffino FL, Holland PC (2016) Secondary visual cortex is critical to the expression of surprise-induced enhancements in cue associability in rats. Eur J Neurosci 44:1870–1877.

      Sharpe MJ, Batchelor HM, Mueller LE, Gardner MPH, Schoenbaum G (2021) Past experience shapes the neural circuits recruited for future learning. Nat Neurosci 24:391–400.

      Sharpe MJ, Batchelor HM, Mueller LE, Yun Chang C, Maes EJP, Niv Y, Schoenbaum G (2020) Dopamine transients do not act as model-free prediction errors during associative learning. Nat Commun 11:106.

      Siemian JN, Arenivar MA, Sarsfield S, Borja CB, Russell CN, Aponte Y (2021) Lateral hypothalamic LEPR neurons drive appetitive but not consummatory behaviors. Cell Rep 36:109615.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study tests the hypothesis that a high autism quotient in neurotypical adults is strongly associated with suboptimal motor planning and visual updating after eye movements, which in turn, is related to a disrupted efference copy mechanism. The implication is that such abnormal behavior would be exaggerated in those with ASD and may contribute to sensory overload - a key symptom in this condition. The evidence presented is convincing, with significant effects in both visual and motor domains, adequate sample sizes, and consideration of alternatives. However, the study would be strengthened with minor but necessary corrections to methods and statistics, as well as a moderation of claims regarding direct application to ASD in the absence of testing such patients.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study examines a hypothesized link between autism symptomatology and efference copy mechanisms. This is an important question for several reasons. Efference copy is both a critical brain mechanism that is key to rapid sensorimotor behaviors, and one that has important implications for autism given recent empirical and theoretical work implicating atypical prediction mechanisms and atypical reliance on priors in ASD.

      The authors test this relationship in two different experiments, both of which show larger errors/biases in spatial updating for those with heightened autistic traits (as measured by AQ in neurotypical (NT) individuals).

      Strengths:

      The empirical results are convincing - effects are strong, sample sizes are sufficient, and the authors also rule out alternative explanations (ruling out differences in motor behavior or perceptual processing per se).

      Weaknesses:

      My main concern is that the paper should be more transparent about both (1) that this study does not include individuals with autism, and (2) acknowledging the limitations of the AQ.

      On the first point, and I don't think this is intentional, there are several instances where the line between heightened autistic traits in the NT population and ASD is blurred or absent. For example, in the second sentence of the abstract, the authors state "Here, we examine the idea that sensory overload in ASD may be linked to issues with efference copy mechanisms". I would say this is not correct because the authors did not test individuals with ASD. I don't see a problem with using ASD to motivate and discuss this work, but it should be clear in key places that this was done using AQ in NT individuals.

      For the second issue, the AQ measure itself has some problems. For example, reference 38 in the paper (a key paper on AQ) also shows that those with high AQ skew more male than modern estimates of ASD, suggesting that the AQ may not fully capture the full spectrum of ASD symptomatology. Of course, this does not mean that the AQ is not a useful measure (the present data clearly show that it captures something important about spatial updating during eye movements), but it should not be confused with ASD, and its limitations need to be acknowledged. My recommendation would be to do this in the title as well - e.g. note impaired visuomotor updating in individuals with "heightened autistic traits".

      We thank the reviewer for the kind words. We now specify more carefully that our sample consists of neurotypical adults scored for autistic traits, none of whom had been diagnosed with autism before participating in our experiment. Regarding the Autism-Spectrum Quotient (AQ) questionnaire, on page 5 of the Introduction we now write:

      “The autistic traits of the whole population form a continuum, with ASD diagnosis usually situated on the high end 31-33. Moreover, autistic traits share a genetic and biological etiology with ASD 34. Thus, quantifying autistic-trait-related differences in healthy people can provide unique perspectives as well as a useful surrogate for understanding the symptoms of ASD 31,35.”

      In the Discussion (page 9) we now write:

      “It is essential to note that our participant pool lacked pre-existing diagnoses before engaging in the experiments, and we must address limitations associated with the AQ questionnaire. The AQ questionnaire demonstrates adequate test-retest reliability 36, normal distribution of sum scores in the general population 50, and cross-cultural equivalence has been established in Dutch and Japanese samples 51-53. The AQ effectively categorizes individuals into low, average, and high degrees of autistic traits, demonstrating sensitivity for both group and individual assessments 54.

      However, evolving research underscores many aspects that are not fully captured by the self-administered questionnaire: for example, gender differences in ASD trait manifestation 55. Autistic females may exhibit more socially typical interests, often overlooked by professionals 56. Camouflaging behaviors, employed by autistic women to blend in, pose challenges for accurate diagnosis 57. Late diagnoses are attributed to a lack of awareness, gendered traits, and outdated assessment tools 58. Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities 59 or motor skills in everyday situations (MOSES-test 60), becomes crucial for a comprehensive understanding of autistic traits.”

      Suggestions for improvement:

      - Figure 5 is really interesting. I think it should be highlighted a bit more, perhaps even with a model that uses the results of both tasks to predict AQ scores.

      We thank the reviewer for the suggestion. However, the sample size is relatively small for building a robust and generalizable model to predict AQ scores. Statistical models built on small datasets can be prone to overfitting, meaning that they might not accurately predict the AQ for new individuals.

      - Some discussion of the memory demands of the tasks will be helpful. The authors argue that memory is not a factor, but some support for this is needed. 

      The reviewer raises an important point regarding the potential for memory demands to influence our results. We have now also investigated the accuracy of the second saccade separately for the x and y dimensions. As shown in Figure 3A, a motor bias was observed in only one dimension (x), weakening the memory account, which would imply a bias in both dimensions (e.g., participants remembering the position of the target relative to both screen borders). We performed a t-test between our two subsamples of participants and indeed found a difference in saccade accuracy for the x dimension (p = 0.03) but not for the y dimension (p = 0.88).

      We have now added these analyses to the Discussion (page 8).

      - With 3 sessions for each experiment, the authors also have data to look at learning. Did people with high AQ get better over time, or did the observed errors/biases persist throughout the experiment? 

      We thank the reviewer for pointing this out. On page 7 (Results) we now write:

      “Understanding how these biases might change over time could provide further insights into this mechanism. Specifically, we investigated whether participants exhibited any learning effects throughout the experiments. For data of Experiment 1 – motor updating – we divided our data into 10 separate bins of 30 trials each. We conducted a repeated-measures ANOVA with the within-subject factor “number of sessions” (two main sessions of 5 bins each, ~150 trials) and the between-subject factor “group” (lower vs upper quartile of the AQ distribution). We found no main effect of “number of sessions” (F(1,7) = 0.25, p = 0.66), a main effect of “group” (F(1,7) = 2.52, p = 0.015), and no interaction between the two subsamples of participants and the sessions tested (F(1,7) = 0.51, p = 0.49). Data of Experiment 2 – visual updating – were separated into 3 sessions. For each session we extracted the PSE, and we conducted a repeated-measures ANOVA with the within-subject factor “sessions” and the between-subject factor “groups” (lower vs upper quartile of the AQ distribution). Here too we found no main effect of sessions (F(1,13) = 0.86, p = 0.39), a main effect of group (F(1,14) = 11.85, p = 0.004), and no interaction between the two subsamples of participants and the sessions tested (F(1,13) = 0.20, p = 0.73). In conclusion, the current study found no evidence of learning effects across the experimental sessions. However, a significant main effect of group was observed in both Experiment 1 (motor updating) and Experiment 2 (visual updating). Participants in the group with higher autistic traits performed the task systematically differently from those in the group with lower autistic traits, regardless of the number of sessions completed.”

      Reviewer #2 (Public Review):

      Summary:

      The idea that various clinical conditions may be associated, at least partially, with a disrupted corollary discharge mechanism has been present for a long time.

      In this paper, the authors draw a link between sensory overload, a characteristic of autism spectrum disorder, and a disturbance in the corollary discharge mechanism. The authors substantiate their hypothesis with strong evidence from both the motor and perceptual domains. As a result, they broaden the clinical relevance of the corollary discharge mechanism to encompass autism spectrum disorder.

      The authors write:

      "Imagine a scenario in which you're watching a video of a fast-moving car on a bumpy road. As the car hits a pothole, your eyes naturally make quick, involuntary saccades to keep the car in your visual field. Without a functional efference copy system, your brain would have difficulty accurately determining the current position of your eye in space, which in turn affects its ability to anticipate where the car should appear after each eye movement."

      I appreciate the use of examples to clarify the concept of efference copy. However, I believe this example is more related to a gain-field mechanism, informing the system about the position of the eye with respect to the head, rather than an example of efference copy per se.

      Without an efference copy mechanism, the brain would have trouble accurately determining where the eyes will be in space after an eye movement, and it would have trouble predicting the sensory consequences of the eye movement. However, it can be argued that the gain-field mechanism would be sufficient to inform the brain about the current position of the eyes with respect to the head. 

      We have now replaced it with a different example. On page 3 of the Introduction, we now write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      The authors write:

      "In the double-step paradigm, two consecutive saccades are made to briefly displayed targets 21, 22. The first saccade occurs without visual references, relying on internal updating to determine the eye's position."

      Maybe I have missed something, but in the double-step paradigm the first saccade can occur without the help of visual references if no visual feedback is present, that is, when saccades are performed in total darkness. Was this the case for this experiment? I could not find details about room conditions in the methods. Please provide further details.

      In case saccades were not performed in total darkness, then the first saccade can be based on the remembered location of the first target presented, which can be derived from the retinotopic trace of the first stimuli, as well as the contribution from the surroundings, that is: the remembered relative location of the first target with respect to the screen border along the horizontal meridian (i.e. allocentric cues).

      A similar logic could be applied to the second saccade. If the second saccade were based only on the retinotopic trace, without updating, then it would go up and 45 deg to the right, based on the example shown in Figure 1. With appropriate updating, the second saccade would go straight up. However, if saccades were not performed in total darkness, then the location of the second target could also be derived from its relationship with the surroundings (for example, the remembered distance from screen borders, i.e. allocentric cues).

      If saccades were not performed in total darkness, the results shown in Figures 2 and 3 could then be related to i) differences in motor updating between AQ score groups; ii) differences in the use of allocentric cues between AQ score groups; iii) a combination of i) and ii). I believe this is a point worth mentioning in the discussion.

      Thank you for raising the important issue of visual references in the double-step saccade task. Participants performed saccades in a dimly lit room where visual references, i.e. the screen borders, were barely visible. At the time we collected the data, a laboratory that allowed experiments in complete darkness was not available to us. We acknowledge the possibility that participants could have memorized the target locations relative to the screen borders. The bias of high-AQ participants could then be attributed to differences in the encoding, memorization or decoding of the target location relative to the screen borders. However, any abnormal use of visual references must reflect an altered remapping process, since we did not find differences in saccade landing in the vertical dimension: a t-test between our groups of participants revealed a difference in saccade accuracy in the x dimension (p = 0.03) but not in the y dimension (p = 0.88). We therefore agree that, in addition to an altered efference copy signal in high-AQ participants, altered use of visual references might also affect their saccadic remapping.

      In the Discussion we now write: “Our findings suggest that a general memory deficit is unlikely to fully explain the observed bias in high-AQ participants' second saccades. As highlighted in Figure 3A, the bias was specific to the horizontal dimension, weakening the argument for a global memory issue affecting both vertical and horizontal encoding of target location. However, it is important to acknowledge that when saccades are not performed in complete darkness, participants might rely on a combination of internal updating based on the initial target location and visual cues from the environment, such as the screen borders. This potential use of visual references could contribute to the observed bias in the high-AQ group. If high-AQ participants differed from the low-AQ group in their reliance on visual cues, this could explain the specific pattern of altered remapping observed in the horizontal dimension. This possibility is consistent with our argument for an abnormal remapping process underlying the results. While altered efference copy signals remain a strong candidate, the potential influence of visual cues on remapping in this population warrants further investigation. Future studies could incorporate a darkness condition to isolate the effects of internal updating on the first saccade, and systematically manipulate the availability of visual cues throughout the task. This would allow a more nuanced understanding of how internal updating and visual reference use interact in the double-step paradigm, particularly for individuals with varying AQ scores.”

      The authors write:

      "According to theories of saccadic suppression, an efference copy is necessary to predict the occurrence of a saccade."

      I would also refer to alternative accounts, where saccadic suppression appears to arise as early as the retina, due to the interaction between the visual shift introduced by the eye movement, and the retinal signal associated with the probe used to measure saccadic suppression. This could potentially account for the scaling of saccadic suppression magnitude with saccade amplitude.

      Idrees, S., Baumann, M.P., Franke, F., Münch, T.A. and Hafed, Z.M., 2020. Perceptual saccadic suppression starts in the retina. Nature communications, 11(1), p.1977. 

      We thank the reviewer. On page 4 of the Introduction, we now write:

      “Some theories consider saccadic omission and saccadic suppression to result from an active mechanism. In this view, an efference copy would signal the occurrence of a saccade, yielding a transient decrease in visual sensitivity20-22. Others, however, have pointed out that a purely passive mechanism may suffice to induce saccadic omission23. A recent study found evidence for saccadic suppression already in the retina: Idrees et al.24 demonstrated that retinal ganglion cells in isolated retinae of mice and pigs respond to saccade-like displacements, leading to the suppression of responses to additional flashed visual stimuli through visually triggered retinal-circuit mechanisms. Importantly, their findings suggest that perisaccadic modulations of contrast sensitivity may have a purely visual origin, challenging the need for an efference copy in the early stages of saccadic suppression. However, the suppression they measured lasted much longer than the time courses observed in behavioral data. An efference copy signal could thus be necessary to release perception from suppression.”

      Reviewer #3 (Public Review): 

      Summary:

      This work examined efference copy related to eye movements in healthy adults who have high autistic traits. Efference copies allow the brain to make predictions about sensory outcomes of self-generated actions, and thus serve important roles in motor planning and maintaining visual stability. Consequently, disrupted efference copies have been posited as a potential mechanism underlying motor and sensory symptoms in psychopathology such as Autism Spectrum Disorder (ASD), but so far very few studies have directly investigated this theory. Therefore, this study makes an important contribution as an attempt to fill in this knowledge gap. The authors conducted two eye-tracking experiments examining the accuracy of motor planning and visual perception following a saccade and found that participants with high autistic traits exhibited worse task performance (i.e., less accurate second saccade and biased perception of object displacement), consistent with their hypothesis of less impact of efference copies on motor and visual updating. Moreover, the motor and visual biases are positively correlated, indicative of a common underlying mechanism. These findings are promising and can have important implications for clinical intervention if they can be replicated in a clinical sample.

      Strengths:

      The authors utilized well-established and rigorously designed experiments and sound analytic methods. This enables easy translations between similar work in non-human primates and humans and readily points to potential candidates for underlying neural circuits that could be further examined in follow-up studies (e.g., superior colliculus, frontal eye fields, mediodorsal thalamus). The finding of no association between initial saccade accuracy and level of autistic trait in both experiments also serves as an important control analysis and increases one's confidence in the conclusion that the observed differences in task performance were indeed due to disrupted efference copies, not confounding factors such as basic visual/motor deficits or issues with working memory. The strong correlation between the observed motor and visual biases further strengthens the claim that the findings from both experiments may be explained by the same underlying mechanism - disrupted efference copies. Lastly, the authors also presented a thoughtful and detailed mechanistic theory of how efference copy impairment may lead to ASD symptomatology, which can serve as a nice framework for more research into the role of efference copies in ASD.

      Weaknesses:

      Although the paper has a lot of strengths, the main weakness of the paper is that a direct link with ASD symptoms (i.e., sensory overload and motor inflexibility as the authors suggested) cannot be established. First of all, the participants are all healthy adults who do not meet the clinical criteria for an ASD diagnosis. Although they could be considered a part of the broader autism phenotype, the results cannot be easily generalized to the clinical population without further research. Secondly, the measure used to quantify the level of autistic traits, Autistic Quotient (AQ), does not actually capture any sensory or motor symptoms of ASD. Therefore, it is unknown whether those who scored high on AQ in this study experienced high, or even any, sensory or motor difficulties. In other words, more evidence is needed to demonstrate a direct link between disrupted efference copies and sensory/motor symptoms in ASD.

      This is a valid point, and we thank the reviewer for raising it. Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities (Hull, L., Mandy, W., Lai, MC., et al., 2019) or motor skills in everyday situations (MOSES-test; Hillus J, Moseley R, Roepke S, Mohr B. 2019), will be crucial for a comprehensive understanding of autistic traits.

      We now address this point in the Discussion (page 9).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      - The pothole example in the introduction was really hard to follow. I wonder if there is a better example. 

      We have now replaced it with a different example. On page 3 of the Introduction, we now write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      - This is really minor; I would say that saccades are not the most frequent movement that humans perform. Some of the balance-related adjustments and even heartbeats are faster. Maybe just add "voluntary". 

      We thank the reviewer for the suggestion; we have now added "voluntary".

      - "Severe consequences" on page 4 is a bit strong. If that were true, there would be pretty severe impairments in eye movement behavior in ASD, which I don't think is the case.

      We agree with the reviewer. We have now removed the term “severe”.

      - The results section would read better if each experiment had a short paragraph reiterating its overall goal and the specific approach each experiment took to achieve that goal. 

      Now on page 5, for the first experiment, we write:

      “We investigated the influence of autistic traits on visual updating during saccadic eye movements using a classic double-step saccade task. This task relies on participants making two consecutive saccades to briefly presented targets. The accuracy of the second saccade serves as an indirect measure of how effectively the participant's brain integrated the execution of the first saccade into their internal representation of visual space. Participants were divided into quartiles based on the severity of their autistic traits, as assessed by the Autistic quotient questionnaire (cite). We hypothesized that individuals with higher autistic traits would exhibit greater difficulty in visual updating compared to those with lower autistic traits. This would be reflected in reduced accuracy of their second saccades in the double-step task. Figure 2C illustrates examples from participants at the extremes of the autistic trait distribution (Autistic quotient = 3, in orange and Autistic quotient = 31, in magenta). As shown, both participants were instructed to make saccades to the locations indicated by two brief target appearances (T1 and T2), as quickly and accurately as possible, following the order of presentation. However, successful execution of the second saccade requires accurate internal compensation for the first saccade, without any visual references or feedback available during the saccade itself.”

      On page 6, for experiment 2, we write:

      “With a trans-saccadic localization task, we explored how autistic traits affect the integration of eye movements into visual perception. Participants were presented with stimuli before and after a single saccade, creating an illusion of apparent motion. We measured the perceived direction of this displacement, which is influenced by how well the participant's brain accounts for the saccadic eye movement. We predicted that individuals with higher autistic traits would show a stronger bias in the perceived displacement direction, suggesting a less accurate integration of the eye movement into their visual perception.”

      - On page 6, the text about "vertical displacement" is confusing. The spatial displacements in this experiment were horizontal? 

      Yes, they were. The spatial displacement is horizontal, but the perceived trajectory (due to the saccade) is vertical. We now changed “vertical displacement” to “vertical trajectory”.

      - Page 6, grammatical problems in "while we report a slightly slant of the dots trajectory". 

      Thank you. Now fixed.

      - It would be helpful to discuss the apparent motion part of Experiment 2 in the main text. This important part is not made clear. 

      In the Introduction (page 4), we now write:

      “In this paradigm, one stimulus is shown before and another after saccade execution. Together these two stimuli produce the perception of “apparent motion”. If stimuli are placed such that the apparent motion path is orthogonal to the saccade path, then the orientation of the apparent motion path indicates how the saccade vector is integrated into vision. The apparent motion trajectory can only appear vertical if the movement of the eyes is perfectly accounted for, that is, if the retinotopic displacement is largely compensated, ensuring spatial stability. However, small biases of motion direction – implying under- (or over-) compensation of the eye movement – can indicate relative failures in this stabilization process. In a seminal study, Szinte and Cavanagh 27 found a slight over-compensation of the saccade vector leading to apparent motion slightly tilted against the direction of the saccade. More importantly, when efference copies are not available, i.e. when localization occurs at the time of a second saccade in a double-step task, a strong saccade under-compensation occurs 28.

      This phenomenon cannot be explained by perisaccadic mislocalization of flashed visual stimuli 29,30, but the two phenomena may be related in that they may both depend upon efference copy information.”

      - Figure 1 could be improved. For example, the text talks about the motor plan, but this is not clearly shown in the figure.

      We have now added the motor plan to the model. Thank you.

      - Figure 2A, the scale is off (the pictures make it look like the horizontal movement was longer than the vertical). 

      Now fixed.

      - Figure 4, it would be helpful if the task was also described in the figure. 

      We thank the reviewer for the comment. We have modified the figure to also include the perceptual judgment task.

      - Figure 5A: the y-axis is labeled p(correct), but that is not what the y-axis shows (the legend makes the same mistake). 

      We apologize; the y-axis shows the proportion of trials in which participants reported the second dot to be further to the right than the first one. We have changed the figure and the text accordingly.
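      As an illustration of how a PSE can be read off such response proportions, the sketch below takes the horizontal offset at which the proportion of "second dot more to the right" responses crosses 0.5, using linear interpolation between tested offsets. This is a hedged, simplified stand-in: the offsets and proportions are hypothetical, and the authors' analysis may instead fit a psychometric function (e.g. a cumulative Gaussian).

```python
def estimate_pse(offsets, p_right):
    """Estimate the point of subjective equality (PSE) by linear interpolation.

    offsets: increasing horizontal offsets of the second dot (deg).
    p_right: proportion of "second dot more to the right" responses at each offset.
    Returns the offset at which p_right first crosses 0.5.
    """
    points = list(zip(offsets, p_right))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # First pair of neighbouring offsets whose proportions bracket 0.5
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("proportions never cross 0.5")


# Hypothetical data for one participant and one session
offsets = [-2.5, -1.25, 0.0, 1.25, 2.5]
p_right = [0.05, 0.20, 0.40, 0.80, 0.95]
print(f"PSE = {estimate_pse(offsets, p_right):.3f} deg")
```

      A PSE computed this way per session could then enter a repeated-measures analysis; interpolation is used here only because it needs no fitting library.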

      - A recent study on motion and eye movement prediction in ASD is very relevant to the work presented here.: Park et al. (2021). Atypical visual motion-prediction abilities in autism spectrum disorder. Clinical Psychological Science, 9(5), 944-960.

      Indeed. We now refer to the cited study in the Discussion (page 9).

      Reviewer #2 (Recommendations For The Authors):

      Statistics and plotting.

      I believe some of the reported statistics are not clear. For example, the authors write:

      "Saccade landing positions of participants in the lower quartile (mean degree ± SEM: 10.17 ± 0.50) did not deviate significantly from those in the upper quartile (mean degree ± SEM: 9.65 ± 0.77). This result was also confirmed by a paired sample t-test (t(7) = 0.66; p = 0.66, BF10 = 0.40)"

      Maybe I am missing something, but why use a paired-sample t-test when the upper and lower quartiles constitute different groups of participants? Shouldn't a two-sample t-test be used in this case?

      We apologize for the confusion. It is indeed a two-sample t-test.

      Along the same lines, I do not understand the link between the number of degrees of freedom reported in the t-test (7) and the number of participants reported in the study (41).

      This is also evident when looking at the scatterplot in Figure 3C. How many participants formed the averages and standard errors reported in Figures 3B and 3D? Please clarify.

      I have the same comment(s) also for the visual updating task (and related figures), where 13 degrees of freedom are reported in the t-tests. Please clarify. 

      We thank the reviewer for pointing this out. The number of participants shown in the scatter plots was indeed 42. However, we opted to compare averages only between the lower and upper quartiles of the AQ distribution to avoid a median split (which would imply a skewed distribution). Of our sample in Exp. 1, 8 participants fell into the lower quartile of the AQ distribution and 8 into the upper quartile (14 degrees of freedom); in Exp. 2, 8 participants fell into the lower and 7 into the upper quartile (13 degrees of freedom).
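      The degrees of freedom follow from the two-sample (pooled-variance) Student's t-test, where df = n₁ + n₂ − 2: 8 + 8 − 2 = 14 and 8 + 7 − 2 = 13. The standard-library sketch below makes this explicit; the landing-position values are hypothetical and are not the study's data.

```python
import math
import statistics


def pooled_t_test(group_a, group_b):
    """Two-sample Student's t-test assuming equal variances.

    Returns (t statistic, degrees of freedom), with df = n_a + n_b - 2.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each group's sample variance by its own df
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return t, df


# Hypothetical saccade landing positions (deg) for two independent AQ groups
lower_quartile = [10.1, 9.8, 10.5, 10.9, 9.6, 10.3, 10.0, 10.2]
upper_quartile = [9.2, 10.4, 9.0, 9.9, 10.8, 9.5, 9.1]
t, df = pooled_t_test(lower_quartile, upper_quartile)
print(f"t({df}) = {t:.2f}")
```

      If SciPy is available, `scipy.stats.ttest_ind(a, b)` with its default `equal_var=True` performs the same pooled-variance test; a paired test would be inappropriate here because the two quartiles contain different participants.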

      We have corrected the values accordingly.

      Reviewer #3 (Recommendations For The Authors):

      (1) The language can be a bit misleading (especially the title and abstract) as it wasn't always clear that the participants don't actually have clinical ASD. I'd suggest avoiding using words like "symptom" as that would indicate clinical severity, and using words like "traits/characteristics" instead for more precise language. 

      We apologize for the misleading terminology; this has now been corrected.

      (2) In the Intro: "...perfect compensation results in a vertical trajectory, while small biases indicate stabilization issues23-25." This is a bit confusing without knowing the details of the paradigm. Consider clarifying or at least referring to Figure 4. 

      Thank you.

      (3) In the Results: "This result was also confirmed by a paired sample t-test (t(7) = 0.66;..." This is confusing as a two-sample t-test is the appropriate test here. Also, the degree of freedom seems very low - could the authors clarify how many participants are in each subgroup (i.e., low vs. high AQ quartile), for both experiments? 

      Of our sample in Exp. 1, 8 participants fell into the lower quartile of the AQ distribution and 8 into the upper quartile (14 degrees of freedom); in Exp. 2, 8 participants fell into the lower and 7 into the upper quartile (13 degrees of freedom).

      (4) In the Methods: Experiment 2: "The first dot could appear randomly above or below gaze level at a fixed horizontal location, halfway between the two fixations (x = 0, y = -5° or +5° depending on the trial). The second dot was then shown orthogonal to the first one at a variable horizontal location (x = 5° ± 2.5°)." This would mean that the position of the 2nd dot relative to the 1st one would be 2.5°–7.5°, but the task description in Results and Figure 5A would suggest the horizontal location of the second dot is x = 0° ± 2.5°. Which one is correct? 

      The second option is the correct one. We now fixed the typo in the Methods part.

      (5) There is another study that examined oculomotor efference copies in children with ASD using a similar trans-saccadic perception task (Yao et al., 2021, Journal of Vision). In that study, they found a correlation between task performance and an ASD motor symptom (repetitive behavior). This seems quite relevant to the authors' hypothesis and discussion. 

      We thank the reviewer for the suggestion. We now added the mentioned paper in the discussion.

      (6) Please proofread the entire paper carefully as there were multiple grammatical and spelling errors.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and their thoughtful reviews of our manuscript. The reviewers raised good points regarding the sample size and the low exposure in the South Asian cohort owing to its unique cultural and social practices. We recognize these as limitations of the paper and have discussed them in the revised version. In the revised manuscript, we have taken up the key suggestions of the reviewers to 1) better illustrate the analytical flow and statistical methods, in particular which datasets were used for discovery, validation, and testing of the score, both as a main figure in the manuscript and in the graphical abstract; 2) demonstrate, using statistical metrics of performance, that there is no overfitting in our approach; 3) emphasize that the goal was not discovery (e.g. our own EWAS was not used to derive the score), but rather to compare with existing EWASs and contrast the results between the white European and SA populations; and 4) supplement the analysis with previously derived maternal smoking, smoking, and air pollution methylation scores, and explore additional health outcomes related to lung health in newborns. Finally, we would like to take this opportunity to reiterate that it was not our objective to derive the most powerful methylation score of smoking, nor to demonstrate a causal role of maternal smoking on birth weight via DNAm. We have restructured the manuscript, including the discussion, to clarify this. Please find our point-by-point response below.

      Reviewer #1:

      The manuscript could benefit from a more detailed description of methods, especially those used to derive MRS for maternal smoking, which appears to involve overfitting. In particular, the addition of a flow chart would be very helpful to guide the reader through the data and analyses. The FDR correction in the EWAS corresponds to a fairly liberal p-value threshold. 

      We thank the reviewer for these good suggestions. In the revised manuscript, we have provided a flow chart as the new Figure 1, more detailed description of the method (added a subsection “Statistical analysis” under Materials and Methods) as well as metrics including measures of fit indices such as AUC and adjusted R2 for each validation and testing dataset to illustrate there is no danger of overfitting (in new Supplementary Table 5).

      The choice to use FDR was indeed arbitrary, as there is no consensus on which significance threshold, if any, should be used in the context of EWAS. Here we simply followed the convention of previous studies in order to contrast the effects of the top associated signals between populations and with reported effect sizes. Throughout the manuscript, we have removed the notion of significant associations and use the phrases “top associated signals” or “top associations” when discussing EWAS results for individual CpGs.

      Reviewer #2:

      (1) The number of mothers who self-reported any smoking was very low, much lower than in the general population and practically non-existent in the South Asian population. As a result, all analyses appeared to have been underpowered. It is possible that for this reason the authors chose to generate their DNA methylation model using previously published summary statistics. The resulting score is not of great value in itself due to the low-powered dataset used to estimate covariance between CpG sites. In fact, a score was generated for a much larger, better-powered dataset several years ago (Reese, EHP, 2017, PMID 27323799). 

      We thank the reviewer for pointing out the low exposure in the South Asian population, which we believe complements the literature on maternal smoking, which has focused almost exclusively on white Europeans. However, the score was validated in the white European cohort (CHILD; current smoking 3.1%), a prevalence reasonably consistent with the declining trend in maternal cigarette smoking, from 7.2% in 2016 to 4.6% in 2021 (Martin, Osterman, & Driscoll, 2023). It is also consistent with the fact that CHILD participants were recruited from major Canadian metropolitan areas and had relatively high SES and education compared with FAMILY.

      We agree with the reviewers that a higher prevalence of maternal smoking in the validation sample could potentially improve the power of the score. Our original analytical pipeline used CHILD as the validation dataset and FAMILY (see the new Figure 1) as the testing dataset. We additionally provide an alternative analytical scheme using FAMILY as the validation dataset, as it had a higher proportion of current smokers; however, this scheme is limited by the number of CpGs available (128 in FAMILY vs. 2,619 in CHILD, out of the 2,620 CpGs from Joubert et al., 2016). The results of all possible combinations of validation vs. testing datasets, and of restriction to the targeted array vs. HM450, are summarized in the new Supplementary Table 5 and Supplementary Figure 5.

      To clarify, our choice to construct the DNAm score using published summary statistics was not an ad hoc decision prompted by the low power observed in the CHILD EWAS. We agree with the reviewer that our study was underpowered and was not originally intended for EWAS discovery. We therefore specifically proposed to adopt a multivariate strategy from the polygenic risk score literature. This approach enabled us to leverage well-powered association signals from a sample of n > 5,000 (Joubert et al., 2016) without access to individual-level data. In comparison, the Reese maternal smoking score (Reese et al., 2017) had a discovery sample size of only n = 1,057. Our score was not outperformed; in fact, its AUC in both FAMILY (external validation dataset; n = 411) and CHILD (external testing dataset; n = 352) was larger than that of the Reese score, as tabulated below (part of the new Supplementary Table 5).

      Author response table 1.
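      As a reminder of what the AUC in this comparison measures, the sketch below computes it through its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen exposed participant (e.g. a smoker) has a higher methylation score than a randomly chosen unexposed one, with ties counted as one half. The scores and labels are hypothetical, and this is not the authors' evaluation code.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons (Mann-Whitney U / (n_pos * n_neg)).

    scores: methylation scores; labels: 1 for exposed (e.g. smoker), 0 otherwise.
    Runs in O(n_pos * n_neg), which is fine for cohorts of a few hundred.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six participants (1 = smoker)
scores = [0.9, 0.7, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))  # 8 of 9 exposed/unexposed pairs ranked correctly
```

      Because the AUC depends only on the ranking of scores, it is insensitive to the overall scale of a methylation score, which makes it convenient for comparing scores built from different CpG sets and weights.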

      Further, regarding the comment on the covariance matrix: lassosum, which fits an elastic-net model to summary statistics, indeed requires a reference covariance matrix that is consistent between the discovery data and the external validation data. For moderately sized correlations (r² > 0.1), however, a sample size of more than 100 provides sufficient power to detect a correlation different from 0, and is thus adequate for estimation. As with linkage disequilibrium in genotype data, CpGs exhibit a block-wise correlation structure, so the theoretical framework of lassosum extends naturally to MRS.
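      The sample-size statement can be checked with the standard Fisher z approximation for testing a correlation against zero, where the required n is ((z₁₋α/₂ + z₁₋β) / atanh(r))² + 3. The sketch below assumes a two-sided α = 0.05 and 80% power, values not stated in the response; under these assumptions, r² = 0.1 (r ≈ 0.32) requires 77 participants, consistent with the claim that n > 100 suffices.

```python
import math
from statistics import NormalDist


def n_to_detect_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size to detect a correlation r != 0 (two-sided test).

    Uses the Fisher z transform, under which atanh(r_hat) is approximately
    normal with standard error 1 / sqrt(n - 3).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    n = ((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3
    return math.ceil(n)


r = math.sqrt(0.1)  # r^2 = 0.1, the threshold mentioned in the response
print(n_to_detect_correlation(r))  # 77
```

      Smaller correlations need many more samples (the required n grows roughly with 1/r²), so the argument applies to the moderately correlated CpG blocks rather than to weak pairwise correlations.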

      In the revised manuscript, we included the Reese score, as well as a few additional scores, to compare their predictiveness of smoking phenotypes in the white European cohorts. We note that the Reese score's applicability was limited in the FAMILY cohort, which was profiled using a targeted array on which only 7 of the 28 CpGs in the Reese score were available. As a result, although the Reese score performed similarly to our derived score in CHILD (0.94 vs. 0.95), its performance in FAMILY was compromised (0.72 vs. 0.89).
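      To illustrate why missing CpGs limit a score's portability across arrays, the sketch below computes a methylation risk score as a weighted sum of beta values, skipping CpGs absent from a sample's array and reporting coverage. The CpG identifiers and weights are hypothetical placeholders, not the Reese or lassosum weights, and real score construction may involve additional normalization steps not shown here.

```python
def methylation_risk_score(weights, betas):
    """Weighted sum of methylation beta values over CpGs present in `betas`.

    weights: dict mapping CpG ID -> score weight (hypothetical here).
    betas:   dict mapping CpG ID -> beta value for one sample.
    Returns (score, number of CpGs used, total CpGs in the score).
    """
    used = [cpg for cpg in weights if cpg in betas]
    score = sum(weights[cpg] * betas[cpg] for cpg in used)
    return score, len(used), len(weights)


# Hypothetical 4-CpG score; this sample's array measured only 2 of its CpGs
weights = {"cpg_A": 1.5, "cpg_B": -0.8, "cpg_C": 2.0, "cpg_D": 0.4}
betas = {"cpg_A": 0.6, "cpg_C": 0.3, "cpg_X": 0.9}
score, used, total = methylation_risk_score(weights, betas)
print(f"score={score:.2f} using {used}/{total} CpGs")
```

      With partial coverage, such as 7 of 28 CpGs on a targeted array, the computed score is effectively a different, truncated predictor, which is one plausible reason for a drop in performance on that platform.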

      (2) The conclusion that "even minimal smoking exposure in South Asian mothers who were not active smokers showed a DNAm signature of small body size and low birthweight in newborns" is not warranted because no analyses were performed to show that the association between DNA methylation and birth size/weight was driven by maternal smoking. 

      We thank the reviewer for this subtle point. It was not our intention to suggest a causal relationship between DNA methylation and birth size mediated by maternal smoking. Rather, we meant that the maternal smoking methylation score was consistently associated with adverse outcomes in newborns of both white European and South Asian mothers, even though the South Asian mothers did not smoke. It is possible that the maternal smoking MRS captures more than active and second-hand smoking, for example other environmental exposures that also lead to oxidative stress, and that these exposures together are associated with reduced birth size/weight.

      In the revised manuscript, we have modified the conclusion above to:

      “Notably, these results indicate a consistent association between the DNAm signature of maternal smoking and a small body size and low birthweight in newborns, in both white European mothers who exhibited some amount of smoking and in South Asian mothers who themselves were not active smokers.”

      (3) Although it was likely that some mothers were exposed to second-hand smoke and/or pollution, data on this was either non-existent or not included in this study. Including this would have allowed a more novel investigation of the effects of smoke exposure on the pregnancies of non-smoking mothers.

      We agree with this comment. Second-hand smoking was captured by the mothers' self-reported weekly smoking exposure. We reported the association with smoking exposure and found that it was not consistently associated with our methylation scores across the cohorts (cohort-specific association p-values of 5.4×10^-5, 3.4×10^-5, and 0.58 for CHILD, FAMILY, and START, respectively; original Table 3), possibly due to the low exposure in the South Asian population (maximum weekly exposure of 42 hrs, in contrast to 168 hrs in FAMILY and 98 hrs in CHILD). Air pollution data are currently not available. We additionally tested the association between maternal smoking and an air pollution methylation score built from key CpGs in the largest air pollution EWAS to date (Gondalia et al., 2021). However, there was no association between the air pollution score and any maternal smoking phenotype (ps > 0.4).

      (4) One of the European cohorts and half of the South Asian cohort had DNA methylation measured on only 2500 CpG sites. This set of sites included only 125 sites previously linked to prenatal smoking. The resulting model of prenatal smoking was small (only 11 CpG sites). It is possible that a large model may have been more powerful.

      That is correct (see also our response to R2 comment #1). In our previous analysis, we validated two scores: one based on the CpGs of the targeted array (< 3,000 CpGs) and the other based on the full HM450K array. The score with more CpGs indeed performed slightly better, and we included this as one of the limitations of the paper. Nevertheless, it does not affect the conclusion that the scores, whether based on a larger or a smaller model, are transferable to diverse populations and can be used to comparatively study the DNAm influence of maternal smoking in newborns.

      The following was added in the discussion:

      “First, the customized array with a limited number of CpGs (<3,000) was designed in 2016 and many large EWASs on smoking and maternal smoking conducted more recently had not been included.”

      (5) The health outcomes investigated are potentially interesting but there are other possibly more important outcomes of interest such as birth complications, asthma, and intellectual impairment which are known to be associated with prenatal smoking.

      We thank the reviewer for bringing up this point. Asthma was one of the key health outcomes in the CHILD study, and data at later time points are available. However, similar outcomes were not collected in the other two studies (FAMILY and START), which focused on cardiometabolic health in young children. We therefore did not initially include outcomes that were not available across all cohorts, as our intention was to contrast effects between populations.

      We recognize that this is an important question and now provide the association results for asthma and allergy at the available time points in CHILD, FAMILY, and START. We also included delivery by emergency C-section as an additional proxy outcome for birth complications. However, none of these outcomes was even nominally (p < 0.05) associated with the DNAm smoking score. These results are now included in the updated Supplementary Table 8.

      Reviewer #1 (Recommendations For The Authors):

      (1) The number of samples in the South Asian birth cohort given in the abstract (n = 887) does not match the sample size of the START cohort from the results section (results, page 7, line 139, n = 880). It is also different from the final analytical dataset size from the methods section (page 17, line 386, n = 890). Please clarify. 

      We thank the reviewer for pointing this out. The number in the abstract was the final sample size used for the EWAS (no missing smoking history). The 880 in the Results was a typo for 890, which included three individuals with missing smoking data. These numbers have been updated to the correct sample size for the START cohort with full epigenome-wide methylation data (n = 504, of which 503 had non-missing smoking history).

      (2) Page 3, line 54: "consistent signal from the GFI1 gene (ps < 5×10-5)". Is ps a typo? If not then it might be clearer to state how many sites this included. 

      No; "ps" summarizes the p-values of the six CpG sites in the GFI1 gene listed in Table 2. We have clarified the abstract to state the number of CpG sites included.

      (3) Please report effect sizes together with information about the statistical significance (p values). 

      We have updated the manuscript with (standardized) effect sizes whenever possible along with p-values.

      (4) Page 4, line 80. This paragraph could be improved by adding a sentence explaining DNA methylation. 

      We thank the reviewer for this suggestion. A sentence was included to introduce DNAm at the beginning of the second paragraph:

      “DNA methylation is one of the most commonly studied epigenetic mechanisms by which cells regulate gene expression, and is increasingly recognized for its potential as a biomarker (13).”

      (5) Page 4, line 84. Sentence difficult to understand, please rephrase: "Our recent systematic review of 17 cord blood epigenome-wide association studies (EWAS) demonstrated that out of the 290 CpG sites reported, 19 sites were identified in more than one study; all of them associated with maternal smoking". 

      We have revised to clarify the review was on cord blood EWAS with five outcomes: maternal diabetes, pre-pregnancy body mass index, diet during pregnancy, smoking, and gestational age.

      “Our recent systematic review of 17 cord blood epigenome-wide association studies (EWAS) found that out of the 290 CpG sites reported to be associated with at least one of the following: maternal diabetes, pre-pregnancy body mass index (BMI), diet during pregnancy, smoking, and gestational age, 19 sites were identified in more than one study and all of them associated with maternal smoking.”

      (6) Page 5, line 93. The second part of the sentence is not necessary: "The majority of cohort studies have focused on participants of European ancestry, but few were designed to assess the influence of maternal exposures on DNA methylation changes in non-Europeans". 

      We have revised accordingly to:

      “Only a handful of cohort studies were designed to assess the influence of maternal exposures on DNA methylation changes in non-Europeans.”

      (7) Page 5, line 95. "It has been suggested that ancestral background could influence both systematic patterns of methylation (27), such as cell composition and smoking behaviours (28)". The sentence is slightly unclear. Could it be rephrased to say that cell composition differences may be present by ancestry, which can lead to differential DNAm patterns? 

      We have revised accordingly to:

      “It has been suggested that systematic patterns of methylation (Elliott et al., 2022), such as cell composition, could differ between individuals of different ancestral backgrounds, which could in turn confound the association between differential DNAm and smoking behaviours (Choquet et al., 2021).”

      (8) Page 5, line 108. How does reducing the number of predictors lead to more interpretable effect sizes? 

      This was meant as a general comment in the context of variable selection: the fewer predictors there are, the more interpretable the effect size of each predictor becomes. However, we recognize that this comment may not apply to the specific approaches we adopted. We have revised the sentence to motivate the methylation score as a powerful instrument for analysis:

      “Reducing the number of predictors and measurement noise in the data can lead to better statistical power and a more parsimonious instrument for subsequent analyses.”

      (9) Page 5, line 112. Health consequences seem a bit strong, given that the analysis describes correlations/associations. 

      We have revised it to “association with”:

      “In this paper, we investigated the epigenetic signature of maternal smoking on cord blood DNA methylation in newborns, as well as its influence on newborn and later life outcomes, in one South Asian birth cohort (South Asian referring to people who originate from the Indian subcontinent) and two predominantly European-origin birth cohorts.”

      Results

      (10) It would be very helpful to have a flow diagram to detail all of your analyses.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided a flow chart as the new Figure 1, updated the summary of analyses in Table 3, and added a new Supplementary Table 5 for the DNAm score derivation, as well as a more detailed description of the statistical analysis in the Materials and Methods under the subsection “Statistical analysis”.

      (11) Page 7, line 138. Please add a reference to the CHILD study. 

      We have added a reference to the CHILD study.

      (12) Tables in results and in supplemental data a) contain a mixture of fields describing the newborn and its mother (this is not true for Supplementary Table 2), b) lack column descriptions, c) lack descriptions of abbreviations and formatting used in tables, d) use different font types, e) lack descriptions of statistical tests that were used to obtain p-values, f) use inconsistent rounding. Please correct and add the missing information.

      We have consolidated the notation and nomenclature in all Tables and text. All numerical results are now rounded to 2 decimal places. The tests used were included in the Table headers as well as described in the Materials and Methods:

      “For continuous phenotypes, an analysis of variance (ANOVA) F-test or a two-sample t-test was used to compare means across the three cohorts or between two groups, respectively. For categorical phenotypes, a chi-square test of independence was used to compare the frequencies of observed categories. Note that three of the smoking history categories in the START cohort had expected cell counts less than 5; START was therefore excluded from this comparison, and the reported p-value is for CHILD and FAMILY only.”
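      The sparse-cell issue described in this note can be checked directly from the expected counts. A minimal sketch with SciPy, assuming a cohort-by-category table of counts; the threshold of 5 follows the note, while the function name and example counts are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency


def chi_square_with_check(table, min_expected=5.0):
    """Chi-square test of independence on a cohort-by-category count table.

    Returns the p-value, the degrees of freedom, and a boolean array marking
    cells whose expected count is below min_expected, where the chi-square
    approximation becomes unreliable. Hypothetical helper for illustration.
    """
    chi2, p, dof, expected = chi2_contingency(np.asarray(table))
    sparse = expected < min_expected
    return p, dof, sparse
```

If any cell is flagged, the cohort or category can be dropped from the comparison (as done here) or an exact test used instead.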

      (13) Table 1. Sample sizes given in column descriptions do not add up to 1,650 (legend text).

      We thank the reviewer for pointing this out. The updated sample size is 1,267, based on 352 CHILD samples, 411 FAMILY samples, and 504 START samples. Note that we did not remove participants without full smoking history data, as Table 1 was intended to describe the epigenetic subsamples.

      (14) Page 7, line 156. Supplementary Tables are incorrectly numbered. In the text, Supplementary Table 4 comes after Supplementary Table 2.

      We thank the reviewer for catching this and have corrected the ordering of the Supplementary Tables and Figures. 

      (15) Page 7, line 158. "cell compositions" - do you mean estimated white cell proportions? 

      We have revised it to “estimated cord blood cell proportions” in the text throughout.

      (16) Smoking EWAS - do you see any overlap/directional consistency with the top findings from adult EWASs of smoking such as AHRR? 

      We annotated the top EWAS signals from the literature in the meta-analysis (new Figure 2; Supplementary Figures 1 and 3), but were only able to confirm associations in the GFI1 gene. The AHRR signals were also annotated but did not pass the FDR threshold, as seen at the start of chromosome 5 in the new Figure 2. We further added a new Supplementary Figure 3 to show the directional consistency with the top findings of Joubert et al. (2016) (2,620 CpGs reported, of which 128 overlapped with our meta-analysis). The Pearson’s correlation coefficient with the meta-analyzed effects was 0.72 for maternal smoking and 0.60 for smoking exposure.

      We added the following to Results:

      “Further, we observed consistency in the direction of association for the 128 CpGs that overlapped between our meta-analysis and the 2,620 CpGs with evidence of association for maternal smoking (19) (Supplementary Figure 3). Specifically, the Pearson’s correlation coefficient for maternal smoking and weekly smoking exposure was 0.72 and 0.60, respectively.”
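      The consistency check quoted above amounts to aligning per-CpG effect estimates on the shared CpGs and correlating them. A minimal sketch, assuming both studies' effects are available as CpG-keyed dictionaries; the function name is hypothetical, and the sign-agreement fraction is an extra summary not reported in the response:

```python
import numpy as np


def effect_consistency(ours, reference):
    """Compare per-CpG effect estimates from two studies over their shared CpGs.

    ours, reference : dict of CpG ID -> effect estimate
    Returns (number of shared CpGs, Pearson r, fraction with the same sign).
    Needs at least two shared CpGs. Hypothetical helper for illustration.
    """
    shared = sorted(set(ours) & set(reference))
    a = np.array([ours[c] for c in shared], dtype=float)
    b = np.array([reference[c] for c in shared], dtype=float)
    r = np.corrcoef(a, b)[0, 1]
    same_sign = np.mean(np.sign(a) == np.sign(b))
    return len(shared), r, same_sign
```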

      (17) Page 8, line 169. "also coincided with the GFI1 gene" this is a bit imprecise. Please report the correlation with the CpG from the maternal smoking analysis. 

      The CpG lies within the GFI1 gene; we have included its Pearson’s correlation with the top hit in the text below:

      “There were no CpGs associated with the ever-smoker status at an FDR of 0.05, though the top signal (cg09935388) was also mapped to the GFI1 gene (Pearson’s r2 correlation with cg12876356 = 0.75 and 0.68 in CHILD and FAMILY, respectively; Supplementary Figure 1).”

      (18) Page 8, line 171. Typo "ccg": "ccg01798813". 

      It has been corrected to “cg01798813”.

      (19) Page 8, line 176. Please be clear about the phenotype used in these analyses. 

      The EWAS of weekly smoking exposure in START was removed from this version of the manuscript in light of the results and the reviewer’s comments, because this phenotype was highly skewed and likely produced spurious results (see also our response to comment #20).

      We have clarified the phenotypes for these results under “Epigenetic Association of Maternal Smoking in White Europeans” below:

      “The maternal smoking and smoking exposure EWASs in CHILD did not yield any CpGs after FDR correction (Supplementary Figure 3).”

      (20) What was the genomic inflation for the EWASs? 474 loci in the South Asian EWAS seems like a lot of findings. Perhaps a more robust method (e.g., OSCA MOMENT) might help to control the false positive rate. 

      The genomic inflation factor for smoking exposure was modest across the cohorts: 1.02 in CHILD, 0.94 in FAMILY, and 1.00 in START. However, there was more inflation in the tail of the distribution in START than in the European cohorts. The empirical type I error rates at thresholds of 0.01, 0.001, and 0.00001 were inflated in START (1.7, 5.7, and 165 times the nominal level, respectively), in contrast to CHILD (1.06, 1.05, and 0.6 times) and FAMILY (1.6, 1.9, and 0 times). The smoking exposure EWAS based on START was therefore removed, as these signals are likely false positives and smoking exposure was very low to begin with (11 of the 462 participants with non-missing data reported weekly exposure, ranging from 2 to 42 hrs/week). We have added the QQ-plots and the genomic inflation factor for the reported meta-analysis in the new Supplementary Figure 2. The following was added to the Results:

      “There was no noticeable inflation of empirical type I error in the association p-values from the meta-analysis, with the median of the observed association test statistic roughly equal to the expected median (Supplementary Figure 2).”
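      The two inflation measures used in this response can be computed from a vector of EWAS p-values as sketched below. The sketch assumes the standard definition of the genomic inflation factor (the median observed 1-df chi-square statistic divided by its null median, about 0.455) and defines the empirical type I error ratio as the fraction of p-values below a threshold divided by that threshold. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def inflation_summary(pvalues, thresholds=(0.01, 0.001, 1e-5)):
    """Genomic inflation factor and empirical-to-nominal type I error ratios.

    Hypothetical helper for illustration; assumes two-sided p-values from
    1-df tests.
    """
    p = np.asarray(pvalues, dtype=float)
    observed = chi2.isf(p, df=1)                      # p-values back to 1-df chi-square statistics
    lam = np.median(observed) / chi2.ppf(0.5, df=1)   # null median is about 0.455
    ratios = {t: np.mean(p < t) / t for t in thresholds}
    return lam, ratios
```

With on the order of 200,000 tests, only about two p-values below 0.00001 are expected under the null, so the ratio at that threshold rests on very few CpGs and is much noisier than the genomic inflation factor.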

      (21) What is the targeted array? I don't think it has been introduced prior to this point. 

      We introduced it in the Materials and Methods under subsection “Methylation data processing and quality controls”. Considering this comment and previous comments on the ordering of Tables and Figures, we have decided to place Materials and Methods after Introduction and before Results.

      (22) The MRS section is described poorly in the results section. It is not clear where the 11 or 114 CpGs come from.

      We now include an analytical summary of all scores (derived or external from literature) in the new Supplementary Table 5. Further, we updated the description of scores in Materials and Methods under the subsection “Using DNA Methylation to Construct Predictive Models for Maternal Smoking” to clarify the source and types of MRSs derived:

      “To evaluate whether the targeted GMEL-EPIC array design has comparable performance as the epigenome-wide array to evaluate the epigenetic signature of maternal smoking, a total of three MRSs were constructed, two using the 128 CpGs available in all cohorts – across the HM450K and targeted GMEL-EPIC arrays – and with either CHILD (n = 347 with non-missing smoking history) or FAMILY (n = 397) as the validation cohort, and another using 2,107 CpGs that were only available in CHILD and START samples with CHILD as the validation cohort. Henceforth, we referred to these derived maternal smoking scores as the FAMILY targeted MRS, CHILD targeted MRS, and the HM450K MRS, respectively.”

      (23) Page 9, line 187. "There was no statistically significant difference between the two scores in all samples (p = 1.00) or among non-smokers (p = 0.24).". How was the significance assessed? Please describe the models (outcome, covariates, model type) used for comparing the two models. It would also be good to report the correlation between the scores.

      We have added a subsection “Statistical analysis” under Materials and Methods that described the tests. The correlation between scores is now summarized as a heatmap across all cohorts in the new Supplementary Figure 6.

      “For each cohort, we contrasted the three versions of the derived scores using an analysis of variance (ANOVA) along with pairwise comparisons using two-sample t-tests, to examine how much information might be lost by using more than 10-fold fewer CpGs at the validation stage. We also examined the correlation structure between all derived and external MRSs using a heatmap of their pairwise Pearson’s correlation coefficients.”
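      The comparison described above can be sketched with SciPy as follows, assuming each score is available as an array over the same newborns. The function name is hypothetical. The sketch follows the description literally (independent-groups ANOVA and t-tests); because all scores are computed on the same individuals, a paired comparison is a possible alternative.

```python
import itertools

import numpy as np
from scipy.stats import f_oneway, ttest_ind


def compare_scores(scores):
    """scores: dict of score name -> 1-D array of that score in one cohort.

    Returns the ANOVA p-value, pairwise two-sample t-test p-values, and the
    Pearson correlation matrix (rows/columns in the dict's order).
    Hypothetical helper for illustration.
    """
    names = list(scores)
    anova_p = f_oneway(*(scores[n] for n in names)).pvalue
    pairwise_p = {(a, b): ttest_ind(scores[a], scores[b]).pvalue
                  for a, b in itertools.combinations(names, 2)}
    corr = np.corrcoef(np.vstack([scores[n] for n in names]))
    return anova_p, pairwise_p, corr
```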

      (24) Please include the number of samples in the training/validation and in the test set in the methods and in the results.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided a flow chart as the new Figure 1 and more detailed description of the method in the Materials and Methods. Please also see response to comment #22. The training sample size is based on Joubert et al., (2016), which is 5,647. For our main analyses, the validation sample with non-missing phenotypes remained the CHILD cohort (n=347), while the FAMILY (n=397) and START (n=503) samples were the independent testing data. We alternatively provided another scenario, in which the FAMILY sample was the validation cohort, while CHILD and START were the testing cohorts. The exact sample size and performance metrics for each scenario and score are clearly summarized in the new Supplementary Table 5.

      (25) Table 3. Please clarify the type of information contained in the four last columns (p-value?).

      Yes – these are the individual cohort p-values. We have taken the suggestion from comment #12 to fully describe all columns and fields.

      (26) Page 10, line 215: "The meta-analysis revealed no heterogeneity in the direction nor the effect size of associations between populations". Please quote/refer to the results. 

      In the revision, the heterogeneity p-values were quoted and the relevant tables (Supplementary Table 8) were added to this sentence.

      (27) Figure 2 has issues with x labels. Due to the low number of ever smokers in START, the boxplot may not be the best visualisation method. It would also benefit from listing n's per group.

      We appreciate this comment to improve the figure presentation. We increased the font size for the X-labels. The sample size for each group in START was also labeled in the new Figure 3 (previously Figure 2).

      Discussion

      (28) Studying the association between maternal smoking and cord blood DNAm is interesting from a biological perspective as it allows for assessing the immediate and long-term effects of maternal smoking on newborn health. However, in terms of calculating the MRS, what are the benefits of using cord blood over the mother's blood? We know that blood-based DNAm smoking score is a powerful predictor of long-term smoking status. 

      The reviewer raises an interesting point. Abundant literature supports that DNAm changes are tissue-specific. While a DNAm smoking score in maternal blood reflects the mother's long-term exposure to smoking, cord blood DNAm captures the consequences of such exposure for the newborn. One of the key results of our study is that established DNAm signatures of maternal smoking, which are known to mediate birth size and weight in white Europeans (these references were cited in the original manuscript), carry the same effect of reducing birth weight and size in the South Asian population. This is a critical finding from a DoHaD and public health perspective, as DNAm signatures of maternal smoking, irrespective of the smoking status of the mother, can influence the health trajectory of newborns.

      We have expanded our discussion based on this suggestion to highlight the unique features of studying maternal smoking via different tissues and their implications. The following was added to the discussion:

      “There are several advantages of using a cord blood based biomarker from the DoHaD perspective. Firstly, cord blood provides a direct reflection of the in utero environment and fetal exposure to maternal smoking. Additionally, since cord blood is collected at birth, it eliminates potential confounding factors such as postnatal exposures that may affect maternal blood samples. Furthermore, studying cord blood DNAm allows for the assessment of epigenetic changes specifically relevant to the newborn, offering valuable information on the potential long-term health implications.”

      (29) Page 13, line 285: "Fourth" without "third".

      It has been revised accordingly.

      Methods 

      (30) The methods section does not contain all the details required to replicate the analysis. Whenever statistical analysis is conducted, this section should clearly describe the type of the analysis (linear regression, t-test, etc.) and name the dependent and independent variables. Sample sizes should also be given. 

      We added further details of test used and sample size for each analysis. We have also included a new “Statistical analysis” subsection under Materials and Methods.

      (31) Please describe MRS testing in the methods.

      We tested the MRS against binary and continuous smoking phenotypes using logistic and linear regression, respectively. Predictive value was assessed using the area under the receiver operating characteristic (ROC) curve for binary outcomes and the adjusted R2 for continuous outcomes. These details were added to the new “Statistical analysis” subsection under Materials and Methods. See also our responses to comments #22-24 and #30.
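      A minimal NumPy sketch of the two predictive metrics named above. The AUC is computed in its rank (Mann-Whitney) form; for a logistic model whose only predictor is the score and whose coefficient is positive, the fitted probabilities rank samples exactly as the score does, so the two AUCs coincide. When covariates are included, the AUC should be computed on the fitted probabilities instead. Function names are hypothetical.

```python
import numpy as np


def auc(score, is_case):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    score = np.asarray(score, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    cases, controls = score[is_case], score[~is_case]
    diff = cases[:, None] - controls[None, :]   # all case-control pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear model with n samples and p predictors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```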

      (32) Please describe the methods used to compare the two versions of MRS for maternal smoking.

      It was a two-sample t-test, which was described in the Figure legends. We have now added this to the new “Statistical analysis” subsection under Materials and Methods.

      (33) Please describe testing the associations between MRS and Offspring Anthropometrics in more detail.

      We added further details on the regression model and the test for association in the methods. We have now added this to the new “Statistical analysis” subsection under Materials and Methods.

      (34) Meta analysing the 450k and GMEL arrays is going to substantially reduce the number of CpGs under investigation.

      We agree with the reviewer that this is not optimal for signal discovery. However, this is the only way we could synthesize evidence across the cohorts as FAMILY samples were only processed using the customized array. We added the following as a limitation of the study in the discussion.

      “First, the customized array with a limited number of CpGs (<3,000) was designed in 2016 and many large EWASs on smoking and maternal smoking conducted more recently had not been included.”

      (35) Page 16, line 364: GDM abbreviation was used in the results section (line 145), yet it is introduced in line 364. 

      Thank you for catching this, we have removed the duplicate.

      (36) Page 17, line 381: Given the stated importance of ancestry, why not restrict the sample to genetically confirmed groups?

      The reviewer has a valid point that ancestry, either perceived or genetic, can introduce additional heterogeneity due to potential differences in genetics, cultural and social practices, and lifestyles. Genetic data are indeed available for a subset of the individuals. In the original version of the manuscript, we used a stringent ancestry calling method by mapping all individuals with the 1000 Genomes samples from continental populations. The final definition was based on a combination of self-reported and genetically confirmed ancestry. However, if we restricted only to genetically confirmed groups, the sample size would be reduced to 312 (vs. 411), 268 (vs. 352), and 488 (vs. 504) in FAMILY, CHILD, and START, respectively.

      We compared the mean difference in the beta-values of the top associated CpGs and the derived MRS between those genetically confirmed vs. self-reported ancestral groups, and observed no material difference. These results are now included in the Supplementary Materials as part of the sensitivity analysis. Thus, given these considerations, we decided to use this complementary approach to retain the maximum number of samples while ensuring some aspect of ancestral homogeneity.

      “To maximize sample size in FAMILY and CHILD, we retained either self-identified or genetically confirmed Europeans based on available genetic data (Supplementary Table 1).”

      (37) Page 18, line 397: sensitivity analysis not sensitive analysis.

      Thank you for catching this, we have revised accordingly.

      (38) Page 18, line 409: smoking was rank transformed however, it would be good to see regression diagnostics for the lead loci in the EWAS to check that assumptions were met. 

      We thank the reviewer for this suggestion. Smoking exposure is indeed skewed and, in fact, heavily zero-inflated across the cohorts. The raw phenotype violated several model assumptions: it showed heteroskedasticity, outlying (influential) values, and non-linearity. After transformation, the diagnostics showed reduced deviation from the model assumptions, although some violations remained to a lesser degree. We included a comparison of results before and after transformation, together with model diagnostics for the lead CpG in CHILD and FAMILY, in the Supplementary Materials. The following was added to the Results:

      “As a sensitivity analysis, we repeated the analysis for the continuous smoking exposure under rank transformation vs. raw phenotype for the associated CpG in GFI1 and examined the regression diagnostics (Supplementary Material), and found that the model under rank-transformation deviated less from assumptions.”
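      The response does not specify which rank transformation was used. One common choice, assumed here purely for illustration, is the rank-based inverse normal transformation with Blom's offset. The sketch also shows how the zero-inflated values of weekly exposure collapse to a single tied value. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata


def rank_inverse_normal(x, c=3.0 / 8):
    """Rank-based inverse normal transform with Blom offset c = 3/8.

    Tied values (e.g. all unexposed mothers at 0 hrs/week) share their
    average rank and therefore map to one transformed value.
    Hypothetical helper for illustration.
    """
    x = np.asarray(x, dtype=float)
    ranks = rankdata(x)  # default 'average' method for ties
    return norm.ppf((ranks - c) / (len(x) - 2 * c + 1))
```

Because all zero-exposure values share one rank, the transformed phenotype still has a point mass, which may explain why some deviation from the model assumptions remained after transformation.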

      (39) Page 19, line 418: FDR seems quite a lenient threshold, especially when genome-wide significance thresholds exist. I would be inclined to view the EWAS findings as null.

      The choice to use FDR was indeed arbitrary, as there is no consensus on what significance threshold, if any, should be used for EWAS. The GWAS significance threshold (Pe’er et al., 2008) probably does not apply directly to EWAS, as the number of effective tests will likely differ between genome-wide genetic variants and CpGs. The Bonferroni-corrected p-value threshold in this context would be 0.05/200,050 = 2.5×10^-7, which is still less stringent than the GWAS significance threshold. We originally followed the convention of previous studies and used FDR to select a subset of plausible associations, so that the top association signals could be contrasted between populations and with reported effect sizes.

      We have revised the manuscript throughout by removing the notion of significant associations, and instead used the phrase “top associated signals” or “top associations” when discussing EWAS results for individual CpGs. The following was added to Materials and Methods to clarify the choice of our threshold:

      “For each EWAS or meta-analysis, the false discovery rate (FDR) adjustment was used to control multiple testing and we considered CpGs that passed an FDR-adjusted p-value < 0.05 to be relevant for maternal smoking.”
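      The response does not name the FDR procedure; the Benjamini-Hochberg adjustment, the most widely used one, is assumed in the sketch below. The sketch also reproduces the Bonferroni threshold for 200,050 tests discussed in the response to comment #39. The function name is hypothetical.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Hypothetical helper for illustration.
    """
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest p-value downward, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Bonferroni threshold for 200,050 tests: about 2.5e-7, versus 5e-8 for GWAS
bonferroni_threshold = 0.05 / 200_050
```

A CpG passes the FDR criterion quoted above when its adjusted value is below 0.05; every CpG passing Bonferroni also passes BH, but not the reverse.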

      (40) I do not understand Supplementary Figure 6 - how have the data been standardised? Why not plot the CpGs on the beta-value scale?

      We plotted standardized values because the reported p-values for the mean and variance equality tests (i.e. the ANOVA F-test, Levene’s test, and Anderson-Darling test) were computed on these transformed values to reduce inflation due to non-normality. We have since removed this comparison and kept only the comparison of the overall scores, as the number of CpGs in the HM450K score (143 CpGs) is too high to be visually interpretable.

      (41) It is my understanding, that the MRS for maternal smoking was constructed using external weights projected and regularised using elastic net (effectively trained) in CHILD cohort. The results section discusses associations between maternal smoking history and outcomes in CHILD, FAMILY, and START. Training and testing the score in the same sample (cohort) may result in overfitting and therefore should not be implemented.

      The original MRS was constructed using external weights from an independent discovery sample (Joubert et al., 2016; n > 5,000); LASSO validation was performed in CHILD (n = 352), and external testing in FAMILY and START. This is the lassosum framework, in which we leverage the larger sample size of external studies to select more plausible candidate CpGs for the model. Training, validation, and testing were therefore not performed in the same samples. We have included a new Figure 1 to illustrate the updated analytical flow and a graphical abstract to summarize the methods.

      (42) Is it a concern that the findings don't seem to replicate Joubert's results, which came from a much larger study?

      Replication is usually done in samples larger than the discovery samples; since our samples were much smaller than that of Joubert et al. (2016), it is not a concern that we were unable to confirm all of their signals. Nevertheless, 6 of the 7 top associations (FDR-adjusted p-value < 0.05) in our meta-analysis were declared significant in Joubert et al. (2016). In addition, the fact that we were able to use Joubert’s summary statistics to derive MRSs strongly associated with both smoking history and weekly exposure suggests shared signals. See also our response to R1 comment #16 for a comparison of effect consistency.

      (43) Please check that all analysis scripts have been uploaded to Github and that the EWAS results are publicly available.

      We thank the reviewer for this suggestion. All updated scripts and EWAS results are available on GitHub. We are also working to submit the results to the EWAS Catalog.

      Reviewer #2 (Recommendations For The Authors):

      The impact of this study is reduced due to previous findings:

      (1) Previous studies have already shown that DNA methylation may mediate the effect of maternal smoking on birth size/weight (see e.g. https://doi.org/10.1098/rstb.2018.0120 and https://doi.org/10.1093/ije/dyv048).

      We thank the reviewer for this point and would like to take the opportunity to clarify that it was not our objective to examine whether DNA methylation mediates a causal effect of maternal smoking on birth size. One of the key messages of our study is an evaluation of whether epigenetic associations, at individual CpGs and aggregated as a score, are consistent between white European and South Asian populations. One way to examine this is to use established DNAm signatures of maternal smoking, which are known to mediate the effect of smoking on birth size and weight in white Europeans (these references were cited in the original manuscript), and to test whether they carry the same effect on birth outcomes in the South Asian population.

      Indeed, our results indicate that the maternal smoking methylation score was consistently associated with adverse outcomes in newborns of both white European and South Asian mothers, despite the absence of maternal smoking among South Asian mothers. Collectively, these findings point to the possibility that the maternal smoking MRS captures more than active and second-hand smoking, potentially reflecting other environmental exposures that also lead to oxidative stress. Together, these exposures are associated with health consequences, including reduced birth size/weight. One candidate exposure is air pollution, as some of the maternal smoking CpGs have previously been linked to air pollution. However, we could not test this hypothesis directly because air pollution data were not available, and the air pollution methylation score was associated with neither smoking history (Supplementary Figure 5) nor smoking exposure (p > 0.4 in CHILD, FAMILY and START).

      The following was added to Materials and Methods under the subsection Using DNA Methylation to Construct Predictive Models for Maternal Smoking:

      “To benchmark and compare with existing maternal smoking MRSs, we calculated the Reese score using 28 CpGs (48,49), the Richmond score using 568 CpGs (49), the Rauschert score using 204 CpGs (50), the Joubert score using all 2,620 CpGs with evidence of association for maternal smoking (19), and finally a three-CpG score for air pollution (51). Details of these scores and their weights can be found in Supplementary Table 4.”

      The following was added to Results:

      “Both produced methylation scores that were significantly associated with maternal smoking history (ANOVA F-test p-values = 1.0×10⁻⁶ and 2.4×10⁻¹⁴ in CHILD and 6.9×10⁻¹⁶ and <2.2×10⁻¹⁶ in FAMILY), and these scores performed best among the alternative scores in CHILD and FAMILY (Supplementary Table 5). With the exception of the air pollution MRS, all remaining scores were marginally associated with smoking history in both CHILD and FAMILY (Supplementary Figure 5).”

      (2) Due to the small study size and low levels of prenatal smoke exposure, the model derived here is of little value and is, in fact, superseded by a previously published model (PMID: 27323799). At the very least, the model should be evaluated here. A novel aspect of this study is the inclusion of a South Asian cohort. Unfortunately, smoke exposure is practically non-existent, so it is unclear how it can be used. The more interesting finding in this study is the possibility that environmental factors such as second-hand smoke or pollution may have similar effects on pregnancies as maternal smoking. Are these available? If so, they could be evaluated for associations with DNA methylation. This would be novel. 

      In the revised manuscript, we included the Reese score (Reese et al., 2017) and several other maternal smoking scores for comparison. In the CHILD cohort, our derived score performed comparably to the Reese score (AUC of 0.95 vs. 0.94 for the Reese score), but the Reese score’s applicability was limited because the FAMILY dataset was profiled using a targeted array on which only 7 of the 28 Reese CpGs were available (AUC of 0.89 vs. 0.72 for the Reese score). Compared with the remaining scores from the literature (see the new Supplementary Table 5 for complete results), the Reese score has generally favorable performance.

      We did examine second-hand smoking in the original manuscript, showing a significant association with weekly maternal smoking exposure (original Table 3 and Supplementary Table 8). However, air pollution data are not available for assessment.

      (3) The other novel aspect is the evaluation of associations with outcomes later in life. Height and weight are interesting but impact could be gained by including other relevant outcomes such as birth complications, asthma, and intellectual impairment which are known to be associated with prenatal smoking. 

      We thank the reviewer for bringing up this point. One of the key health outcomes in the CHILD study was asthma, and data at later time points are available. However, similar outcomes were not collected in the other two studies (FAMILY and START), which focused on cardiometabolic health in young children. Thus, we did not initially include outcomes that were unavailable across all cohorts, as our intention was to contrast effects between populations.

      We recognize that this is an important question and now provide association results for mother-reported asthma and allergy, although the definitions differ between cohorts because these outcomes cannot be harmonized. We also included delivery by emergency C-section as an additional proxy outcome for birth complications.

      The following was added to Materials and Methods:

      “Mode of delivery (emergency c-section vs. other) was collected at the time of delivery.”

      “Additional phenotypes included smoking exposures (hours per week) at home, potential allergy based on mother reporting any of: eczema, hay fever, wheeze, asthma, food allergy (egg, cow milk, soy, other) for her child in FAMILY and START, and asthma based on mother’s opinion in CHILD (“In your opinion, does the child have any of the following? Asthma”).”

      The following was added to Results:

      “The maternal smoking MRS was consistently associated with increasing weekly smoking exposure in children reported by mothers at the 1-year (0.51±0.15, FDR-adjusted p = 0.0052), 3-year (0.53±0.16, FDR-adjusted p = 0.0052), and 5-year (0.40±0.15, FDR-adjusted p = 0.021) visits, with similar effects.”

      “We did not find any association with self-reported allergy or asthma in children at later visits (Supplementary Table 8). Further, there was no evidence of association between the MRS and any maternal outcomes (Supplementary Table 8).”

      REFERENCES:

      Gondalia, R., Baldassari, A., Holliday, K. M., Justice, A. E., Stewart, J. D., Liao, D., . . . Whitsel, E. A. (2021). Epigenetically mediated electrocardiographic manifestations of sub-chronic exposures to ambient particulate matter air pollution in the Women's Health Initiative and Atherosclerosis Risk in Communities Study. Environ Res, 198, 111211. doi:10.1016/j.envres.2021.111211

      Joubert, B. R., Felix, J. F., Yousefi, P., Bakulski, K. M., Just, A. C., Breton, C., . . . London, S. J. (2016). DNA Methylation in Newborns and Maternal Smoking in Pregnancy: Genome-wide Consortium Meta-analysis. Am J Hum Genet, 98(4), 680-696. doi:10.1016/j.ajhg.2016.02.019

      Martin, J. A., Osterman, M. J. K., & Driscoll, A. K. (2023). Declines in Cigarette Smoking During Pregnancy in the United States, 2016-2021. NCHS Data Brief(458), 1-8. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/36723453

      Reese, S. E., Zhao, S., Wu, M. C., Joubert, B. R., Parr, C. L., Haberg, S. E., . . . London, S. J. (2017). DNA Methylation Score as a Biomarker in Newborns for Sustained Maternal Smoking during Pregnancy. Environ Health Perspect, 125(4), 760-766. doi:10.1289/EHP333

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Hoops et al. showed that Netrin-1 and UNC5c can guide dopaminergic innervation from nucleus accumbens to cortex during adolescence in rodent models. 

      We showed this with respect to Netrin-1 only. With respect to UNC5c, we showed that the timing of its expression suggests that it may be involved, but we did not conduct the UNC5c manipulation experiments necessary to prove it. We state this clearly in the manuscript.

      They found that these dopamine axons project to the prefrontal cortex in a Netrin-1 dependent manner and knocking down Netrin-1 disrupted motor and learning behaviors in mice. 

      We would like to clarify that we did not show that learning or motor behaviors are affected. We showed that inhibitory control, measured in the Go/No-Go task, is altered in adulthood.

      Furthermore, the authors used hamsters, a seasonal model that is affected by the length of daylight, to demonstrate that the guidance of dopamine axons is mediated by environmental factors such as daytime length, in a sex-dependent manner.

      We agree with this characterization of our hamster experiments, but want to emphasize that it is the timing of adolescent dopamine axon input to the prefrontal cortex that is affected by daytime length in a sex-dependent manner.

      Regarding the cell type specificity of Netrin-1 expression, the authors began by stating "this question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, which is where in the forebrain regions we examined Netrin-1+ cells are present." This statement contradicts the exact issue regarding the specificity issue I raised.

      We are not sure why the identities of the cell types expressing Netrin-1 are at issue. As a secreted protein, Netrin-1 can be bound to the cell surface or deposited in the extracellular matrix, where it interacts with its receptors embedded in the cell surfaces of growing axons (Finci et al., 2015; Rajasekharan & Kennedy, 2009). Netrin-1 is expressed by a wide variety of cell types; for example, in the rodent striatum it is expressed by medium spiny neurons as well as by cholinergic neurons (Shatzmiller et al., 2008). However, we do not see why showing exactly which cell types have Netrin-1 on their surfaces, or have secreted it into the matrix, would be at issue for our study.

      They then went on to show the RNAscope data for Netrin-1 in Figure 2, which showed Netrin-1 mRNA was actually expressed quite ubiquitously in anterior cingulate cortex, dorsopeduncular cortex, infralimbic cortex, prelimbic cortex, etc. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      We agree that Netrin-1 mRNA is present throughout the forebrain. In particular, its presence in the regions mentioned by Reviewer #1 is a key component of our theory for how dopamine axons grow to the prefrontal cortex in adolescence.

      In addition, contrary to the authors' statement that Netrin-1 is a "secreted protein", the confocal images in Figure 1 in the rebuttal letter actually show Netrin-1 present in "granule-like" organelles inside the cytoplasm of neurons. 

      The rebuttal letter’s Figure 1 is not sufficient to determine the subcellular location of the Netrin-1, however we agree that it is likely that Netrin-1 is present in the cytoplasm of neurons. Indeed, its presence in vesicles in the cytoplasm is to be expected as this is a common mechanism for cells to secrete proteins into the extracellular space (Glasgow et al., 2018). We are not sure whether Reviewer #1’s “granule-like” organelles are in fact secretory vesicles or not, and we do not think our immunohistochemical images are an appropriate method by which to determine this kind of question. We find, however, that a detailed characterization of the subcellular distribution of Netrin-1 is beyond the scope of our study. 

      That Netrin-1 is a secreted protein is well established in the literature (for example, see Glasgow et al., 2018). The confocal images we provide suggest, but do not prove, that Netrin-1 is present both extracellularly and intracellularly, which is entirely consistent with its synthesis, secretion, and function. It is also consistent with our methodology and findings.

      Finally, the authors presented Figure 7 to indicate the location where virus expressing Netrin-1 shRNA might be located. Again, the brain region targeted was quite focal and most likely did not cover all the Netrin-1+ brain regions in Figure 2. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      Figure 7 - this is referring to Author response image 4 of our first response to reviewers.

      We agree with Reviewer #1’s characterization of our experiment. We intended to interrupt the Netrin-1 pathway to the prefrontal cortex, like removing a bridge along a road. The Netrin-1 signal remained intact along the dopamine axon route before and after the site of viral injection; however, it was lost at the injection site itself. This is like a road that remains intact on either side of a destroyed bridge but becomes impassable where the bridge was destroyed. We are glad that Reviewer #1 agrees that our experimental design achieved the desired outcome (a focal reduction in Netrin-1 expression).

      Collectively, these results raised more questions regarding the specificity of Netrin-1 expression in brain regions that are behaviorally relevant to this study.

      We do not agree with this assessment. Our manipulation of Netrin-1 expression was highly localized and specific, as Reviewer #1 seems to acknowledge. We are not clear on what questions this might raise that would call into question our findings as described in our manuscript. We have now added the following paragraph to our manuscript:  

      “It remains unknown exactly what types of cells are expressing Netrin-1 along the dopamine axon route, and how this expression is regulated to produce the Netrin-1 gradients that guide the dopamine axons. It also remains unclear where the misrouted axons end up in adulthood. Future experiments aimed at addressing these questions will provide further valuable insight into the nature of the “Netrin-1 pathway”. Nonetheless, our results allow us to conclude that Netrin-1 expressing cells “pave the way” for dopamine axons growing to the medial prefrontal cortex.”

      With respect to the effectiveness of Netrin-1 knockdown in the animals in this study, the authors cited data in HEK293 cells (Cuesta et al., 2020. Figure 2a), which did not include any statistics, and previously published in vivo data in a separate, independent study (Cuesta et al., 2020. Figure 2c). They do not provide any data regarding the effectiveness of Netrin-1 knockdown in THIS study.

      Indeed, we understand Reviewer #1’s concerns here. This issue was discussed at the time all the experiments (both in the current manuscript and in Cuesta et al. (2020)) were conducted, and we decided that it was sufficient to show that the virus was capable of knocking down Netrin-1 in vitro and in vivo in the forebrain. These characterization experiments were published in the first manuscript to present results using the virus, Cuesta et al. (2020). However, all experiments from both manuscripts were conducted contemporaneously.

      We do not see how repeating the same characterization experiments again is useful. 

      Similar concerns regarding UNC5C knockdown (points #6, #7, and #8) were not adequately addressed.

      There is no UNC5c knockdown in this manuscript. Furthermore, points #6, #7 and #8 do not deal with UNC5c knockdown. Point #6 is regarding the Netrin-1 virus efficacy, which we discuss above. Points #7 and #8 are requesting numerous additional experiments that we feel are worthy of their own manuscripts, and we do not feel that they call into question the findings we present here. Rather, answering points #7 and #8 would further refine our understanding of how dopamine axons grow to the prefrontal cortex beyond our current manuscript.

      In brief, while this study provides a potential role of Netrin-1-UNC5C in target innervation of dopaminergic neurons and its behavioral output in risk-taking, the data lack sufficient evidence to firmly establish the cause-effect relationship.

      We do not claim a cause-effect relationship here or anywhere in the manuscript. Concrete establishment of a cause-effect relationship will require several more manuscripts worth of experiments.

      Reviewer #2 (Public Review):

      In this manuscript, Hoops et al., using two different model systems, identified key developmental changes in Netrin-1 and UNC5C signaling that correspond to behavioral changes and are sensitive to environmental factors that affect the timing of development. They found that Netrin-1 expression is highest in regions of the striatum and cortex where TH+ axons are travelling, and that knocking down Netrin-1 reduces TH+ varicosities in mPFC and reduces impulsive behaviors in a Go-No-Go test. 

      We want to point out that we examined the Netrin-1 expression in the septum rather than the striatum but otherwise feel the above description is accurate.

      Further, they show that the onset of Unc5 expression is sexually dimorphic in mice, and that in Siberian hamsters, environmental effects on development are also sexually dimorphic. This study addresses an important question using approaches that link molecular, circuit and behavioral changes. Understanding developmental trajectories of adolescence, and how they can be impacted by environmental factors, is an understudied area of neuroscience that is highly relevant to understanding the onset of mental health disorders. I appreciated the inclusion of replication cohorts within the study.

      We appreciate Reviewer #2’s comments, which we feel accurately describe our experimental approach and findings, including their limitations.

      Reviewer #3 (Public Review):

      This study from the Flores group aims at understanding neuronal circuit changes during adolescence which is an ill-defined, transitional period involving dramatic changes in behavior and anatomy. They focus on DA innervation of the prefrontal cortex, and their interaction with the guidance cue Netrin1. They propose DA axons in the PFC increase in the postnatal period, and their density is reduced in a Netrin 1 knockdown, suggesting that Netrin abets the development of this mesocortical pathway. 

      We feel it necessary to point out that we are not the first to propose that dopamine axons in the prefrontal cortex increase in the postnatal period. This is well established and was first documented in rodents in the 1980s (Kalsbeek et al., 1988). Otherwise, we agree with Reviewer #3’s characterization.

      In such mice impulsivity gauged by a go-no go task is reduced. They then provide some evidence that Unc5c is developmentally regulated in DA axons. Finally they use an interesting hamster model, to study the effect of light hours on mesocortical innervation, and make some interesting observations about the timing of innervation and Unc5c expression, and the fact that females housed in winter day length conditions display an accelerated innervation of the prefrontal cortex.

      We agree with Reviewer #3’s characterization of our study and findings here.

      Comments on the revision. Several points were addressed; some remain to be addressed.

      (4) It's not clear to me that TH doesn't stain noradrenergic axons in the PFC. See Islam and Blaess, 2021, and references therein.

      Presuming that Reviewer #3 is referring to Islam et al. (2021), the review they cite supports our position that TH-stained axons in the forebrain are by-and-large dopamine axons.

      Nonetheless, Islam et al. do point out that it is important to keep in mind that TH-positive axons may occasionally be noradrenaline axons. We are very conscious of this possibility and take care to minimize this risk. As we state in the methods, we only examine axons that are TH-positive, morphologically consistent with dopamine axons, and localized to forebrain areas known to receive dopamine innervation. The localization and morphology of noradrenaline axons in the forebrain differ from those of dopamine axons. This is stated in our methods on lines 76-94, where we describe in detail how we differentiate dopamine from norepinephrine axons and include a full list of relevant citations.

      (6) The Netrin knockdown data provided is from a previous study/samples.

      Indeed, however the experiments for the two manuscripts were conducted contemporaneously. We believe two sets of validation experiments are not required.

      (8) While the authors make the argument that the behavior is linked to DA, they still haven't formally tested it, in my opinion.

      We agree that we have not formally tested this link. However, we do not claim in our manuscript to have established a formal link.

      (1) Fig. 3: UNC5c levels are not yet quantified. Furthermore, I agree with the previous reviewer that UNC5c knockdown would corroborate key aspects of the model.

      We present UNC5c quantities for mice in our first response to reviewers (Figure 11 therein); however, we did not quantify UNC5c in the hamsters because of the time involved. We are planning further experiments with the hamsters and may include quantification of UNC5c in the nucleus accumbens at that time. Nevertheless, we do not feel its absence from this manuscript calls into question our findings.

      With regards to the UNC5c knockdown, we agree it would be an informative extension of our findings here, but again we do not feel that it is necessary to corroborate our current findings.

      New - Developmental trajectory of prefrontal TH-positive axons from early adolescence to adulthood is similar in male and female rats (Willing et al., 2017). This needs discussion.

      Willing et al. (2017) reported an increase in prefrontal dopamine density during adolescence in male and female rats, with a non-significant trend towards an earlier increase in females.

      This is in line with our current results in mice indicating that the timing of dopamine axon targeting and growth is sex-specific. We are currently testing this idea directly using intersectional viral tracing methods. We have now added the following sentence to the manuscript:

      “Differences in the precise timing of dopamine innervation to the PFC in adolescence have been suggested by findings reported in male and female rats (Willing et al., 2017)”.

      References

      Brignani, S., Raj, D. D. A., Schmidt, E. R. E., Düdükcü, Ö., Adolfs, Y., Ruiter, A. A. D., Rybiczka-Tesulov, M., Verhagen, M. G., Meer, C. van der, Broekhoven, M. H., Moreno-Bravo, J. A., Grossouw, L. M., Dumontier, E., Cloutier, J.-F., Chédotal, A., & Pasterkamp, R. J. (2020). Remotely Produced and Axon-Derived Netrin-1 Instructs GABAergic Neuron Migration and Dopaminergic Substantia Nigra Development. Neuron, 107(4), 684-702.e9. https://doi.org/10.1016/j.neuron.2020.05.037

      Cuesta, S., Nouel, D., Reynolds, L. M., Morgunova, A., Torres-Berrio, A., White, A., Hernandez, G., Cooper, H. M., & Flores, C. (2020). Dopamine axon targeting in the nucleus accumbens in adolescence requires Netrin-1. Frontiers in Cell and Developmental Biology, 8. https://doi.org/10.3389/fcell.2020.00487

      Finci, L., Zhang, Y., Meijers, R., & Wang, J. H. (2015). Signaling mechanism of the netrin-1 receptor DCC in axon guidance. Progress in Biophysics and Molecular Biology, 118(3), 153-160. https://doi.org/10.1016/j.pbiomolbio.2015.04.001

      Glasgow, S. D., Labrecque, S., Beamish, I. V., Aufmkolk, S., Gibon, J., Han, D., Harris, S. N., Dufresne, P., Wiseman, P. W., McKinney, R. A., Séguéla, P., Koninck, P. D., Ruthazer, E. S., & Kennedy, T. E. (2018). Activity-Dependent Netrin-1 Secretion Drives Synaptic Insertion of GluA1-Containing AMPA Receptors in the Hippocampus. Cell Reports, 25(1), 168-182.e6. https://doi.org/10.1016/j.celrep.2018.09.028

      Islam, K. U. S., Meli, N., & Blaess, S. (2021). The Development of the Mesoprefrontal Dopaminergic System in Health and Disease. Frontiers in Neural Circuits, 15, 746582. https://doi.org/10.3389/fncir.2021.746582

      Kalsbeek, A., Voorn, P., Buijs, R. M., Pool, C. W., & Uylings, H. B. M. (1988). Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology, 269(1), 58–72. https://doi.org/10.1002/cne.902690105

      Rajasekharan, S., & Kennedy, T. E. (2009). The netrin protein family. Genome Biology, 10(9), 239. https://doi.org/10.1186/gb-2009-10-9-239

      Shatzmiller, R. A., Goldman, J. S., Simard-Émond, L., Rymar, V., Manitt, C., Sadikot, A. F., & Kennedy, T. E. (2008). Graded expression of netrin-1 by specific neuronal subtypes in the adult mammalian striatum. Neuroscience, 157(3), 621–636. https://doi.org/10.1016/j.neuroscience.2008.09.031

      Willing, J., Cortes, L. R., Brodsky, J. M., Kim, T., & Juraska, J. M. (2017). Innervation of the medial prefrontal cortex by tyrosine hydroxylase immunoreactive fibers during adolescence in male and female rats. Developmental Psychobiology, 59(5), 583–589. https://doi.org/10.1002/dev.21525

    1. Author Response:

      We appreciate the constructive reviews. We have performed additional analyses to address the reviewers’ concerns, and we will submit a full revision in the near future. Our new analysis confirms that the visual stimulus can account for about a third of the variance in population neural activity, whereas pupil dynamics account for only a small fraction (less than six percent) of the trial-to-trial variability. Once we regress out the stimulus responses and the pupil dynamics, we can use the network activity to predict the trial-to-trial variability of single-neuron responses, explaining about eight percent of the variance. Thus, it appears that multiplicative gain cannot account for the results.

      As for the concerns about missing spikes, we would like to direct readers to the supplementary figure that addresses this concern. The analysis shows that the correlation measurements are robust to the imprecision of spike inference from calcium imaging data.

      Finally, we would like to take the opportunity to clarify that we make no claim as to the discreteness of tuning classes. The GMM analysis was performed to obtain a data-driven, granular categorization of neuron tuning to support detailed statistical analysis; we take no position on whether these groups are discrete. We agree that this is an interesting question, and we are happy to provide additional analysis in the revision to address it. Our main result on functional connectivity structure holds regardless of the discreteness of neuron tuning selectivity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Kim et al. investigated the mechanism by which uremic toxin indoxyl sulfate (IS) induces trained immunity, resulting in augmented pro-inflammatory cytokine production such as TNF and IL6. The authors claim that IS treatment induced epigenetic and metabolic reprogramming, and the aryl hydrocarbon receptor (AhR)-mediated arachidonic acid pathway is required for establishing trained immunity in human monocytes. They also demonstrated that uremic sera from end-stage renal disease (ESRD) patients can generate trained immunity in healthy control-derived monocytes.

      These are interesting results that introduce the important new concept of trained immunity and its importance in showing endogenous inflammatory stimuli-induced innate immune memory. Additional evidence proposing that IS plays a critical role in the initiation of inflammatory immune responses in patients with CKD is also interesting and a potential advance of the field. This study is in large part well done, but some components of the study are still incomplete and additional efforts are required to nail down the main conclusions.

      Thank you very much for your positive feedback.

      Specific comments:

      (1) Of greatest concern, there are concerns about the rigor of these experiments, whether the interpretation and conclusions are fully supported by the data. (1) Although many experiments have been sporadically conducted in many fields such as epigenetic, metabolic regulation, and AhR signaling, the causal relationship between each mechanism is not clear. (2) Throughout the manuscript, no distinction was made between the group treated with IS for 6 days and the group treated with the second LPS (addressed below). (3) Besides experiments using non-specific inhibitors, genetic experiments including siRNA or KO mice should be examined to strengthen and justify central suggestions.

      We are grateful for the invaluable constructive feedback provided. 

      (1) In response to the reviewer's feedback, we conducted additional experiments employing appropriate inhibitors to investigate the causal relationship among the AhR pathway, epigenetic modifications, and metabolic rewiring in IS-induced trained immunity. Notably, metabolic rewiring, particularly the upregulation of aerobic glycolysis via the mTORC1 signaling pathway, is a pivotal mechanism underlying the induction of trained immunity through the modulation of epigenetic modifications (Riksen NP et al. Figure 1). We first assessed H3K4me3 enrichment at the TNFA and IL6 promoters on day 6 after treatment with zileuton, an inhibitor of ALOX5, or 2-DG, a glycolysis inhibitor. Additionally, we evaluated changes in the activity of S6K, a downstream target of mTORC1, following zileuton treatment. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, but not metabolic rewiring, in IS-induced trained immunity (Author response image 1). However, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG affects epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these additional findings, elucidating the relationships among the diverse mechanisms implicated in IS-induced innate immune memory. These data have been integrated into the revised manuscript as Figures 3D and 5I and Supplementary Figure 5I.

      (2) We apologize for any confusion arising from the unclear description of the distinction between the group treated with IS for 6 days and the group subjected to secondary lipopolysaccharide (LPS) stimulation. To clarify, induction of trained immunity requires 1 day of IS stimulation followed by 5 days of rest, so the day-6 sample represents the trained state. A 24-hour LPS stimulation is then applied, so the day-7 sample represents secondary LPS-stimulated cells. This is now indicated explicitly in Figures 1A and 3A of the revised manuscript.

      (3) Following the reviewer's feedback, we performed siRNA knockdown of AhR and ALOX5 in primary human monocytes. AhR knockdown markedly attenuated the mRNA expression of TNF-α and IL-6, which is augmented in IS-trained macrophages. Similarly, ALOX5 knockdown abrogated the LPS-induced increase in TNF-α and IL-6 levels in IS-trained macrophages (Author response image 2). The AhR siRNA experiments also corroborate the involvement of AhR in the expression of AA pathway-related molecules, such as ALOX5, ALOX5AP, and LTB4R1, in IS-induced trained immunity. These data have been incorporated into the revised manuscript as Figures 4E and 5G and Supplementary Figure 5H.

      Author response image 1.

      Epigenetic modification is regulated by the arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an ALOX5 inhibitor, or 2-DG, a glycolysis inhibitor, and then stimulated with IS for 24 hours. After a 5-day resting period, H3K4me3 enrichment at the promoters of the TNFA and IL6 loci was assessed, with normalization to 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hours. Cell lysates were immunoblotted for phosphorylated S6 kinase, with β-actin as a normalization control, and band intensities were quantified by densitometry. D. Schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      Author response image 2.

      Inhibition of IS-trained immunity by knockdown of AhR or ALOX5 in human monocytes. A-C. Human monocytes were transfected with siRNA targeting AhR (siAhR), ALOX5 (siALOX5), or negative control (siNC) for 1 day, followed by stimulation with IS for 24 hours. After a resting period of 5 days, cells were re-stimulated with LPS for 24 hours. mRNA expression levels of AhR and ALOX5 at 1 day after transfection, and TNF-α and IL-6 at 1 day after LPS treatment, were assessed using RT-qPCR. D. Human monocytes were transfected with AhR siRNA or negative control (NC) siRNA for 1 day, followed by stimulation with IS for 24 hours. After resting for 5 days, mRNA expression levels of ALOX5, ALOX5AP, and LTB4R1 were analyzed using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.  
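
      The captions above summarize donor-matched data as mean ± SEM and compare groups with a two-tailed paired t-test. As an illustrative sketch only, assuming one matched pair of values per donor, the paired t statistic is the mean of the per-donor differences divided by its standard error, with n − 1 degrees of freedom. The function names and values below are hypothetical (not data from this study), and the two-sided p-value, which would come from the t distribution (for example via `scipy.stats.ttest_rel`), is deliberately not computed here.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def paired_t(group_a, group_b):
    """Paired t statistic and degrees of freedom.

    Pairs are matched by position (one pair per donor); differences are b - a.
    """
    if len(group_a) != len(group_b) or len(group_a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [b - a for a, b in zip(group_a, group_b)]
    d_mean, d_sem = mean_sem(diffs)
    return d_mean / d_sem, len(diffs) - 1


# Hypothetical values from three donors (illustration only)
control = [1.0, 1.2, 0.9]
trained = [2.1, 2.6, 1.8]
print(mean_sem(trained))
t, df = paired_t(control, trained)
print(f"t = {t:.2f}, df = {df}")
```

      Pairing by donor removes between-donor variability from the comparison, which is why a paired rather than an unpaired test fits this design.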

      (2) The authors showed that IS-trained monocytes showed no change in TNF or IL-6 but increased expression of TNF and IL-6 in response to a second LPS challenge (Fig. 1B). This suggests that altered LPS responsiveness in IS-trained monocytes caused the altered gene expression of TNF and IL6. However, the authors also showed that IS-trained monocytes without LPS stimulation had increased H3K4me3 levels at the TNF and IL-6 loci, as well as highly elevated ECAR and OCR, yet no changes in TNF and IL-6. Therefore, it is unclear why or how the epigenetic and metabolic states of IS-trained monocytes induce different LPS responses. For example, increased H3K4me3 at HK2 and PFKP is important for metabolic rewiring, but why increased H3K4me3 at TNF and IL6 does not affect their expression needs to be explained.

      We acknowledge the reviewer's constructive critique. Although epigenetic modifications at the promoters of TNF-α, IL-6, HK2, and PFKP (Figure 3B and Supplementary Figure 3C in the revised manuscript) and metabolic rewiring (Figure 2A-D in the revised manuscript) were observed in IS-trained macrophages on day 6, before LPS stimulation, these macrophages did not show increased TNF-α and IL-6 mRNA or protein levels before LPS stimulation. We attribute this to the 5-day resting period, during which the macrophages return to a non-activated state (Author response images 3 and 4). This behavior is consistent with the canonical concept of trained immunity.

      Trained immunity is characterized by the long-term functional reprogramming of innate immune cells, which is evoked by various primary insults and which leads to an altered response towards a second challenge after the return to a non-activated state. Metabolic and epigenetic reprogramming events during the primary immune response persist partially even after the initial stimulus is removed. Upon a secondary challenge, trained innate immune cells exhibit a more robust and more prompt response than the initial response (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      Numerous studies have reported epigenetic modifications at the TNF-α and IL-6 promoters, together with metabolic rewiring, before LPS stimulation as a secondary challenge; nevertheless, cytokine production depends on LPS stimulation (Arts RJ et al. Glutaminolysis and Fumarate Accumulation Integrate Immunometabolic and Epigenetic Programs in Trained Immunity. Cell Metab. 2016 Dec 13;24(6):807-819; Arts RJW et al. Immunometabolic Pathways in BCG-Induced Trained Immunity. Cell Rep. 2016 Dec 6;17(10):2562-2571; Ochando J et al. Trained immunity - basic concepts and contributions to immunopathology. Nat Rev Nephrol. 2023 Jan;19(1):23-37). The persistence of elevated H3K4me3 at immune gene promoters, even after the cells return to baseline, is associated with open chromatin and results in a faster and stronger response, such as cytokine production, upon a secondary insult (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      The results in Figure 1B could be interpreted as indicating that altered LPS responsiveness in IS-trained monocytes causes the altered gene expression of TNF and IL-6. However, it is also plausible that trained immune cells respond more robustly even to low concentrations of LPS. Indeed, the aim of this experiment was to determine the appropriate LPS concentration.

      Author response image 3.

      Changes in TNF-α and IL-6 mRNA and protein levels during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. Cells were then stimulated with LPS for 24 hrs. Protein and mRNA levels were assessed by ELISA and RT-qPCR, respectively. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      Author response image 4.

      Changes in HK2 and PFKP mRNA induced by IS during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. mRNA levels were assessed by RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05 by two-tailed paired t-test.

      (3) The authors cultured human monocytes in human serum without growth factors such as M-CSF for 5-6 days. Given the short lifespan of monocytes (1-3 days), the authors need to explain the validity of this experimental model.

      We appreciate the reviewer’s constructive critique. As the reviewer points out, human circulating CD14+ monocytes have a relatively short lifespan (1-3 days) when cultured without growth factors (Patel AA et al. The fate and lifespan of human monocyte subsets in steady state and systemic inflammation. J Exp Med. 2017 Jul 3;214(7):1913-1923). In this study, purified CD14+ monocytes were cultured adherently for 7 days in RPMI 1640 medium supplemented with 10% human AB serum, a standard in vitro protocol widely used in studies of trained immunity (Domínguez-Andrés J et al. In vitro induction of trained immunity in adherent human monocytes. STAR Protoc. 2021 Feb 24;2(1):100365). Following the reviewer's suggestion, we assessed cell viability on days 0, 1, 4, and 6 using the WST assay. Although viability was marginally reduced on day 1, which we attribute to detachment from the culture plate, the cultured monocytes showed noticeably higher viability on days 4 and 6 than on days 0 or 1 (Author response image 5).

      It has been demonstrated that adhesion of human monocytes to a cell culture dish activates them and induces the synthesis of substantial amounts of IL-1β mRNA, as observed in monocytes adhering to extracellular matrix components such as fibronectin and collagen.

      Morphologically, human adherent monocytes cultured with 10% human serum appear to undergo partial differentiation into macrophages by day 6, potentially explaining the observed lack of decrease in monocyte viability. Notably, Safi et al. have reported that adherent monocytes cultured with 10% human serum exhibit no significant difference in cell viability over a 7-day period when compared to cultures supplemented with growth factors such as M-CSF and IL-3 (Safi W et al. Differentiation of human CD14+ monocytes: an experimental investigation of the optimal culture medium and evidence of a lack of differentiation along the endothelial line. Exp Mol Med. 2016 Apr 15;48(4):e227).

      Author response image 5.

      Viability of human monocytes during the induction of trained immunity. Purified human monocytes were seeded on plates in RPMI 1640 medium supplemented with 10% human AB serum. Cell viability was assessed on days 0, 1, 4, and 6 using the WST assay (left panel). Cell morphology was examined under an inverted light microscope at the indicated times (right panel).

      (4) The authors' ELISA results clearly showed increased levels of TNF and IL-6 proteins, but it is well established that LPS-induced TNF and IL-6 gene expression in monocytes peaks within 1-4 hours and returns to baseline by 24 hours. Therefore, the authors need to investigate gene expression at appropriate time points.

      We appreciate the reviewer's valuable and constructive feedback. As the reviewer indicated, LPS-induced TNF-α and IL-6 gene expression in IS-trained monocytes peaked within the first 1 to 4 hours and declined by 24 hours (Author response image 6). Nevertheless, TNF-α and IL-6 mRNA levels remained elevated at 24 hours, and the protein levels of both cytokines were clearly increased 24 hours after LPS stimulation. Because of technical constraints, samples had to be collected at a single time point, and 24 hours after stimulation was judged the most suitable interval.

      Author response image 6.

      Kinetics of TNF-α and IL-6 protein and mRNA expression after LPS treatment as a secondary insult in IS-trained monocytes. IS-trained cells were re-stimulated with LPS (10 ng/ml) for the indicated times. Supernatants and lysates were collected for ELISA and RT-qPCR analysis, respectively. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      (5) It is a highly interesting finding that IS induces trained immunity via the AhR pathway. The authors also showed that the pretreatment of FICZ, an AhR agonist, was good enough to induce trained immunity in terms of the expression of TNF and IL-6. However, from this point of view, the authors need to discuss why trained immunity was not affected by kynurenic acid (KA), which is a well-known AhR ligand accumulated in CKD and has been reported to be involved in innate immune memory mechanisms (Fig. S1A).

      We appreciate the constructive criticism provided by the reviewer and understand the points raised. In our initial experiments, we hypothesized that kynurenic acid (KA), an aryl hydrocarbon receptor (AhR) ligand, might induce trained immunity in monocytes, even though KA was not our primary uremic toxin of interest. However, as shown in Fig. S1A, KA did not induce trained immunity. Notably, KA-treated monocytes showed induction of CYP1B1, an AhR-responsive gene, and elevated TNF-α and IL-6 mRNA levels 24 hours after treatment, comparable to FICZ-treated monocytes. This observation confirms that KA acts as an AhR ligand in human monocytes, as the reviewer emphasized.

      Of particular interest, the expression of arachidonic acid pathway components such as ALOX5 and ALOX5AP, which are integral to the mechanisms underlying IS-induced trained immunity, did not increase on day 6 after KA treatment, in contrast to the significant elevation observed with IS and FICZ (Author response image 7). The reason for this disparity remains unknown and requires further investigation. These data have been incorporated into the revised manuscript as Supplementary Figure 5C.

      Author response image 7.

      Divergent impact of AhR agonists, especially IS, FICZ, and KA, on the AhR-ALOX5 pathway. Purified monocytes were treated with IS (1 mM), FICZ (100 nM), or KA (0.5 mM) for 1 day, followed by a 5-day resting period to induce trained immunity. Activation of AhR through ligand binding was assessed by examining the induction of CYP1B1, an AhR-responsive gene, and of cytokines one day after treatment. The expression of genes related to the arachidonic acid pathway, such as ALOX5, ALOX5AP, and LTB4R1, was analyzed by RT-qPCR six days after inducing trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      Indeed, it has been demonstrated that FICZ and TCDD, two high-affinity AhR ligands, exert opposite effects on T-cell differentiation, with TCDD inducing regulatory T cells and FICZ inducing Th17 cells. This dichotomy has been attributed to ligand-intrinsic differences in AhR activation (Ho PP et al. The aryl hydrocarbon receptor: a regulator of Th17 and Treg cell development in disease. Cell Res. 2008 Jun;18(6):605-8; Ehrlich AK et al. TCDD, FICZ, and Other High Affinity AhR Ligands Dose-Dependently Determine the Fate of CD4+ T Cell Differentiation. Toxicol Sci. 2018 Feb 1;161(2):310-320). Comparable ligand-specific differences may underlie the divergent effects of KA and IS observed here. These results suggest an intricate interplay among metabolic rewiring, epigenetic reprogramming, and the AhR-ALOX5 pathway in IS-induced trained immunity in monocytes.

      (6) The authors need to clarify the role of IL-10 in IS-trained monocytes. IL-10 is an anti-inflammatory cytokine that can be modulated by AhR, and its expression (Fig. 1E, Fig. 4D) may explain the inflammatory cytokine expression of IS-trained monocytes.

      We appreciate the reviewer’s valuable comment, which raises an important point. IL-10, a cytokine with potent anti-inflammatory properties, plays a pivotal role in restraining the host immune response against pathogens, thereby limiting damage to the host and maintaining normal tissue homeostasis. In atherosclerosis (Mallat Z et al. Protective role of interleukin-10 in atherosclerosis. Circ Res. 1999 Oct 15;85(8):e17-24.) and kidney disease (Wei W et al. The role of IL-10 in kidney disease. Int Immunopharmacol. 2022 Jul;108:108917), IL-10 exerts potent deactivating effects on macrophages and T cells, influencing various cellular processes that can affect the development and stability of atherosclerotic plaques. Moreover, IL-10-deficient macrophages show increased levels of the proinflammatory cytokine TNF-α (Smallie T et al. IL-10 inhibits transcription elongation of the human TNF gene in primary macrophages. J Exp Med. 2010 Sep 27;207(10):2081-8; Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). As the reviewer emphasized, the reduced IL-10 gene expression in IS-trained monocytes may contribute to their heightened expression of proinflammatory cytokines. We have discussed this point in the revised manuscript (Lines 394-399, page 18).

      (7) The authors need to show H3K4me3 levels at the TNF and IL6 genes under all conditions in one figure (Fig. 2B). Comparing Fig. 2B and Fig. S2B, H3K4me3 does not appear to be increased by LPS at all in the IL6 region.

      We are grateful for the constructive criticism provided by the reviewer. In response, we attempted to demonstrate H3K4me3 enrichment at the TNF-α and IL-6 promoters across all experimental conditions. Owing to the limited availability of purified human monocytes, we were able to perform three additional independent ChIP-qPCR experiments covering all conditions. Despite notable inter-individual variability, even among healthy donors, the results showed increased H3K4me3 enrichment at the TNF-α and IL-6 promoters in the IS-trained groups, irrespective of subsequent LPS treatment (Author response image 8).

      Author response image 8.

      Analysis of H3K4me3 enrichment at the promoters of the TNFA and IL6 loci in IS-trained macrophages. ChIP-qPCR was used to assess H3K4me3 enrichment at the promoters of the TNFA and IL6 loci before (day 6) and after (day 7) LPS stimulation in IS-trained macrophages. Values were normalized to 2% input. Bar graphs show the mean ± SEM. Data are from three independent experiments using samples from different donors.

      (8) The authors need to address the changes of H3K4me3 in the presence of MTA.

      We appreciate the constructive criticism provided by the reviewer. In response, we analyzed changes in H3K4me3 in the presence of MTA, a general methyltransferase inhibitor, under the same conditions as in Figure 2C of the original manuscript. MTA reduced the H3K4me3 levels that were otherwise increased by IS training, as assessed in histones isolated by acid extraction (Author response image 9).

      Author response image 9.

      Reduction of H3K4me3 by MTA treatment in IS-trained macrophages. IS-trained cells were re-stimulated with LPS (10 ng/ml) as a secondary challenge for 24 hrs, followed by histone isolation and WB analysis for H3K4me3, histone H3 (H3), and β-actin. Blots from two independent experiments with different donors are shown.

      (9) The interpretation of the ChIP-seq results is not entirely convincing because of doubts about the quality of the sequencing data. First, the authors need to report the quality of the ChIP-seq data using reliable criteria such as the ENCODE pipeline. They should also provide representative H3K4me3 tracks for the TNF and IL-6 genes (Fig. 2F). In Fig. 2F, the authors showed H3K4me3 tracks for replicates, but the replicates differed considerably, raising concerns about reproducibility. Finally, the authors need to show the correlation between the ChIP-seq (Fig. 2) and RNA-seq (Fig. 5) data.

      We appreciate the constructive criticism provided by the reviewer. 

      As the reviewer suggested, we evaluated sample read quality using the ENCODE histone ChIP-seq standards, focusing on read depth, the PCR bottleneck coefficients (PBC1 and PBC2), and the non-redundant fraction (NRF). Five of the samples displayed moderate bottlenecking (0.5 ≤ PBC1 < 0.8, 1 ≤ PBC2 < 3) with acceptable complexity (0.5 ≤ NRF < 0.8). One sample showed mild bottlenecking (0.8 ≤ PBC1 < 0.9, 3 ≤ PBC2 < 10) with compliant complexity (0.8 ≤ NRF < 0.9). These metrics indicate that the ChIP-seq data meet at least the standards required for downstream analysis according to ENCODE criteria (Author response image 10A).
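
      The library-complexity metrics quoted above can be made concrete with a short sketch. Following the ENCODE definitions, NRF is the number of distinct read locations divided by the total number of reads, PBC1 = M1/M_DISTINCT and PBC2 = M1/M2, where M_DISTINCT is the number of distinct locations and M1 and M2 are the numbers of locations hit by exactly one or exactly two reads. The sketch assumes that each uniquely mapped read has already been reduced to a (chromosome, position, strand) tuple; the function names, bin tables and toy reads are hypothetical, the cut-offs simply mirror the ranges stated in the response, and the ENCODE pipeline computes these values with its own tooling.

```python
from collections import Counter

# (exclusive upper bound, label) pairs mirroring the ranges quoted in the text
NRF_BINS = [(0.5, "concerning"), (0.8, "acceptable"), (0.9, "compliant"), (float("inf"), "ideal")]
PBC1_BINS = [(0.5, "severe"), (0.8, "moderate"), (0.9, "mild"), (float("inf"), "none")]
PBC2_BINS = [(1.0, "severe"), (3.0, "moderate"), (10.0, "mild"), (float("inf"), "none")]


def complexity_metrics(read_locations):
    """Return (NRF, PBC1, PBC2) for a list of uniquely mapped read locations."""
    reads_per_location = Counter(read_locations)
    total_reads = len(read_locations)
    m_distinct = len(reads_per_location)                         # distinct locations
    m1 = sum(1 for n in reads_per_location.values() if n == 1)  # hit by exactly one read
    m2 = sum(1 for n in reads_per_location.values() if n == 2)  # hit by exactly two reads
    nrf = m_distinct / total_reads
    pbc1 = m1 / m_distinct
    pbc2 = m1 / m2 if m2 else float("inf")                       # no location hit exactly twice
    return nrf, pbc1, pbc2


def classify(value, bins):
    """Return the label of the first bin whose upper bound exceeds value."""
    for upper, label in bins:
        if value < upper:
            return label
    return bins[-1][1]


# Toy library (hypothetical): five distinct locations, two of them duplicated
reads = [("chr6", 100, "+"), ("chr6", 250, "-"), ("chr7", 900, "+"),
         ("chr7", 1200, "+"), ("chr7", 1200, "+"),
         ("chr1", 50, "-"), ("chr1", 50, "-"), ("chr1", 50, "-")]
nrf, pbc1, pbc2 = complexity_metrics(reads)
print(classify(pbc1, PBC1_BINS), classify(pbc2, PBC2_BINS), classify(nrf, NRF_BINS))
```

      Lower PBC values mean that more locations are covered by duplicated reads, which is why they are read as a sign of PCR bottlenecking rather than of genuine enrichment.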

      To examine differences in H3K4me3 enrichment patterns between the two groups, we normalized read counts within ±2 kb of the TSS of human genes to CPM. We then compared the average values of IS-treated macrophages with those of control macrophages and displayed them in waterfall plots. Genes of interest are marked in red, including those related to the phenotype of IS-trained macrophages (TNF and IL6), activation of innate immune responses (XRCC5, IFI16, PQBP1), and regulation of ornithine decarboxylase (OAZ3, PSMA3, PSMA1) (Author response image 10B and C). H3K4me3 peak tracks at the TNF and IL6 loci and the H3K4me3 enrichment pattern have also been added to the revised manuscript as Supplementary Figures 3D and 3F.
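
      The normalization described above (reads counted in TSS ±2 kb windows, then scaled to counts per million) can be sketched in a few lines. This is an illustration of the arithmetic only, not the authors' pipeline: it assumes read start coordinates on a single chromosome, a library size equal to the total number of mapped reads, and a log2 ratio of group means with a pseudocount of 1 as the ranking statistic for a waterfall plot. All function names, coordinates and values are hypothetical.

```python
import math


def count_tss_windows(read_starts, tss_by_gene, flank=2000):
    """Count reads whose start lies within +/- flank bp of each gene's TSS.

    Simplification: single chromosome, read start coordinates only.
    """
    return {gene: sum(1 for pos in read_starts if abs(pos - tss) <= flank)
            for gene, tss in tss_by_gene.items()}


def to_cpm(counts, library_size):
    """Counts per million: count / library size * 1e6."""
    return {gene: n / library_size * 1e6 for gene, n in counts.items()}


def waterfall_order(trained_reps, control_reps, pseudocount=1.0):
    """Log2 ratio of group-mean CPM per gene, sorted from most to least enriched.

    Each argument is a list of per-replicate {gene: CPM} dicts.
    """
    ratios = {}
    for gene in trained_reps[0]:
        mean_trained = sum(rep[gene] for rep in trained_reps) / len(trained_reps)
        mean_control = sum(rep[gene] for rep in control_reps) / len(control_reps)
        ratios[gene] = math.log2((mean_trained + pseudocount) / (mean_control + pseudocount))
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical example: two genes, one replicate per group
tss = {"TNF": 10_000, "IL6": 50_000}
counts = count_tss_windows([9_000, 10_500, 11_999, 12_001, 50_000], tss)
print(counts)
print(to_cpm(counts, 1_000_000))
print(waterfall_order([{"TNF": 7.0, "IL6": 1.0}], [{"TNF": 1.0, "IL6": 3.0}]))
```

      Scaling to CPM removes differences in sequencing depth between samples, so that window counts from different libraries can be averaged and compared directly.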

      Next, to evaluate consistency among replicates within a group, we applied Spearman's correlation to enrichment values expressed as counts per million (CPM), calculated with the edgeR R package. We analyzed two region sets: the 7,136 H3K4me3 peaks described in Figure 3E of the revised manuscript, and regions extending 2 kbp around the transcription start sites (TSS) of the hg19 human genome. The resulting Spearman's correlation coefficients and associated P-values showed concordance between replicates, confirming reproducibility and consistent performance (Author response image 10D).
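
      Spearman's coefficient is the Pearson correlation of the ranks, so it asks whether two replicates order the regions consistently rather than whether their CPM values agree exactly, which makes it robust to the skewed distribution of ChIP-seq counts. A dependency-free sketch follows, assuming tied values receive the average of the ranks they span (the usual convention, also used by `scipy.stats.spearmanr`). It does not compute P-values, and the replicate vectors are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical CPM values for the same five regions in two replicates
rep1 = [5.0, 12.0, 0.5, 30.0, 8.0]
rep2 = [6.0, 10.0, 0.7, 28.0, 9.0]
print(spearman(rep1, rep2))
```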

      Finally, a correlation between gene expression and H3K4me3 enrichment around transcription start sites (TSS) has been reported previously (Reshetnikov VV et al. Data of correlation analysis between the density of H3K4me3 in promoters of genes and gene expression: Data from RNA-seq and ChIP-seq analyses of the murine prefrontal cortex. Data Brief. 2020 Oct 2;33:106365). To verify this association in our data, we performed Spearman's correlation analysis and linear regression to test for a consistent global trend with RNA expression. Read counts from regions extending 2 kbp around the TSSs in the H3K4me3 ChIP-seq data were converted to counts per million (CPM) using the edgeR R package and compared with the transcripts per million (TPM) values of the corresponding genes. The results revealed a significant positive correlation, supporting a consistent relationship between H3K4me3 enrichment and gene expression (Author response image 10E and Supplementary Fig. 6D in the revised manuscript).
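
      On the expression side, TPM first divides each gene's read count by its length in kilobases and then rescales so that the values of a sample sum to one million; a least-squares line through paired (CPM, TPM) values then summarizes the global trend. The sketch below assumes that log2-transformed values with a pseudocount of 1 are regressed, a common choice that the response does not specify. The gene names, counts and lengths are hypothetical, and the rank correlation for the same pairs would be computed with any standard Spearman routine.

```python
import math


def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths in bp."""
    rpk = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}  # reads per kilobase
    scale = sum(rpk.values()) / 1e6
    return {g: v / scale for g, v in rpk.items()}


def least_squares(x, y):
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical genes: H3K4me3 CPM in TSS +/- 2 kb windows, RNA-seq counts and lengths
h3k4me3_cpm = {"g1": 2.0, "g2": 15.0, "g3": 60.0}
expression_tpm = tpm({"g1": 5, "g2": 80, "g3": 400}, {"g1": 1500, "g2": 2000, "g3": 2500})
genes = sorted(h3k4me3_cpm)
x = [math.log2(h3k4me3_cpm[g] + 1) for g in genes]
y = [math.log2(expression_tpm[g] + 1) for g in genes]
slope, intercept = least_squares(x, y)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

      A positive slope corresponds to the positive association described above; the regression describes the overall trend, while the rank correlation is less sensitive to a few extreme genes.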

      Author response image 10.

      Quality of the ChIP-seq data and correlation between ChIP-seq and RNA-seq. A. Quality metrics of the ChIP-seq data. B. H3K4me3 peaks at the promoter regions of TNFA and IL6. C. Differences in H3K4me3 enrichment patterns between the control and IS-trained groups. D. Consistency among replicates within a group. E. Correlation between ChIP-seq and RNA-seq in IS-induced trained immunity.

      (10) AhR changes in the cell nucleus should be provided (Fig. 4A).

      We appreciate the constructive feedback from the reviewer. As the reviewer suggested, we investigated the nuclear translocation of AhR 6 days after the induction of IS-mediated trained immunity (Author response image 11). Lysates from IS-trained monocytes were separated into nuclear and cytosolic fractions, and AhR protein was then immunoblotted. The results depicted in Figure X demonstrate that IS-trained monocytes had more nuclear AhR protein than non-trained monocytes. Notably, nuclear translocation of AhR was significantly attenuated in IS-trained monocytes treated with GNF351. These findings imply that AhR activation triggered by IS binding persisted partially for up to 6 days, indicating that AhR had not fully recovered from IS-mediated degradation even on day 6 after induction of IS training. Accordingly, we have replaced Figure 4A in the revised manuscript.

      Author response image 11.

      The activation of AhR, facilitated by IS binding, persists partially for up to 6 days during the induction of trained immunity. Lysates of IS-trained cells treated with or without GNF351 were separated into nuclear and cytosolic fractions, followed by WB analysis for AhR protein (left panel). Band intensity in immunoblots was quantified by densitometry (right panel). β-actin was used as a normalization control. Bar graphs show the mean ± SEM. * = p < 0.05 by two-tailed paired t-test.

      (11) Do other protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, change the mRNA expression of ALOX5, ALOX5AP, and LTB4R1? In the absence of genetic studies, it is difficult to be certain of the ALOX5-related mechanism claimed by the authors.

      We are grateful for the constructive criticism provided by the reviewer. In response, we investigated whether other protein-bound uremic toxins (PBUTs), namely PCS, HA, IAA, and KA, change the mRNA expression of ALOX5, ALOX5AP, and LTB4R1 in trained monocytes. Interestingly, none of the PBUTs except IS noticeably induced the mRNA expression of these genes (Author response image 12). These findings further support the involvement of the AhR-ALOX5 pathway in the induction of trained immunity in monocytes by IS.

      Author response image 12.

      No obvious impact of PBUTs other than IS on the expression of arachidonic acid pathway-related genes 6 days after treatment. Purified monocytes were treated with several PBUTs, including IS, PCS, HA, IAA, and KA, for 24 hrs, followed by a 5-day resting period to induce trained immunity. The mRNA expression of ALOX5, ALOX5AP, and LTB4R1 was quantified using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05 by two-tailed paired t-test.

      (12) Fig.6 is based on the correlated expression of inflammatory genes or AA pathway genes. It does not clarify any mechanisms the authors claimed in the previous figures. 

      We sincerely appreciate the reviewer's constructive criticism and have taken careful note of the points raised. In response, we took two approaches using samples from ESRD patients and IS-trained mice. First, we investigated the correlation between ALOX5 protein expression in monocytes and plasma IS concentration in the ESRD patients presented in Figure 6E of the original manuscript. With the limited number of samples, the correlation between IS concentration and ALOX5 expression was not significant, although a positive trend was observed (Author response image 13A). Second, we examined whether zileuton, an ALOX5 inhibitor, inhibits TNF-α and IL-6 production by LPS-stimulated splenic myeloid cells from IS-trained mice. Zileuton significantly inhibited LPS-induced TNF-α and IL-6 production in splenic myeloid cells from IS-trained mice (Author response image 13B). These data have been added to the revised manuscript as Figure 6N (Lines 350-354, page 16).

      Author response image 13.

      Assessment of the correlation between ALOX5 and the concentration of IS in ESRD patients, and investigation of ALOX5 effects in mouse splenic myeloid cells in IS-trained mice. A. Examination of the correlation between ALOX5 protein expression in monocytes and IS concentration in the plasma of ESRD patients. B. C57BL/6 mice were administered daily injections of 200 mg/kg IS for 5 days, followed by a resting period of another 5 days. Subsequently, IS-trained mice were sacrificed, and spleens were mechanically dissociated. Isolated splenic myeloid cells were subjected to ex vivo treatment with LPS (10 ng/ml), along with zileuton (100 µM). The levels of TNF-α and IL-6 in the supernatants were quantified using ELISA. The graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test between zileuton treatment group and no-treatment group.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor corrections to the figures

      (1) No indicators for the control group in Fig. 1B.

      We thank the reviewer for this comment. As suggested, the control group is now indicated with (-).

      (2) The same paper is listed twice in the references section. (No. 19 and 28)

      We thank the reviewer for this comment. We have deleted reference No. 28.

      Reviewer #2 (Public Review):

      The manuscript entitled "Uremic toxin indoxyl sulfate (IS) induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" presents some interesting findings. Its strengths include the use of H3K4me3 ChIP-seq, an AhR antagonist, RNA-seq of IS-treated cells, an ALOX5 inhibitor, and the MTA inhibitor to determine the roles of IS-AhR in ESRD-related inflammation and trained immunity.

      Thank you very much for your positive feedback.

      Reviewer #2 (Recommendations For The Authors):

      However, the manuscript needs to be improved by fixing the following concerns.

      There are concerns:

      (1) The experiments in Figs. 1G, 1H, and 1I need to include AhR siRNA and a control siRNA to demonstrate that the results of the experiments with uremic toxin-containing serum were related to IS;

      We are grateful to the reviewer for this valuable comment, which is highly relevant to our study. Following the reviewer's suggestion, we attempted additional experiments with AhR siRNA to establish the direct contribution of IS in the serum of end-stage renal disease (ESRD) patients to the induction of trained immunity.

      Regrettably, owing to limitations in the availability of monocytes post-siRNA transfection, we were unable to establish a direct relationship between the observed outcomes in experiments utilizing uremic toxins-containing serum and IS in AhR siRNA knockdown monocytes. However, treatment with GNF351, an AhR antagonist, resulted in the inhibition of TNF-α production in trained monocytes exposed to uremic toxins-containing serum (Author response image 14).

      In our previous studies, we reported that uremic serum-induced TNF-α production in human monocytes depends on the AhR pathway, using GNF351 (Kim HY et al. Indoxyl sulfate (IS)-mediated immune dysfunction provokes endothelial damage in patients with end-stage renal disease (ESRD). Sci Rep. 2017 Jun 8;7(1):3057). We also showed increased AhR pathway activity in monocytes from ESRD patients, as indicated by a significant reduction in AhR protein levels (Kim HY et al. Indoxyl sulfate-induced TNF-α is regulated by crosstalk between the aryl hydrocarbon receptor, NF-κB, and SOCS2 in human macrophages. FASEB J. 2019 Oct;33(10):10844-10858). Notably, other major protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, failed to induce trained immunity in human monocytes (Supplementary Figure 1A in the revised manuscript). Moreover, siRNA knockdown of AhR effectively impeded the induction of IS-mediated trained immunity in human monocytes (Figure 4E in the revised manuscript).

      Taken collectively, our findings suggest a critical role for IS present in the serum of ESRD patients in the induction of trained immunity in human monocytes. 

      Author response image 14.

      Inhibition of uremic serum (US)-induced trained immunity by AhR antagonist, GNF351. Monocytes were pre-treated with or without GNF351 (AhR antagonist; 10 µM) for 1 hour, followed by treatment with pooled normal serum (NS) or uremic serum (US) at a concentration of 30% (v/v) for 24 hours. After a resting period of 5 days, cells were stimulated with LPS for 24 hours. The production of TNF-α and IL-6 in the supernatants was quantified using ELISA. The data presented are derived from three independent experiments utilizing samples from different donors.

      (2) Fig. 3 needs to be moved as Fig. 2

      We thank the reviewer for this constructive suggestion. In response, we have swapped the order of Figures 2 and 3 in the revised manuscript.

      (3, 4) The connection between bioenergetic metabolism pathways and H3K4me3 was missing; The connection between bioenergetic metabolism pathways and ALOX5 was missing;

      We appreciate the reviewer’s constructive criticism and fully understand the reviewer's points. In response, we conducted additional experiments with appropriate inhibitors to clarify the relationships between bioenergetic metabolism and H3K4me3 and between bioenergetic metabolism and ALOX5. First, we assessed H3K4me3 enrichment at the promoters of the TNFA and IL6 loci on day 6 after treatment with 2-DG, a glycolysis inhibitor. Second, we evaluated changes in the activity of S6K, a downstream target of mTORC1, following treatment with zileuton, an ALOX5 inhibitor. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, but not metabolic rewiring, in IS-induced trained immunity (Author response image 15). In contrast, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG affects epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these findings and to illustrate how the diverse mechanisms implicated in IS-induced innate immune memory are related (Fig. 7 in the revised manuscript).

      Author response image 15.

      Epigenetic modification is regulated by the arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an ALOX5 inhibitor, or 2DG, a glycolysis inhibitor, followed by stimulation with IS for 24 hours. After a resting period of 5 days, the enrichment of H3K4me3 on the promoters of the TNFA and IL6 loci was assessed. Normalization was performed using 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hr. Cell lysates were immunoblotted for phosphorylated S6 kinase, with β-actin serving as a normalization control. Band intensities in the immunoblots were quantified using densitometry. D. A schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      (5) It was unclear whether histone acetylations such as H3K27 acetylation and H3K14 acetylation are involved in IS-induced epigenetic reprogramming or whether IS-induced trained immunity is highly histone methylation-specific.

      We appreciate the constructive comment provided by the reviewer. As the reviewer highlighted, alterations in histone marks such as H3K4me3 and H3K27ac have been recognized as an underlying molecular mechanism of trained immunity. Because the number of trained cells available was limited, this study focused primarily on histone methylation. In response to the reviewer's inquiry, we briefly investigated the effect of C646, a histone acetyltransferase inhibitor, on IS-induced trained immunity (Author response image 16). C646 treatment effectively suppressed TNF-α and IL-6 production by IS-trained monocytes in response to LPS stimulation, comparable to the effect of MTA (5′-methylthioadenosine), a non-selective methyltransferase inhibitor. This suggests that histone acetylation also contributes to the epigenetic modifications associated with IS-induced trained immunity. We sincerely appreciate the valuable input from the reviewer.

      Author response image 16.

      The role of histone acetylation in epigenetic modifications in IS-induced trained immunity. Monocytes were pretreated with MTA (5′-methylthioadenosine, a methyltransferase inhibitor) or C646 (a histone acetyltransferase p300 inhibitor), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, trained cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      Reviewer #3 (Public Review):

      The manuscript entitled "Uremic toxin indoxyl sulfate induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" demonstrates that indoxyl sulfate (IS) induces trained immunity in monocytes via epigenetic and metabolic reprogramming, resulting in augmented cytokine production. The authors conducted well-designed experiments to show that the aryl hydrocarbon receptor (AhR) contributes to IS-trained immunity by enhancing the expression of arachidonic acid (AA) metabolism-related genes such as arachidonate 5-lipoxygenase (ALOX5) and ALOX5 activating protein (ALOX5AP). Overall, this is a very interesting study that highlights that IS-mediated trained immunity may have deleterious outcomes through augmented immune responses to a secondary insult in ESRD. The key findings would help to explain accelerated inflammation in CKD or ESRD.

      We greatly appreciate your positive feedback.

      Reviewer #3 (Recommendations for The Authors):

      This reviewer, however, has the following concerns.

      Major comments:

      (1) Figure 1B: IS is known to induce the expression of TNF-a and IL-6. This reviewer wonders why these molecules were not detected in the IS (+) LPS (-) condition.

      We appreciate the constructive comment provided by the reviewer. In our prior investigation, we observed that TNF-α and IL-6 expression was induced 24 hours after IS treatment in human monocytes and macrophages (Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). In accordance with the trained immunity protocol, the medium was replaced 24 hours after IS treatment to remove IS, and again after the 5-day resting period. TNF-α and IL-6 would likely have accumulated and been detectable in the IS (+) LPS (-) culture supernatant had the medium not been changed at these time points. Our primary objective, however, was to determine whether IS induces trained immunity, that is, whether IS pre-exposure increases TNF-α and IL-6 production in response to LPS stimulation as a secondary insult.

      (2) 1' stimulus is IS followed by 2' stimulus LPS/Pam3. It would be interesting to know what the immune profile is when other uremic toxin is used for secondary insult, this would be more relevant in clinical context of ESRD.

      We greatly appreciate the reviewer's insightful comment. To address it, IS-trained macrophages were additionally stimulated with protein-bound uremic toxins (PBUTs) as a secondary challenge. As illustrated in Author response image 17, the uremic toxins examined, namely p-cresyl sulfate (PCS), hippuric acid (HA), indole-3-acetic acid (IAA), and kynurenic acid (KA), failed to elicit production of the proinflammatory cytokines TNF-α and IL-6 by IS-trained monocytes.

      Author response image 17.

      No obvious effect of protein-bound uremic toxins (PBUTs) as secondary insults on the production of proinflammatory cytokines in IS-trained monocytes. IS-trained monocytes were re-stimulated with several PBUTs, namely IS (1 mM), PCS (1 mM), HA (2 mM), IAA (0.5 mM), and KA (0.5 mM), as a secondary challenge for 24 hrs. TNF-α and IL-6 in supernatants were quantified by ELISA. Data from two independent experiments with different donors are shown. ND indicates ‘not detected’.

      (3) The authors need to explain a rationale why RNA and protein data used different markers.

      We appreciate the constructive input provided by the reviewer. Because TNF-α and IL-6 are prototypical cytokines produced by trained human monocytes, we analyzed both their mRNA and protein levels. In human macrophages, the release of active IL-1β requires a second priming signal, such as ATP. We therefore reasoned that measuring IL-1β mRNA levels would suffice to demonstrate the induction of trained immunity in our experimental protocol. Nevertheless, in response to the reviewer's comment, we also measured the protein levels of IL-1β, IL-10, and MCP-1, as illustrated in Author response image 18. These data have been incorporated into the revised manuscript as Supplementary Figure 1E.

      Author response image 18.

      Modulation of cytokine levels in IS-trained macrophages in response to secondary stimulation with LPS. Human monocytes were stimulated with IS for 24 hr, followed by a resting period of 5 days. On day 6, the cells were re-stimulated with LPS for 24 hr. The levels of each cytokine in the supernatants were quantified using ELISA. Bar graphs show the mean ± SEM. ** = p < 0.01 and *** = p < 0.001 by two-tailed paired t-test.

      (4) Epigenetic modification primarily involves histone modification and DNA methylation. The authors presented convincing data on histone modification (Figure 2), but did not provide any insights in the promoter DNA methylation status.

      We express our gratitude to the reviewer for providing valuable comments, which highlight a crucial aspect of our study. Despite the well-established primary role of DNA methylation in epigenetic modifications, recent suggestions propose that histone modifications, particularly H3K4me3 or H3K27ac, play a predominant role in the induction of trained immunity. In this context, our primary inquiry was focused on determining whether IS, as an endogenous insult, induces trained immunity in monocytes, and if so, whether IS-trained immunity is mediated through metabolic and epigenetic modifications - recognized as the major mechanisms underlying the generation of trained immunity. It is imperative to note that our study's primary objective did not encompass the identification of various epigenetic changes. In response to the reviewer's inquiry, we conducted a brief examination of the impact of DNA methylation using ZdCyd (5-aza-2’-deoxycytidine), a DNA methylation inhibitor, on IS-induced trained immunity. Our experimental findings indicate that ZdCyd treatment exerts no discernible effect on the production of TNF-α and IL-6 by IS-trained monocytes upon stimulation with LPS, as illustrated in Author response image 19. However, a recent study has shed light on the role of DNA methylation in BCG vaccine-induced trained immunity in human monocytes (Bannister S et al. Neonatal BCG vaccination is associated with a long-term DNA methylation signature in circulating monocytes. Sci Adv. 2022 Aug 5;8(31):eabn4002). Consequently, further investigations utilizing DNA methylation sequencing are warranted to elucidate whether DNA methylation is implicated in the induction of IS-trained immunity.

      Author response image 19.

      The effect of DNA methylation on IS-induced trained immunity. Monocytes were pretreated with ZdCyd (5-aza-2’-deoxycytidine, a DNA methylation inhibitor), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      (5) Trained immune cells undergo metabolic rewiring involving the intertwined pathways of glucose and cholesterol metabolism. The authors presented nice data on the glucose pathway (Figure 3) but did not show any changes related to cholesterol metabolism.

      We express our gratitude to the reviewer for providing valuable comments, which underscore a noteworthy observation. In the current investigation, our primary emphasis has been on glycolytic reprogramming, recognized as a principal mechanism for inducing trained immunity in monocytes. This focus stems from preliminary experiments wherein Fluvastatin, a cholesterol synthesis inhibitor, demonstrated no discernible impact on TNF-α production by IS-trained monocytes, as illustrated in Author response image 20. Intriguingly, Fluvastatin treatment exhibited a partial inhibitory effect on the production of IL-6 by IS-trained monocytes. Subsequent investigations are imperative to elucidate the role of cholesterol metabolism in the induction of IS-trained immunity.

      Author response image 20.

      The effect of cholesterol metabolism on IS-induced trained immunity. Monocytes were pretreated with Fluvastatin (a cholesterol synthesis inhibitor that inhibits HMG-CoA reductase), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      (6) Trained immunity involves neutrophils in addition to monocyte/macrophages. It is evident from the RNAseq data that neutrophil degranulation (Figure 5B) is the top enriched pathway. This reviewer wonders why the authors did not perform any assays on neutrophils.

      We thank the reviewer for this valuable comment. IS is a major uremic toxin that accumulates in the serum of patients with chronic kidney disease (CKD), and its levels correlate with CKD progression and the onset of CKD-related complications, including cardiovascular diseases (CVD). Our prior investigations have demonstrated that IS promotes the production of TNF-α and IL-1β by human monocytes and macrophages. Additionally, macrophages pre-treated with IS show a significant increase in TNF-α production when exposed to a low dose of lipopolysaccharide (LPS). Given the pivotal roles of proinflammatory macrophages and of TNF-α, a principal cardiotoxic cytokine, in CVD pathogenesis, this study has focused primarily on trained immunity in monocytes/macrophages. Consequently, all experiments were conducted using highly purified monocytes and monocyte-derived macrophages from both healthy controls and end-stage renal disease (ESRD) patients. We have duly noted the reviewer's observation regarding the potential involvement of neutrophils in trained immunity. Future investigations will be needed to explore the possible role of IS-trained neutrophils in the pathogenesis of CVD. Once again, we thank the reviewer for this valuable comment.

      (7) Figure 5C (GSEA plots): This reviewer is not sure if one can present the plots assigned with groups (eg. IS(T) vs Control). More details are required in the Methods related to this.

      We apologize for the ambiguity in our previous description of the methods used for the Gene Set Enrichment Analysis (GSEA) plots. To clarify, we have added further details on this analysis to the Methods section of the revised manuscript.

      (8) In vivo data (Figure 6 I-M): Instead of serum profile and whole set of spleen myeloid cells, it would be interesting to see changes of markers on peritoneal macrophages or bone marrow-derived macrophages since the in vitro findings are on monocyte-derived macrophages.

      We appreciate the comment and the insightful suggestion provided by the reviewer. In response, we conducted additional in vivo experiments to examine TNF-α and IL-6 production by bone marrow-derived macrophages (BMDMs) from IS-trained mice. Upon LPS stimulation, we observed increased TNF-α and IL-6 production by spleen myeloid cells from IS-trained mice. However, no such increase in these cytokines was observed in BMDMs derived from the same mice (Author response image 21, A and B). Consistent with this, we had already observed that ALOX5 expression was not elevated in BM cells from IS-trained mice, as presented in Figure 6L and M of the original manuscript (Author response image 21C).

      Recent studies have indicated that trained immunity can be induced in circulating immune cells, such as monocytes or resident macrophages (peripheral trained immunity), as well as in hematopoietic stem and progenitor cells (HSPCs) within the bone marrow (central trained immunity) (Kaufmann E et al. BCG Educates Hematopoietic Stem Cells to Generate Protective Innate Immunity against Tuberculosis. Cell. 2018 Jan 11;172(1-2):176-190.e19; Riksen NP et al. Trained immunity in atherosclerotic cardiovascular disease. Nat Rev Cardiol. 2023 Dec;20(12):799-811). It is plausible that central trained immunity in BM progenitor cells may not be elicited in our mouse model, which is relatively acute in nature. Further investigations are warranted to explore the role of IS in inducing central trained immunity, utilizing appropriate chronic disease models.

      We have included these additional data as supplementary figures in the revised manuscript (Suppl. Fig. 7, D and E, and lines 355-362 on page 16 of the revised manuscript).

      Author response image 21.

      Absence of trained immunity in bone marrow-derived macrophages (BMDMs) from IS-trained mice. A-B, IS was intraperitoneally injected daily for 5 days, followed by training for another 5 days. Isolated BM progenitor cells and spleen myeloid cells were differentiated or treated with LPS for 24 hr. The supernatants were collected for ELISA. C, The level of ALOX5 protein in BM cells isolated from IS-trained or control mice was analyzed by western blot. The graph illustrates the band intensity quantified by densitometry. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by unpaired t-test.

      (9) Figure 7: There are no data on signaling pathway(s) that links IS and epigenetic changes, the authors therefore may want to add "?" to the proposed mechanism.

      We sincerely thank the reviewer for this valuable feedback. In light of the constructive comments from all three reviewers, we have performed a series of additional experiments. These have allowed us to propose a more complete schematic representation of the proposed mechanism, free of ambiguous elements (Figure 7 in the revised manuscript). We are grateful for your insightful input.

      (10) Demographic data (Table S2): ESRD patients have co-morbidities including diabetes (33% of subjects), CAD (28%). How did the authors factor out the co-morbidities in the overall context of their findings?

      We thank the reviewer for raising this important point. This study used an end-stage renal disease (ESRD) cohort of approximately 60 subjects undergoing maintenance hemodialysis at Severance Hospital in Seoul, Korea. The participants included in the analysis were clinically stable individuals who had provided informed consent and had not been hospitalized for infection or acute events within the preceding three months.

      (11) There are no data on the purity of IS.

      According to the reviewer's suggestion, we have included information regarding the purity (99%) of IS in the Methods section.

      (12) Figure 6L: Immunoblot on b-actin were merged. This reviewer wonders how the authors analyzed these blots. 

      We thank the reviewer for this constructive criticism and understand the concern raised. In response, we re-analyzed ALOX5 expression in Figure 6M using a shorter exposure of the β-actin immunoblot shown in Figure 6L (Author response image 22).

      Author response image 22.

      ALOX5 protein levels were elevated in splenic myeloid cells obtained from IS-trained mice.

      (13) qPCR data throughout the manuscript have control group with no error bar. The authors may not set all controls arbitrarily equal to 1 (Example Figure 1H and I). Data should be normalized in a test standard way. The average of a single datapoint may be scaled to 1, but variation must remain within the control groups.

      We thank the reviewer for this valuable feedback and understand the concern. Our qPCR assays primarily examined the effect of various treatments on the expression of specific target genes (e.g., TNF-α, IL-6, Alox5) in monocytes/macrophages obtained from the same donors. Gene expression levels were normalized to ACTINB expression, and relative fold increases were then determined using the comparative CT (ΔΔCT) method. Because each treated sample was compared with the matched control from the same donor, statistical significance was assessed with a two-tailed paired analysis. In addition, a substantial portion of the qPCR data was validated at the protein level by ELISA and immunoblotting.
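      To illustrate the normalization issue raised in comment (13), the sketch below applies the standard comparative CT formula, fold change = 2^(-ΔΔCT), in two ways. It assumes roughly 100% amplification efficiency for both genes; the helper names (`delta_ct`, `fold_change`, `paired_fold_changes`, `mean_calibrated_fold_changes`) and the CT values are hypothetical and for illustration only, not the authors' analysis code. Calibrating each donor to its own control makes every control exactly 1, which is why no error bar appears; calibrating to the mean control ΔCT instead leaves the controls scattered around 1, as the reviewer requested.

      ```python
      from statistics import mean


      def delta_ct(ct_target, ct_ref):
          """ΔCT: target-gene CT normalized to the reference-gene CT (e.g. ACTINB)."""
          return ct_target - ct_ref


      def fold_change(dct_sample, dct_calibrator):
          """Comparative CT method: 2^-(ΔCT_sample - ΔCT_calibrator).

          Assumes ~100% amplification efficiency (one doubling per cycle).
          """
          return 2 ** -(dct_sample - dct_calibrator)


      def paired_fold_changes(treated, control):
          """Per-donor calibration (paired design).

          `treated` and `control` are lists of (ct_target, ct_ref) tuples in donor
          order. Each control is compared with itself, so it equals 1 by construction.
          """
          results = []
          for (t_target, t_ref), (c_target, c_ref) in zip(treated, control):
              calibrator = delta_ct(c_target, c_ref)
              results.append((fold_change(delta_ct(t_target, t_ref), calibrator),
                              fold_change(calibrator, calibrator)))
          return results


      def mean_calibrated_fold_changes(samples, control):
          """Alternative: calibrate every sample, controls included, to the mean control ΔCT.

          Control values then vary around 1, so their spread remains visible.
          """
          calibrator = mean(delta_ct(t, r) for t, r in control)
          return [fold_change(delta_ct(t, r), calibrator) for t, r in samples]
      ```

      With two hypothetical donors, `paired_fold_changes([(23, 15), (25, 15)], [(25, 15), (26, 15)])` gives fold changes of 4 and 2 for the treated samples and exactly 1 for each control, whereas `mean_calibrated_fold_changes` applied to the same controls gives about 1.41 and 0.71, whose geometric mean is 1.
      
      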

      Minor Comments:

      (1) Molecular weight markers are missing in immunoblots throughout the manuscript.

      In accordance with the reviewer's comment, molecular weight markers have been added to the immunoblots.

      (2) ESRD should be spelled out in the title.

      According to the reviewer's comment, we spelled out ESRD in the title.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major recommendations

      (1) In lines 42-44 (abstract), the authors state that "ASARs function as essential RNA scaffolds for the assembly of hnRNP complexes that help maintain the structural integrity of each mammalian chromosome". Similar conclusions are restated in lines 138-140. Based on the data presented, it is evident that ASARs localization on chromatin is dependent on hnRNPs. However, there is insufficient evidence to conclude that ASARs cause the assembly of hnRNP complexes or that these hnRNP complexes are directly responsible for the regulation of chromosome replication. Please revise your claims.

      We have modified the text as follows: “Our results further demonstrate the role that ASARs play during the temporal order of genome-wide replication, and we propose that ASARs function as essential RNA scaffolds for the assembly of hnRNP complexes that help maintain the structural integrity of each mammalian chromosome.”

      (2) In the analysis in Figure 1C- F, it is unclear why XIST is used as a comparison to ASAR6-141. A more meaningful control would be to show that hnRNPs preferentially bind ASAR6-141 relative to all expressed transcripts. Also, some panels are missing the y-axis label.

      We have genetically validated 8 different ASAR genes for their role in controlling chromosome-wide replication timing. The only other gene known to control chromosome-wide replication timing is XIST, which also encodes a chromosome-associated lncRNA. Our analysis of publicly available eCLIP data (and previous literature on XIST-binding proteins) showed substantial overlap between RBPs that associate with ASARs and XIST. Hence, we anticipated that at least some RBP knockdowns would affect both lncRNAs, despite their contrasting functions. In addition, we routinely use XIST RNA as a positive control in RNA FISH assays, as the XIST RNA FISH protocol represents a robust and well validated chromosomal RNA FISH procedure.

      Y-axis labels have been added to Figure 1.

      (3) In Figure 2K&L, it would be beneficial to quantify and normalize the BrdU incorporation, as ectopic integration of the sense 7kb region appears to result in overall higher BrdU incorporation in all chromosomes, not just chromosome 5.

      There are two main aspects of the BrdU incorporation assay that we use: 1) The BrdU incorporation banding pattern on each chromosome is unique to that chromosome, and the banding pattern also reflects the time during S phase when BrdU was incorporated, i.e. we detect a different banding pattern if BrdU is incorporated in early S phase versus late S phase. 2) The amount of BrdU incorporation can be used to measure the synchrony between chromosome homologs, but only within the same cell. Thus, we calculate a ratio of BrdU incorporation between chromosome homologs in individual cells, and then compare the ratio of incorporation into each chromosome pair across multiple cells (see Figure 2B-E). Overall BrdU incorporation into the chromosomes of different cells is quite variable; however, the banding pattern and the ratio of BrdU incorporation between chromosome homologs in individual cells are comparable, unless we have disrupted or ectopically integrated an ASAR. Given the variability in overall BrdU incorporation between different cells in the population, total incorporation is not a useful readout for measuring synchronous versus asynchronous replication between chromosome homologs.
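      The reason a within-cell homolog ratio tolerates cell-to-cell variation in overall incorporation can be shown with a short sketch. It assumes that BrdU signal has been quantified as one intensity value per homolog; the function names, the A-over-B ordering convention, and the use of a simple mean across cells are hypothetical choices for illustration and not the authors' quantification pipeline. A cell that incorporates three times more BrdU overall yields the same ratio, so only asymmetry between homologs changes the result.

      ```python
      from statistics import mean


      def homolog_ratio(signal_a, signal_b):
          """Within-cell ratio of BrdU signal on homolog A relative to homolog B.

          Any per-cell scaling of overall incorporation cancels in this ratio.
          """
          if signal_b <= 0:
              raise ValueError("homolog B signal must be positive")
          return signal_a / signal_b


      def summarize_pair(cells):
          """Mean homolog ratio for one chromosome pair across many cells.

          `cells` is a list of (signal_a, signal_b) tuples, one tuple per cell.
          """
          return mean(homolog_ratio(a, b) for a, b in cells)
      ```

      For example, `summarize_pair([(120.0, 100.0), (360.0, 300.0)])` gives 1.2 even though the second hypothetical cell incorporated three times more BrdU overall.
      
      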

      (4) hnRNP protein can regulate multiple aspects of RNA processing other than chromatin retention. Hence, it would be beneficial to rule out an alternative hypothesis as to what the hnRNP knockdowns do to ASAR6-131? For example, assessing changes in RNA levels or splicing upon knockdown of hnRNPs using qPCR?

      We agree that direct roles for any of the hnRNP/RBPs that are critical for ASAR RNA localization and replication timing have not been established. However, our findings, combined with the observation that cells depleted of HNRNPU show reduced origin licensing in G1 and reduced origin activation frequency during S phase (PMID: 34888666), support a role for HNRNPU, either direct or indirect, in DNA replication. Furthermore, we also found that depletion of the DNA replication fork remodeler HLTF or the deubiquitinase UCHL5 also results in mis-localization of ASAR RNAs and in asynchronous replication of every autosome pair, indicating that ASAR RNA mis-localization and asynchronous replication are not simply a phenotype associated with hnRNP depletion. A full mechanistic understanding of the role that ASAR RNAs play in combination with this relatively large and diverse set of hnRNP/RBPs will require a better understanding of the direct roles that each protein, and any higher-order complexes containing these proteins, play in regulating DNA synthesis, splicing, transcription, chromatin structure and/or ASAR RNA localization.

      (5) Both the disruption and ectopic expression of the 7kb region result in delayed chromosome replication. Would one not expect there to be opposing effects on replication timing? Please discuss.

      One puzzling set of observations is that loss of function mutations and gain of function mutations of ASAR genes result in a similar delayed replication timing and delayed mitotic condensation phenotype. We have detected delayed replication timing in human cells following genetic knockouts (loss of function) of eight different ASAR genes located on 5 different autosomes. We have also detected delayed replication timing on mouse chromosomes expressing transgenes (gain of function) from three different ASAR genes (ASAR6, ASAR6-141, and ASAR15). The ASAR transgenes ranged in size from an ~180kb BAC, to an ~3kb PCR product. One possible explanation for these observations is that ectopic integration of ASAR transgenes function in a dominant negative manner by interfering with the endogenous “ASARs” on the integrated chromosomes. Consistent with this possibility is that we recently identified ASAR candidate genes on every human autosome (PMC9588035). Our favored model is that expression of ASAR transgenes integrated into mouse chromosomes disrupts the function of endogenous ASARs by "out-competing" them for shared RBPs. We also point out that a similar ectopic integration assay, using Xist transgenes, has been an informative assay for characterization of Xist functions, including the ability to delay replication timing and induce gene silencing on autosomes (reviewed in PMID:19898525). One intriguing observation (yet largely ignored by the X inactivation field) is that deletion of the Xist gene on either the active or inactive X chromosomes in somatic cells results in delayed replication timing of the X chromosomes (PMC1667074; PMC1456779). Thus, both loss of function and gain of function mutations of Xist result in a similar delayed replication timing phenotype. Given these parallels between Xist and ASAR gene mutation phenotypes we were curious to test the consequences of ASAR gain of function on the inactive X chromosome. 
      In this manuscript, we integrated the ~7kb ASAR6-141 transgene into the inactive X chromosome and detected a delayed replication timing phenotype on the integrated X chromosome. We also detected an association between Xist and ASAR RNAs using RNA FISH in interphase cells (Figure 4A and 4B), which supports the observation that ASAR RNAs and XIST RNA are bound by a partially overlapping set of hnRNP/RBPs (Figure 1D-F), and is consistent with the model that ASAR transgenes disrupt function by competing for shared RBPs. Dissecting the roles of the hnRNP/RBPs that interact with both ASAR and XIST RNAs will undoubtedly give important insights into both XIST and ASAR function, and into how these poorly understood chromosomal phenotypes are generated.

      Minor recommendations

      (1) In Figure 1G, it would be informative to show where the LINE-1 element within ASAR6-141 is located to get a sense of what hnRNP proteins bind to it.

      There are numerous LINE-1 elements within the ASAR6-141 gene. The ~7kb RBPD does not contain LINE-1 sequences. Therefore, we did not detect significant hnRNP/RBP eCLIP peaks within LINE-1 sequences.

      (2) The rationale for ectopic integration of the 7kb region into the inactive X-chromosome is unclear. Is there something unique about the replication of the inactive X or were you interested in seeing whether the 7kb region could escape X-inactivation?

      Given the parallels between Xist and ASAR gene mutation phenotypes, i.e. loss of function and gain of function both result in delayed replication timing (see above), we were curious to test the consequences of ASAR gene gain of function on the inactive X chromosome. One possibility was reversal of X inactivation and a shift to earlier replication timing. However, we detected delayed replication timing on the inactive X, and an enhanced XIST RNA FISH signal that overlapped with the ASAR RNA. This also addresses Reviewer 2's question: "Is it possible that integration might alter Xist expression confounding this interpretation?" The enhanced XIST RNA FISH signal suggests that the delayed replication of the inactive X is not due to reduced expression of XIST RNA.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this potentially useful study, the authors attempt to use comparative meta-analysis to advance our understanding of life history evolution. Unfortunately, both the meta-analysis and the theoretical model are inadequate, and proper statistical and mechanistic descriptions of the simulations are lacking. Specifically, the interpretation overlooks the effect of well-characterised complexities in the relationship between clutch size and fitness in birds.

      Public Reviews:

      We would like to thank the reviewers for their helpful comments, which have been considered carefully and have been valuable in progressing our manuscript. The following bullet points summarise the key points and our responses; our detailed responses to specific comments can be found below:

      - Two reviewers commented that our data was not made available. Our data was provided upon submission and during the review process, but was not made accessible to the reviewers. Our data and code are available at https://doi.org/10.5061/dryad.q83bk3jnk.

      - The reviewers have highlighted that some of our methodology was unclear and we have added all the requested detail to ensure our methods can be easily understood.

      - The reviewers highlight the importance of our conclusions, but also suggest some interpretations might be missing and/or are incomplete. To make clear how we objectively interpreted our data and the wider consequences for life-history theory we provide a decision tree (Figure 5). This figure makes clear where we think the boundaries are in our interpretation and how multiple lines of evidence converge to the same conclusions.

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We further show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased, with no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to be larger than their original brood (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate that the overall survival effect of a change in reproductive effort is close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study). Please also note that the Santos & Nakagawa study was conducted over 10 years ago, so we were able to add data that have become available since then (L364-365). Furthermore, meta-analyses are an evolving practice, and we also corrected and improved on the overall analysis approach (e.g. L358-359 and L393-397, and see detailed SI).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be shortcomings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival – a key theme in life history and the biology of ageing – and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than considering that the original hypothesis could be false or inflated in importance. We do not consider questioning the premise of the data over questioning a favoured hypothesis to necessarily be the best scientific approach here. In many places in our manuscript, we question and address, at length, the underlying data and their interpretation (L116-117, L165-167, 202-204 and L277-282). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival, while being aware that other trade-offs could counter-balance or explain our findings (discussed on L208-210 & L301-316). Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, of potential trade-offs, there are endless possibilities of where a trade-off might operate between traits. We purposefully focus on the one well-studied and most commonly invoked trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account a trade-off on the fitness level can occur and cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have shown just that (L314-316).

      So whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a general trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      What we do appreciate from the reviewer's comment is that the interpretation of our findings is complex. Even though our in-text explanation includes the caveats the reviewer refers to, which are discussed at length, their inter-relationships are hard to appreciate from a text format. To improve this presentation and for ease of the reader, we have added a decision tree (Figure 5), which represents the logical flow from the hypothesis being tested through to the overall conclusion that can be drawn from our results. We believe this clarifies what conclusions can be drawn from our results. We emphasise again that the theory that trade-offs between reproductive effort and parental survival are the major driver of variation in offspring production was not supported, even though it is the explanation that practitioners in the field would be most likely to invoke, and our result is important for this reason.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because they are typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations (L107-123, Figure 1, Table 1). Note, however, that much theory is built on the immediate costs of reproduction and, as such, these costs are likely overinterpreted, meaning that our overall interpretation still holds, i.e. “parental survival trade-off is not the major determinative trade-off in life history within-species” (Figure 5).

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see methods lines 466-468, where we explicitly state that this is lifetime enlargement. Of course, such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Such a conclusion can also not be drawn from the study on jackdaws by Boonekamp et al (2014), as the treatments were life-long and, therefore, cannot separate annual from accrued (multiplicative) costs that are more than the sum of the annual costs incurred. Note that we have now included specific discussion of this study in response to the reviewer (L265-269).

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We have added additional detail to the methodology section (see "Study sourcing & inclusion criteria" and "Extracting effect sizes") in our revised manuscript. Note that our data and code were not shared with the reviewers despite us supplying them upon submission and again during the review process; they would have explained much of the detail required.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size as the predictor, taken as the absolute number of offspring in the nest (i.e., for a bird that laid a clutch of 5 but was manipulated by -1 egg, we used a clutch size value of 4). The clutch size was also standardised and, separately, expressed as a proportion of the species' mean.
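      As an illustration only, the per-study effect-size extraction described above can be sketched as below. This is not the authors' code (theirs is deposited at the Dryad DOI given earlier); it is a minimal sketch assuming individual-level 0/1 survival records, using a hand-rolled Newton-Raphson logistic fit with NumPy so that no modelling library is needed. The helper `logistic_fit`, the ten-bird dataset and the species mean of 5.5 are all hypothetical.

```python
import numpy as np


def logistic_fit(x, survived, n_iter=50):
    """Fit logit P(survived) = b0 + b1 * x by Newton-Raphson; return (b0, b1).

    Hypothetical helper for illustration. Assumes 0/1 survival per parent and
    data that are not perfectly separable (otherwise no finite MLE exists).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(survived, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # predicted survival probability
        w = p * (1.0 - p)                          # binomial variance weights
        grad = X.T @ (y - p)                       # score vector
        hess = X.T @ (X * w[:, None])              # Fisher information
        beta = beta + np.linalg.solve(hess, grad)  # Newton step
    return beta


# Hypothetical study: clutch laid, manipulation (eggs added/removed), survival to next season.
laid = np.array([4, 4, 5, 5, 5, 6, 6, 6, 7, 7])
manipulation = np.array([+1, -1, 0, +1, -1, 0, +1, -1, 0, +1])
survived = np.array([1, 1, 0, 1, 1, 0, 0, 1, 0, 0])

# Predictor is the brood actually raised, e.g. laid 5 and manipulated by -1 gives 4.
raised = laid + manipulation

species_mean = 5.5  # hypothetical species' mean clutch size

_, b_raw = logistic_fit(raised, survived)                                    # per egg
_, b_std = logistic_fit((raised - raised.mean()) / raised.std(), survived)  # per SD
_, b_prop = logistic_fit(raised / species_mean, survived)                   # per species mean

print(f"per egg: {b_raw:.3f}, per SD: {b_std:.3f}, per species mean: {b_prop:.3f}")
```

      Rescaling the predictor by a constant only rescales the slope, so within one study the three versions carry the same information; the choice of scaling matters when slopes are pooled across species whose clutch sizes differ greatly.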

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa. Arguably, the approach by Santos & Nakagawa is worse, as they dichotomise effort as increased or decreased, factorise their output and thereby inflate their number of outcomes, of which only 1 cell of 4 categories is significant (for males and females, increased and decreased brood size). The proof is in the pudding as well, as our results clearly demonstrate that the magnitude of the manipulation is a key factor driving the results, i.e. one offspring for a seabird is a larger proportion of care (and fitness) than one offspring for a passerine. Such insights were not achieved with Santos & Nakagawa's method, which, again, did not allow a direct quantitative comparison between quality (correlational) and experimental (brood size manipulation, i.e. "trade-off") effects, which forms a central part of our argumentation (Figure 5).

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than fitting nonlinear models would. For many datasets, the range of added chicks required to estimate a non-linear relationship was not available. The question also remains of what the shape of such a non-linear relationship should be, and this is hard to determine a priori. There is also a real risk when fitting non-linear terms that they are spurious and overinterpreted, as they often present a better fit (accounting for one additional degree of freedom is not sufficient, especially when slopes vary). We have added this detail to our discussion.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately tailored the selection of studies to match the manipulation studies (L367-369). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the equally important observational component of our analysis and thereby fails to acknowledge one of the key questions being addressed in this study. Note that in our revised version we have edited the phylogenetic tree to indicate for which species we have both types of information, which highlights our approach to selecting observational data (Figure 3).

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists of a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows the authors to reach the conclusions they provide (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 336–339, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds that have larger clutch sizes also lived longer, and we suggest that this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size. We have added Figure 5 to our manuscript to help the reader better understand what questions we can answer with our study and what conclusions we can draw from our results.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, and have explained our methods in terms that are accessible to a wider audience. Note, however, that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We have added the model formula to the model output tables.

      For the simulation, we simply simulated the resulting effects. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand why the reviewer feels the simulations were not explained thoroughly. We have revised our methods section and added details which we believe make our methodology more clear without needing to consult the supplemental material. However, we have also added the equations used in the process of calculating our simulated data to the Supplementary Information for readers who wish to have this information in equation form.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individual measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. Throughout the manuscript we have refined our terminology and indicated where we are referring to the individual level or the population level. The inclusion of our new Figure 5 (decision tree) should also help in this context, as it makes clear the level on which we base our interpretation and conclusions.

      This problem is emphasised by the following sentence in the discussion: "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to understand intuitively what the "effect" is for the unmanipulated studies (the sensitivity of survival to clutch size at the population level), this should be stated and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated studies (the effect of the manipulation? Or the survival rate whatever the manipulation, in which case how could it measure a trade-off? At the population level? At the individual level?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      Thank you for identifying this ambiguously written sentence; our apologies. We have now rewritten it and included additional explanation. L282-290: 'The effect on parental annual survival of having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation, and quantitatively similar. Parents with naturally larger clutches are thus expected to live longer and this counterbalances the "cost of reproduction" when their brood size is experimentally manipulated. It is, therefore, possible that quality effects mask trade-offs. Furthermore, it could be possible that individuals that lay larger clutches have smaller costs of reproduction, i.e. would respond less in terms of annual survival to a brood size manipulation, but with our current dataset we cannot address this hypothesis (Figure 5).'

      We would also like to thank the reviewer for bringing to our attention the lack of clarity about the details of our methodology. We have added details to our methodology (see "Extracting effect sizes" section) to address this (see highlighted sections). For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., the number of individuals that survived and the number that died after each breeding season) and clutch size as the predictor, taken as the absolute number of offspring in the nest (i.e., for a bird that laid a clutch of 5 but was manipulated by -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not supported by the outputs of the figures and tables. Let's use a model similar to that of van Noordwijk and de Jong (1986): imagine that the mechanism at the population level is

      $$a\,c_{i,q} + b\,s_{i,q} = E_q$$

      where $c_{i,q}$ and $s_{i,q}$ are, respectively, the clutch size and survival rate of individual $i$, which is of quality $q$, and $E_q$ is the level of "energy" that an individual of quality $q$ has available during the given time-step. Here $a$ and $b$ are constants turning clutch size and survival rate into the energy cost of reproduction and the energy cost of survival, and both are quite "high", so that an extra egg at the current time-step ($c_{i,q}$ increased by 1) decreases $s_{i,q}$ markedly ($E_q$ is independent of the number of eggs produced); that is, we have strong individual costs of reproduction. Imagine now that the variance of $c_{i,q}$ among individuals of the same quality group (when the population is not manipulated) is very small (and therefore the variance of $s_{i,q}$ is also very small), and that the expectations of both are proportional to $E_q$. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. Therefore, the larger the clutch size $c_{i,q}$, the higher $E_q$, and the higher the survival $s_{i,q}$.

      In the manipulated populations, however, because of the large $a$ and $b$, an artificial increase in clutch size for a given $E_q$ will lead to a lower survival $s_{i,q}$, and the "effect size" at the population level may vary according to $a$, $b$ and the variances mentioned above. In other words, when there is variance in quality, the data may hide costs of reproduction that are actually strong (so strong, in fact, that they are deterministic and the probability of surviving is a direct function of the number of eggs produced).
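      The reviewer's scenario can be made concrete with a minimal numerical sketch in Python with NumPy. All parameter values are hypothetical choices made only so that survival stays between 0 and 1: $a = 1$, $b = 10$, $E_q$ drawn uniformly between 6 and 10, and clutch size equal to $0.5\,E_q$ plus small within-quality noise. Survival is treated as a deterministic rate given by $a\,c_{i,q} + b\,s_{i,q} = E_q$, as in the description above. The sketch shows that the across-individual slope of survival on clutch size can be positive while the effect of adding one egg at fixed $E_q$ is exactly $-a/b$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters, chosen only for illustration.
a, b = 1.0, 10.0   # energy cost per egg and per unit of survival
n = 1000

E = rng.uniform(6.0, 10.0, n)               # E_q: energy budget, varies with quality
clutch = 0.5 * E + rng.normal(0.0, 0.1, n)  # c_{i,q}: tracks E_q, small within-quality noise
survival = (E - a * clutch) / b             # s_{i,q} solved from a*c + b*s = E_q

# Unmanipulated population: slope of survival on clutch size across individuals.
slope_observational = np.polyfit(clutch, survival, 1)[0]

# Experimental contrast: one extra egg for every individual, holding E_q fixed.
survival_enlarged = (E - a * (clutch + 1.0)) / b
effect_per_egg = np.mean(survival_enlarged - survival)

print(f"observational slope: {slope_observational:+.3f} per egg")
print(f"manipulation effect: {effect_per_egg:+.3f} per egg (equals -a/b)")
```

      In this toy setting the sign of the observational slope depends on how much of the variance in clutch size comes from differences in $E_q$ rather than from within-quality noise; raising the noise standard deviation above roughly 0.58 makes the slope negative.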

      We would like to thank the reviewer for these comments. We have added detail to our methodology section so our models and rationale are more clear. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: (1) overall quality effects connecting reproduction and parental survival are present, (2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is correct, however, that we do suggest and repeat suggestions by others that quality can also mask the trade-off in some individuals or circumstances (L74-76, L95-98 & L286-289), but we do not quantify this, as it is dependent on the unknown relationship between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting and there are some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation (now included L287-290). Such information is, however, not available for all studies and, although we explored the possibility of analysing this, currently this is not possible with adequate confidence and there is the possible complexity of non-linear effects. We have added this rationale in our revision (L259-265).

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative cost of reproduction (and other individual components of the trade-off) and towards a population-level negative relationship between survival and reproduction, we have to consider the intra-population slow-fast continuum (some types of individuals survive more and reproduce less (are slower) than others (which are faster)). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time period because this time period corresponds to a larger fraction of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It therefore seems important to me to control for generation time and, in general, to account for the time step used for each population studied when analysing costs of reproduction. The data used in this manuscript are not just clutch size and survival rates, but clutch size per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There is a longstanding unexplained difference in temperate (seasonal) and tropical reproductive strategies. Most of our data come from seasonal breeders, however. Although there is some variation in second brooding and such, these species mostly only produce one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show that quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within species. We do agree that there is a lot more work that can be done in this area. We hope we are contributing to the field by questioning this central trade-off. We have incorporated some of the reviewer's suggestions in the revision (L309-315). We have added Figure 5 to show, as clearly as possible and in an easily accessible format, where we are able to reach solid conclusions and the evidence on which these are based.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality, for instance) to the general problem of detecting effects of individual differences on survival (see, e.g., Fay et al., 2021). This requires an understanding of the very particular statistical behaviour of survival, which is associated with an event that by definition occurs only once per life-history trajectory, in contrast to many other traits, even demographic ones, where the corresponding event (the production of eggs, for example) can be measured several times for a given individual during its life.

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we have added to our discussion of how our results play into the importance of accounting for among-individual heterogeneity (L252-256).

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations', The American Naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', The American Naturalist, 128(1), pp. 137-142. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction-survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for by weak selection on clutch size. The results of their analyses support this explanation: they found little evidence of a reproduction-survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying the concept of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited by the results of this necessarily quite simple meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings that we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper consideration and we have added detail accordingly to our revised discussion.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there is not one. First, and importantly, we do find a trade-off but show this is only incurred when individuals produce a clutch beyond their optimal level. Second, we also state on lines 322-326 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true: we did find a trade-off. There is a significant effect of clutch size, when manipulated, on annual parental survival. Because this effect on annual survival is expressed on a per-egg basis, our analysis uniquely allows a quantitative fitness estimate to be derived from it. This allowed us to ask whether this quantitative effect size alone can explain why reproduction is constrained, and we evaluate this using simulations. From these simulations we find that this effect size is too small to explain the constraint, so something else must be going on, and we spend a considerable amount of text discussing the possible explanations (L202-215). Note that the most parsimonious conclusion here may be that costs of reproduction are absent, or simply small, so we also give that explanation some thought (L221-224 and L315-331).

      We are disappointed by the suggestion that we have dismissed complicating factors that could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. We have added further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory, including the addition of a decision tree (Figure 5).

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L316-320). We would also like to highlight that, in our simulations, the change in clutch size (and subsequent survival cost) was assumed to apply for the lifetime of the individual, for this very reason.

• The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods, but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks = more cost doesn't hold for all species. The second, more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L317-318 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this might be less for some species. This is, however, difficult to incorporate into a sensible model as the impacts will vary among species and some species do also exhibit catch-up growth. So, without a priori knowledge on this, we kept our model simple to test whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort, and we conclude that it does not.

We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort probably does not drive within-species variation in clutch size. We say this explicitly throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L317-320). The point of our study is not whether trade-offs exist or not; it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival, the most fundamental trade-off in our field but one for which there is a lack of conclusive evidence within species. We believe the addition of Figure 5 to our revised manuscript also makes this more evident.

• For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size: each chick requires X feeding visits to reach the required fledging weight. But this is not the case for species which produce precocial chicks that are relatively independent and able to feed themselves straight after hatching. In these species the relationship between care and survival is unlikely to be detectable by looking at the effect of clutch size, but again, that doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds to be an underestimate. Again, we would like to draw the reviewer’s attention to the fact we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The costs of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies, however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

We thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is "Why don't all birds produce a clutch size at the population optimum?", and it is central to the decades-long field of life-history theory. Why is variation maintained? As the reviewer outlines, there is extensive variability, with some birds laying half of what other birds lay.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

(1) Title: while the costs of reproduction are possibly important in shaping optimal clutch size, it is not clear what you can say about it, given that you do not consider clutch/brood size effects on the fitness prospects of the offspring.

      We have expanded on our discussion of how some costs may be absorbed by the offspring themselves. However, a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest. We have focussed on the relationship between reproductive effort and survival because it is given the most weight in the field in terms of driving intra-specific variation in clutch size. We have altered our title to show we focus on the survival costs specifically: “The optimal clutch size revisited: separating individual quality from the parental survival costs of reproduction”.

      (2) L.11-12: I agree that this is true for birds, but this is phrased more generally here. Are you sure that that is justified?

The trade-off between survival and reproductive effort has largely been tested experimentally through brood manipulations in birds, as this provides a good system in which to test the costs and benefits of increasing parental effort. The work in this area has informed theory well beyond passerine birds, the most commonly manipulated group, including theories applied across taxa. We are unaware of any study or studies that provide evidence that the reproduction/survival trade-off is generalisable across multiple species in any taxon. As such, we do believe this sentence is justified. For example, the lack of a consistent negative genetic correlation in populations of fruit flies has also been hailed as a lack-of-cost paradigm. Furthermore, some mutants that live longer do so without a cost to reproduction.

(3) L.13-14: Not sure what you mean by this sentence; too much information is lacking.

      We have added some detail to this sentence.

      (4) L.14: it is slightly awkward to say 'parental investment and survival' because it is the survival effect that is usually referred to as the 'investment'. Perhaps what you want to say is 'parental effort and survival'?

      We have replaced “parental investment” with “reproductive effort”

      (5) L.15: you can omit 'caused'. Compared to control treatment or to reduced broods? Why not mention effects or lack thereof of brood reduction? And it would be good to also mention here whether effects were similar in the sexes.

      Please see our methodology where we state that we use clutch size as a continuous variable (we do not compare to control or reduced but include the absolute value of offspring in a logistic regression). The effects of a brood reduction are drawn from the same regression and so are opposite. Though we appreciate the detail here is lacking to fully comprehend our study, we would like to highlight this is the abstract and details are provided in the main text.

      (6) L. 15: I am not sure why you write 'however', as the finding that experimental and natural variation have opposite effects is in complete agreement with what is generally reported in the literature and will therefore surprise no one that is aware of the literature.

      We use “however” to highlight the change in direction of the effect size from the results in the previous sentence. We also believe that ours ise the first study that provides a quantitative estimate of this effect and that previous work is largely theoretical. The reviewer states that this is what is generally reported but it is not reported in all cases, as some relationships between reproductive effort and survival are negative (for the quality measurement, in correlational space, see Figure 1).

      (7) L.16: saying 'opposite to the effect of phenotypic quality' seems difficult to justify, as clutch size cannot be equated with phenotypic quality. Perhaps simply say 'natural variation in clutch size'? If that is what you are referring to.

Please note we are referring to effect sizes here, that is, the survival effect of a change in clutch size. By phenotypic quality we are referring to the fact that we find higher parental survival when natural clutch sizes are higher. We do not use quality to mean only having a higher clutch size. This is explicitly stated in the sentence you refer to. We have changed "effect" to "effect size" to highlight this further.

      (8) L.18: why do you refer to 'parental care' here? Brood size is not equivalent to parental care.

      Brood size manipulations are used to manipulate parental care. The effect on parental survival is expected to be incurred because of the increase in parental care. We have changed “parental care” to “reproductive effort” to reduce the number of terms we use in our manuscript.

      (9) L.18-19: suggest to tone down this claim, as this is no more than a meta-analytic confirmation of a view that is (in my view) generally accepted in the field. That does not mean it is not useful, just that it does not constitute any new insight.

We are unaware of any other study which provides generalisable across-species evidence for opposite effects of quality and costs of reproduction. The work in this area is also largely theoretical and is yet to be supported experimentally, especially in a quantitative fashion. It is surprising to us that the reviewer considers there to be general acceptance in the field, rather than acceptance being informed by rigorous testing of hypotheses, made possible by meta-analysis, the current gold standard in our field.

      (10) L.21: what does 'parental effort' mean here? You seem to use brood size, parental care, parental effort, and parental investment interchangeably but these are different concepts. Daan et al (1990, Behaviour), which you already cite, provide a useful graph separating these concepts. Please adjust this throughout the manuscript, i.e. replace 'reproductive effort' with wording that reflect the actual variable you use.

      We have not used the phrase “parental effort” in this sentence. We agree these are different concepts but in this context are intertwined. For example, brood size is used to manipulate parental care as a result of increased parental effort. We do agree the manuscript would benefit from keeping terminology consistent throughout the manuscript and have adjusted this throughout.

      (11) L.23: perhaps add 'in birds' somewhere in this sentence? Some reference to the assumptions underlying this inference would also be useful. Two major assumptions being that birds adjusted their effort to the manipulation as they would have done had they opted for a larger brood size themselves, and that the costs of laying and incubating extra eggs can be ignored. And then there is the effect that laying extra eggs will usually delay the hatch date, which in many species reduces reproductive success.

      Though our study does exclusively use birds, birds have been used to test the survival/reproduction trade-off because they present a convenient system in which to experimentally test this. The conclusions from these studies have a broader application than in birds alone. We believe that although these details are important, they are not appropriate in the abstract of our paper.

      (12) L.26: how is this an explanation? It just repeats the finding.

      We intend to refer to all interpretations from all results presented in our manuscript. We have made this more clear by adjusting our writing.

      (13) L.27: I do not see this point. And 'reproductive output' is yet another concept, that can be linked to the other concepts in the abstract in different ways, making it rather opaque.

      We have changed “reproductive output” to “reproductive effort”.

      (14) L.33: here you are jumping from 'resources' to 'energetically' - it is not clear that energy is the only or main limiting resource, so why narrow this down to energy?

We do not say energy is the only or main limiting resource. We simply highlight that reproduction is energetically demanding and so, intuitively, a trade-off with a highly energetically demanding process would be the focal place to observe a trade-off. We have, though, replaced "energetically" with "resource".

      (15) L.35-36: this is new to me - I am not aware of any such claims, and effects on the residual reproductive value could also arise through effects on future reproduction. The authors you cite did not work on birds, or (in their own study systems) presented results that as far as I remember warrant such a general statement.

      The trade-off between reproduction and survival is seminal to the disposable soma theory, proposed by Kirkwood. Though Kirkwood’s work was largely not focussed on birds, it had fundamental implications for the field of evolutionary ecology because of the generalisable nature of his proposed framework. In particular, it has had wide-reaching influence on how the biology of aging is interpreted. The readership of the journal here is broad, and our results have implications for that field too. The work of Kirkwood (many of the papers on this topic have over 2000 citations each) has been perhaps overly influential in many areas, so a link to how that work should be interpreted is highly relevant. If the reviewer is interested in this topic the following papers by one of the co-authors and others could be of interest, some of which we could not cite in the main manuscript due to space considerations:

      https://www.science.org/doi/pdf/10.1126/sciadv.aay3047

      https://agingcelljournal.org/Archive/Volume3/stochasticity_explains_non_genetic_inheritance_of_lifespan/

      https://pubmed.ncbi.nlm.nih.gov/21558242/

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.13444

      https://www.nature.com/articles/362305a0

      https://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347(12)00147-4

      https://www.cell.com/cell/pdf/S0092-8674(15)01488-9.pdf

      https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-018-0562-z

      (16) L.42: this could be preceded with mentioning the limitations of observational data.

      We have added detail as to why brood manipulations are a good test for trade-offs and so this is now inherently implied.

      (17) L.42-43: why?

      We have added detail to this sentence.

      (18) L.45: do any of the references cited here really support this statement? I am certain that several do not - in these this statement is an assumption rather than something that is demonstrated. It may be useful to look at Kate Lessell's review on this that appeared in Etologia, I think in the 1990's. Mind however that 'reproductive effort' is operationally poorly defined for reproducing birds - provisioning rate is not necessarily a good measure of effort in so far as there are fitness costs.

      We have updated the references to support the sentence.

      (19) L.47: Given that you make this statement with respect to brood size manipulations in birds, it seems to me that the paper by Santos & Nakagawa is the only paper you should cite here. Given that you go on to analyze the same data it deserves to be discussed in more detail, for example to clarify what you aim to add to their analysis. What warrants repeating their analysis?

      Please first note that our dataset includes Santos & Nakagawa and additional studies, so it is not accurate to say we analyse the same data. Furthermore, we believe our study has implications beyond birds alone and so believe it is appropriate to cite the papers that do support our statement. We have added details to the methods to explicitly state what data is gathered from Santos & Nakagawa (it is only used to find the appropriate literature and data was re-extracted and re-analysed in a more appropriate way) and, separately, how we gathered the observational studies (see L352-381).

      (20) L.48: There are more possible explanations to this, which deserve to be discussed. For example, brood size manipulations may not have been that effective in manipulating reproductive effort - for example, effects on energy expenditure tend to be not terribly convincing. Secondly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Thirdly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      Please see our response to this comment in the public reviews.

      Out of interest and because the reviewer mentioned “energy expenditure” specifically: There are studies that show convincing effects of brood size manipulation on parental energy expenditure. We do agree that there are also studies that show ceilings in expenditure. We therefore disagree that they “tend to be not terribly convincing”. Just a few examples:

      https://academic.oup.com/beheco/article/10/5/598/222025 (Figure 2)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.12321 (Figure 1)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2656.2000.00395.x (but ceiling at enlarged brood).

      (21) L.48, "or, alternatively, that individuals may differ in quality": how do you see that happening when brood size is manipulated, and hence 'quality' of different experimental categories can be assumed to be approximately equal? This point does apply to observational studies, so I assume that that is what you had in mind, but that distinction should be clear (also on line 54).

      We have made it more clear that we determine if there are quality effects separate to the costs of reproduction found using brood manipulation studies.

      (22) L.50: Drent & Daan, in their seminal paper on "The prudent parent" (1980, Ardea) were among the earliest to make this point and deserve to be cited here.

We have added this citation.

      (23) L.51, "relative importance": relative to what? Please be more specific.

      We have adjusted this sentence.

      (24) L.54: Vedder & Bouwhuis (2018, Oikos) go some way towards this point and should be explicitly mentioned with reference to the role of 'quality' effects on the association between reproductive output and survival.

      We have added this reference.

      (25) L.55: can you be more specific on what you want to do exactly? What you write here could be interpreted differently.

      We have added an explicit aim after this sentence to be more clear.

      (26) L.57: Here also a more specific wording would be useful. What does it mean exactly when you say you will distinguish between 'quality' and 'costs'?

      We have added detail to this sentence.

      (27) L.62: it should be clearer from the introduction that this is already well known, which will indirectly emphasize what you are adding to what we know already.

      We would argue this is not well known and has only been theorised but not shown empirically, as we do here.

      (28) L.62: you equate clutch size with 'quality' here - that needs to be spelled out.

      We refer to quality as the positive effect size of survival for a given clutch size, not clutch size alone. We appreciate this is not clear in this sentence and have reworded.

      (29) L.64: this looks like a serious misunderstanding to me, but in any case, these inferences should perhaps be left to the discussion (this also applies to later parts of this paragraph), when you have hopefully convinced readers of the claims you make on lines 62-63.

      We are unsure of what the reviewer is referring to as a misunderstanding. We have chosen this format for the introduction to highlight our results. If this is a problem for the editors we will change as required.

      (30) L.66: quantitative comparison of what?

Comparison of species. We have changed the wording of this sentence.

      (31) L.67-69: this should be in the methods.

      We have used a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (32) L.74-88: suggest to (re)move this entire paragraph, presenting inferences in such an uncritical manner before presenting the evidence is inappropriate in my view. I have therefore refrained from commenting on this paragraph.

      We have chosen a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (33) L.271, "must detail variation in the number of raised young": it is not sufficiently clear what this means - what does 'detail' mean in this context? And what does 'number of raised young' mean? The number hatched or raised to fledging?

      We have now made this clear.

(34) L.271, "must detail variation in the number of raised young": looking at table S4, it seems that on the basis of this criterion brood size manipulation studies where details on the number of young manipulated were missing are also excluded. I see little justification for this. Surely these manipulations can, for example, be coded as having the average manipulation size in the meta-analysis data set, thereby contributing to tests of manipulation effects, but not to variation within the manipulation groups?

      We have done in part what the reviewer describes. We are specifically interested in the manipulation size, so we required this to compare effect sizes across species and categories, a key advance of our study and outlined in many places in our manuscript. Note, however, that we only need comparative differences, and have used clutch size metrics more generally to obtain a mean clutch size for a species, as well as SD where required. Please also note that our supplement details exactly why studies were excluded from our analysis, as is the preferred practice in a meta-analysis.

(35) L.271, "referred to as clutch size": the point of this simplification is not clear to me, and it is clearly confusing. Why not refer to 'brood size' instead?

      Brood size and clutch size can be used interchangeably here because, in the observational studies, the individuals vary in the number of eggs produced, whereas for brood manipulations this obviously happens after hatching and brood is perhaps a more appropriate term, but we wanted to simplify the terminology used. However, we use clutch size throughout as the aim of our study is to determine why individuals differ in the number of offspring they produce, and so clutch size is the most appropriate term for that.

      (36) L.280: according to the specified inclusion criteria (lines 271/272) these studies should already be in the data set, so what does this mean exactly?

      Selection criteria refers to whether a given study should be kept for analysis or not. It does not refer to how studies were found. Please see lines 361-378 for details on how we found studies (additional details are also in the Supplementary Methods).

      (37) L.281: the use of 'quality' here is misleading - natural variation in clutch or brood size will have multiple causes, variation in phenotypic quality of the individuals and their environment (territories) is only one of the causes. Why not simply refer to what you are actually investigating: natural and experimental variation in brood size.

      We disagree, our study aims to separate quality effects from the costs of reproduction and we use observational studies to test for quality differences, though we make no inference about the mechanisms. We do not imply that the environment causes differences in quality, but that to directly compare observation and experimental groups, they should contain similar species. So, to be clear again, quality refers to the positive covariation of clutch size with survival. We feel that we explain this clearly in our study’s rationale and have also improved our writing in several sections on this to avoid any confusion (see responses to earlier comments by the three reviewers).

      (38) L.283, "in most cases": please be exact and say in xx out xx cases.

      We have added the number of studies for each category here.

      (39) L.283-285: presumably readers can see this directly in a table with the extracted data?

Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We believe the data are too large to include as a table in the main text and are not essential to understanding the paper. However, we believe all readers should have access to this information if they wish, so it is publicly available.

      (40) L.293: there does not seem to be a table that lists the included studies and effect sizes. It is not uncommon to find major errors in such tables when one is familiar with the literature, and absence of this information impedes a complete assessment of the manuscript.

      We supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data are too large to include as a table in the main text and are not essential in understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      (41) L.293: from how many species?

      We have added this detail.

      (42) L.296, "longevity": this is a tricky concept, not usually reported in the studies you used, so please describe in detail what data you used.

      We have removed longevity, as we did not use these data in the current version of the manuscript.

      (43) L. 298: again: where can I see this information?

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process, but we believe it was not passed on to the reviewers.

      (44) L. 304, "we used raw data": I assume that for the majority of papers the raw data were not available, so please explain how you dealt with this. Or perhaps this applies to a selection of the studies only? Perhaps the experimental studies?

      By raw data, we mean the absolute number of offspring in the nest. We have changed the wording of this sentence and added detail on brood manipulation studies in which the absolute number of offspring was not reported (L393-397).

      (45) L.304: If I remember correctly, Santos and Nakagawa examined effects of reducing and enlarging brood size separately, which is important because trade-off curves are unlikely to be linear, and whether they are or not has major effects on the optimization process. But perhaps you tackled this in another way? I will read on.....

      You are correct that Santos & Nakagawa compared brood increases and reductions to controls separately. Note that this only partially accounts for non-linearity and does not take into account the severity of the change in brood size. By using a logistic regression of absolute clutch size, as we have done, we are able to directly compare brood manipulations with observational studies. Please see Supplementary Methods lines 11-12, where we have added further detail on why our approach is beneficial in this analysis.

      (46) L.319: what are you referring to exactly with "for each clutch size transformation"?

      We refer to the raw, standardised and proportional clutch size transformations. We have added detail here to be more clear.

      (47) L.319: is there a cost of survival? Perhaps you mean 'survival cost'? This would be appropriate for the experimental data, but not for the observational data, where the survival variation may be causally unrelated to the brood size variation, even if there is a correlation.

      We have changed “cost of survival” to “effect of parental survival”. We only intend to imply causality for the experimental studies. For observational studies we do not suggest that increasing clutch size is causal for increasing survival, only correlative (and hence we use the phrase “quality”).

      (48) L.320: please replace "parental effort" with something like 'experimental change in brood size'.

      We have changed “parental effort” to “reproductive effort”

      (49) L.321: due to failure of one or more eggs to hatch, and mortality very early in life, before brood sizes are manipulated, it is not likely that, say, an enlargement of brood size by 1 chick can be equated to the mean clutch size +1 egg / chick. For example, in the Wytham great tit study, as re-analysed by Richard Pettifor, a 'brood size manipulation' of unmanipulated birds is approximately -1, being the number of eggs / chicks lost between laying and the time of brood size manipulation. Would this affect your comparisons?

      We agree that these are important factors in determining what a clutch/brood size actually is for a given individual or pair, as this can vary between egg laying and fledging. However, we do not believe that accounting for this (if it were possible to do so) would significantly affect our conclusions, as birds in observational studies would also likely experience early-life mortality of their offspring. It is also possible that parents already factor in this loss, in which case a brood manipulation still changes the parental care effort an individual has to incur.

      (50) L.332: instead of "adjusted" perhaps say 'mean centred'?

      We have implemented this suggestion.

      (51) L.345: this statement surprised me, but is difficult to verify because I could not locate a list of the included studies. However, to my best knowledge, most studies reporting brood size manipulation effects on parental survival had this as their main focus, in contrast to your statement.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process, but we believe it was not passed on to the reviewers by the journal, although we supplied it on several occasions. We regret that the reviewer was impeded by this unfortunate communication failure, but we did our best to make the data available to the reviewers during the initial review process.

      (52) L.361-362: this seems a realistic approach from an evolutionary perspective, but we know from the jackdaw study by Boonekamp that the survival effect of brood size manipulation in a single year is very different from the survival effect of manipulating as in your model, i.e. every year of an individual's life the same manipulation. For very short-lived species this possibly does not make much difference, but for somewhat longer-lived species this could perhaps strongly affect your results. This should be discussed, and perhaps also explored in your simulations?

      Note that the Boonekamp study does not distinguish whether the survival effects are additive or multiplicative. As such, we do not know whether the survival effects of a single-year manipulation are simply small and hard to detect, or whether the survival effects are multiplicative. Our simulations assumed that the brood enlargement occurred every year throughout an individual’s life. We have added some text to the discussion on the point you raise.
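      To illustrate why the distinction raised by the reviewer matters more for longer-lived species, here is a minimal sketch. It is not the simulation used in the manuscript. It assumes a hypothetical model in which a bird breeds for certain in its first season, each extra egg shifts annual survival by a fixed amount on the logit scale, and the enlargement either recurs every year or is applied only once. The function and parameter names are illustrative. The effect of -0.05 per egg is the estimate discussed under comment 69, and annual survival values of 0.5 and 0.9 are arbitrary examples.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_breeding_attempts(survival, effect_per_egg, extra_eggs,
                               every_year=True):
    """Expected lifetime number of breeding attempts (hypothetical model).

    Assumes the bird breeds for certain in its first season and that each
    extra egg shifts annual survival by `effect_per_egg` on the logit scale.
    If `every_year` is True the enlargement, and its survival cost, recurs
    in every season; otherwise it applies only in the first season.
    """
    s_cost = inv_logit(logit(survival) + effect_per_egg * extra_eggs)
    if every_year:
        # Geometric series 1 + s_cost + s_cost**2 + ...
        return 1.0 / (1.0 - s_cost)
    # Survive the first (costly) season with s_cost, then baseline survival.
    return 1.0 + s_cost / (1.0 - survival)


for s in (0.5, 0.9):
    once = expected_breeding_attempts(s, -0.05, 1, every_year=False)
    yearly = expected_breeding_attempts(s, -0.05, 1, every_year=True)
    print(f"survival {s}: single-year {once:.3f}, every-year {yearly:.3f}")
```

      Under these assumptions, the gap between a single-year and an every-year manipulation is small when annual survival is 0.5. It grows considerably when survival is 0.9, because the per-season cost compounds over more expected breeding seasons.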

      (53) L.360: what is "lifetime reproductive fitness"? Is this different from just "fitness"?

      We have changed “lifetime reproductive fitness” to “lifetime reproductive output”.

      (54) L.363: when you are interested in optimal clutch size, why not also explore effects of reducing clutch size?

      As we find that a reduction in clutch size leads to a reduction in survival (for experimental studies), we already know that these individuals would have a reduced fitness return compared to reproducing at their normal level, and so we would not learn anything from adding this into our simulations. The interest in using clutch size enlargements is to find out why an individual does not produce more offspring than it does, and the answer is that it would not have a fitness benefit (unless its clutch size and survival rate combination is out of the bounds of that observable in the wild).

      (55) Fig.1 - using 'parental effort' in the y-axis label is misleading, suggest to replace with e.g. "clutch or brood size". Using "clutch size" in the title is another issue, as the experimental studies typically changed the number of young rather than the number of eggs.

      We have updated the figure axes to say “clutch size” rather than “parental effort”. Please see response to comment 35 where we explain our use of the term “clutch size” throughout this manuscript.

      (56) L.93 - 108: I appreciate the analysis in Table 1, in particular the fact that you present different ways of expressing the manipulation. However, in addition, I would like to see the results of an analysis treating the manipulations as factor, i.e. without considering the scale of the manipulation. This serves two purposes. Firstly, I believe it is in the interest of the field that you include a detailed comparison with the results of Santos & Nakagawa's analysis of what I expect to be largely the same data (manipulation studies only - for this purpose I would also like to see a comparison of effect size between the sexes). Secondly, there are (at least) two levels of meta-analysis, namely quantifying an overall effect size, and testing variables that potentially explain variation in effect size. You are here sort of combining the two levels of analysis, but including the first level also would give much more insight in the data set.

      Our main intention here was to improve on how the same hypothesis was approached by Santos & Nakagawa. We did this by improving the analysis (on a per-egg basis) and by adding studies (i.e. more data). In this process, mistakes were corrected (as we re-extracted all data and did not copy anything across from their dataset, which was used simply to ensure we found the same papers), and more recent data were added, including studies missed by Santos & Nakagawa. This makes a comparison with Santos & Nakagawa of limited relevance, apart from perhaps technical reasons, i.e. pointing out mistakes or limitations in certain approaches. We would not be able to pinpoint these problems clearly without considering the whole dataset, yet Santos & Nakagawa had only a small subset of the data that were available to us. In short, meta-analysis is an iterative process, and similar questions are inevitably analysed multiple times and updated. This follows basic meta-analytic concepts and Cochrane principles. Except where there is a major flaw in a prior dataset or approach (as we have sometimes found and highlighted in our own work, e.g. Simons, Koch, Verhulst 2013, Aging Cell), a comparison of the kind the reviewer suggests distracts from the biology. With the dataset made available, others can make these comparisons if required. On the sex difference, we provide a comparison of effect sizes separated between both sexes and mixed sex in Table S2 and Figure S1.

      (57) L.93 - 108: a thing that does not become clear from this section is whether experimentally reducing brood size affects parental survival similarly (in absolute terms) as enlarging brood size. Whether these effects are symmetric is biologically important, for example because of its effect on clutch size optimization. In the text you are specific about the effects of increasing brood size, but the effect you find could in theory be due entirely to brood size reduction.

      We have added detail to make it clear that a brood reduction simply shows the opposite trend. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than fitting nonlinear models would. For many datasets, the range of chicks added is too narrow to estimate a non-linear relationship. It also remains unclear what shape this non-linear relationship should take, and this is hard to determine a priori.

      We have added some discussion on this to our manuscript (L278-282), in response to an earlier comment.

      (58) L.103-107: this is perhaps better deferred to the discussion, because other potential explanations should also be considered. For example, there have been studies suggesting that small birds were provisioning their brood full time already, and hence had no scope to increase provisioning effort when brood size was experimentally increased.

      We agree this is a discussion point, but we believe it also provides important context for why we ran our simulations, so we think it is best kept brief but in place. We agree the example you give is relevant, but we believe this argument is already contained in this section. See lines 121-123: “...suggesting that costs to survival were only observed when a species was pushed beyond its natural limits”.

      (59) L.103-107: this discussion sort of assumes that the results in Table 1 differ between the different ways that the clutch/brood size variation is expressed. Is there any statistical support for this assumption?

      We are unsure of what the reviewer means here exactly. Note that in each of the clutch size transformations, experimental and observational effect sizes differ significantly and are opposite in sign. For the proportional clutch size transformation, experimental and observational studies are each significantly different from 0.

      (60) L.104: at this point, I would like to have better insight into the data set. Specifically, a scatter plot showing the manipulation magnitude (raw) plotted against control brood size would be useful.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process, but we believe it was not passed on to the reviewers by the journal.

      Thank you for this suggestion, which is also useful for illustrating how manipulations are relatively stronger for species with smaller clutches, in line with our interpretation of the result presented in Figure 2. We have added Figure S1, which shows the strength of manipulation relative to the species average.

      (61) L. 107: this seems a bold statement - surely you can test directly whether effect size becomes disproportionally stronger when manipulations are outside the natural range, for example by including this characterization as a factor in the models in Table 1.

      It is hard to define exactly what the natural range is here, so it is not easy to factorise objectively, which is why we chose not to do this. However, it is clear that for species with small clutches the manipulation itself is often outside the natural range. Thank you for your suggestion to include a figure for this as it is clear manipulations are stronger in species with smaller clutches. We attribute this to species being forced outside their natural range. We consider our wording makes it clear that this is our interpretation of our findings and we therefore do not think this is a bold statement, especially as it fits with how we interpret our later simulations.

      (62) Fig.3, legend: the term 'node support' does not mean much to me, please explain.

      Node support is a value given in phylogenetic trees to indicate the confidence in a branch. Here, values are given as percentages and can be read as how many times out of 100 the estimated phylogeny recovers the same branching. Our values are low, as we have relatively few species in our meta-analysis.

      (63) Fig.3: it would be informative when you indicate in this figure whether the species contributed to the experimental or the observational data set or both.

      We have added into Fig 3 whether the species was observational, experimental or both.

      (64) L.139: the p-value refers to the interaction between species clutch size and treatment (observational vs. experimental), but it appears that no evidence is presented for the correlation being significant in either observational or experimental studies.

      We agree that our reporting of the effect size could be misinterpreted and have added detail here. The statistic provided shows that the slopes differ significantly between observational and experimental studies, implying that there are differences between the slopes of small and large clutch-laying species.

      (65) L.140: I am wondering to what extent these correlations, which are potentially interesting, are driven by the fact that species average clutch size was also used when expressing the manipulation effect. In other words, to what extent is the estimate on the Y-axis independent from the clutch size on the X-axis? Showing that the result is the same when using survival effect sizes per manipulation category would considerably improve confidence in this finding.

      We are unsure what the reviewer means by “per manipulation category”. Please also note that we used a logistic regression to calculate our effect sizes of survival, given a unit increase in reproductive effort. For example, if a population contained birds that lay 2, 3 or 4 eggs and we changed the number of eggs raised to 10, 11 or 12, respectively, then, provided the number of birds that survived and died in each category did not change, our effect size would be the same. In this way, our effect sizes are independent of the species’ average clutch size.
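      The argument above can be checked with a minimal sketch: a two-parameter logistic regression of parental survival on clutch size, fitted by Newton-Raphson to grouped counts. The function name and the survival counts are invented for illustration. This is a hypothetical stand-in, not the analysis code in the Dryad archive. Adding a constant to every clutch size changes only the intercept; the slope, the log-odds change in survival per extra egg, is unchanged.

```python
import math


def fit_logistic(xs, survived, died, iterations=50):
    """Fit logit(P(survive)) = a + b * x to grouped counts by Newton-Raphson.

    xs: clutch size of each group; survived/died: number of parents in each
    group that survived or died. Returns (a, b), where b is the change in
    log-odds of parental survival per extra egg.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (ga, gb) and observed information matrix
        # [[iaa, iab], [iab, ibb]] of the binomial log-likelihood.
        ga = gb = iaa = iab = ibb = 0.0
        for x, s, d in zip(xs, survived, died):
            n = s + d
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = s - n * p           # observed minus expected survivors
            w = n * p * (1.0 - p)       # binomial variance
            ga += resid
            gb += resid * x
            iaa += w
            iab += w * x
            ibb += w * x * x
        det = iaa * ibb - iab * iab
        # Newton step: add (information)^-1 @ score, via the 2x2 inverse.
        a += (ibb * ga - iab * gb) / det
        b += (iaa * gb - iab * ga) / det
    return a, b


# Invented counts: parents that survived / died at each clutch size.
survived = [30, 25, 20]
died = [20, 25, 30]
a_small, b_small = fit_logistic([2, 3, 4], survived, died)
a_large, b_large = fit_logistic([10, 11, 12], survived, died)
print(f"2-4 eggs:   intercept {a_small:.3f}, slope {b_small:.4f}")
print(f"10-12 eggs: intercept {a_large:.3f}, slope {b_large:.4f}")
```

      With these invented counts, the survival proportions (0.6, 0.5, 0.4) are exactly linear on the logit scale. Both fits therefore return a slope of -ln(1.5), about -0.405, and only the intercept shifts.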

      (66) L.145: if I remember correctly, Santos & Nakagawa considered brood size reduction and enlargement separately. Can this explain the contrasting result? Please discuss.

      You are correct, in that Santos & Nakagawa compared reductions and enlargements to controls separately. However, we found some mistakes in the data extracted by Santos & Nakagawa that we believe explain the differences in our results for sex-specific effect sizes. We do not feel that highlighting these mistakes in the main text is fair, useful or scientifically relevant, as our approach is to improve the test of the hypothesis.

      (67) L.158-159: looking at table S2 it seems to me you have a whole range of estimates. In any case, there is something to be said for taking the estimates for females because it is my impression (and experience) that clutch size variation in most species is a sex-linked trait, in that clutch size tends to be repeatable among females but not among males.

      We agree that, in many cases, the female is the one that ultimately decides on the number of chicks produced. We did also consider using female effect sizes only, however, we decided against this for the following reasons: (1) many of the species used in our meta-analysis exhibit biparental care, as is the case for many seabirds, and so using females only would bias our results towards species with lower male investment; in our case this would bias the results towards passerine species. (2) it has also been shown that, as females in some species are operating at their maximum of parental care investment, it is the males who are able to adjust their workload to care for extra offspring. (3) we are ultimately looking at how many offspring the breeding adults should produce, given the effort it costs to raise them, and so even if the female chooses a clutch size completely independently of the male, it is still the effort of both parents combined that determines whether the parents gain an overall fitness benefit from laying extra eggs. (4) some studies did not clearly specify male or female parental survival and we would not want to reduce our dataset further.

      (68) L.158-168: please explain how you incorporated brood size effects on the fitness prospects of offspring, given that it is a very robust finding of brood size manipulation studies that this affects offspring growth and survival.

      We would argue this is nearly impossible to incorporate into our simulations. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally, and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, so increased sibling competition from added chicks is likely to alter offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest. It would be interesting, however, to explore this further using estimates from the literature, but this is beyond our current scope and, by our initial intuition, would not be very accurate. It would also be interesting to explore how large the effect on offspring would need to be to strongly constrain the effect size. Such work would be more theoretical. The point of our simple fitness projections here is to aid interpretation of the quantitative effect size we estimated.

      (69) L.163: while I can understand that you select the estimate of -0.05 for computational reasons, it has enormous confidence intervals that also include zero. This seems problematic to me. However, in the simulations, you also examined the results of selecting -0.15, which is close to the lower end of the 95% C.I., which seems worth mentioning here already.

      Thank you for this suggestion. Yes, indeed, our range was chosen based on the CI, and we have now made this explicit in the manuscript.

      (70) L.210: defined in this way, in my world this is not what is generally taken to be a selection differential. Is what you show not simply scaled lifetime reproductive success?

      As far as we are aware, a selection differential is the relative change between a given group and the population mean, which is what we have done here. We appreciate this is a slightly unusual context in which to place this, but it is more logical to consider the individuals who produce more offspring as carrying a potential mutation for higher productivity. However, we believe that “selection differential” is the best terminology for the statistic we present. We also detail in our methodology how we calculate this. We have adjusted this sentence to be more explicit about what we mean by selection differential.
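      The definition above (the relative change between a given group and the population mean) can be written out as follows. The notation is introduced here for illustration and is not taken from the Methods. Here \bar{W}_g is the mean lifetime reproductive output of the focal group (for example, individuals producing more offspring) and \bar{W} is the population mean:

```latex
S = \frac{\bar{W}_{g} - \bar{W}}{\bar{W}}
```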

      (71) L.177-180: is this not so because these parameter values are closest to the data you based your estimates on, which yielded a low estimate and hence you see that here also?

      We are unsure of what exactly the reviewer means here. The effect sizes for our exemplar species were predicted from each combination of clutch size and survival rate. Note that we used a range of effect sizes, higher than that estimated in our meta-analysis, to explore a large parameter space and that these same conclusions still hold.

      (72) L.191-194: these statements are problematic, because they are based on the assumption that an increase in brood size does not impact the fitness prospects of the offspring, and we know this assumption to be false.

      Though we appreciate that some cost is often absorbed by the offspring themselves, we are unaware of any evidence that these costs are substantial and large enough to drive within-species variation in reproductive effort, though for some specific species this may be the case. However, in terms of explaining a generalisable, across-species trend, the fitness costs incurred by a reduction in offspring quality are unlikely to be significantly larger than the survival costs to reproduce. We also find it highly unlikely the cost to fitness incurred by a reduction in offspring quality is large enough to counter-balance the effect of parental quality that we find in our observational studies. We do also discuss other costs in our discussion.

      (73) L.205: here and in other places it would be useful to be more explicit on whether in your discussion you are referring to observational or experimental variation.

      We have added this detail to our manuscript. Do note that many of our conclusions are drawn by the combination of results of experimental and observational studies. We believe the addition of Figure 5 makes this more clear to the reader.

      (74) L.225: this may be true (at least, when we overlook the misuse of the word 'quality' here), but I would expect some nuance here to reflect that there is no surprise at all in this result as this pattern is generally recognized in the literature and has been the (empirical) basis for the often-repeated explanation of why experiments are required to demonstrate trade-offs. On a more quantitative level, it is worth mentioning the paper of Vedder & Bouwhuis (2017, Oikos) that essentially shows the same thing, i.e. a positive association between reproductive output and parental survival.

      We have added some discussion on this point, including the citation mentioned. However, we would like to highlight that our results demonstrate that brood manipulations are not necessarily a good test of trade-offs, as they fail to recognise that individuals differ in their underlying quality. Although we agree that this result should not necessarily be surprising, we have not found that differences in individual quality are accepted as the reason that intra-specific clutch size is maintained. In fact, we find it is most commonly argued that, when costs of reproduction are not identified, the costs must lie elsewhere, yet we cannot find conclusive evidence that the costs of reproduction (wherever they lie) drive intra-specific variation in reproductive effort. Furthermore, some studies in our dataset have reported negative correlations between reproductive effort and survival (see observational studies, Figure 1).

      (75) L.225-226: perhaps present this definition when you first use the term.

      We have added more detail to where we first use and define this term to improve clarity (L57-58).

      (76) L.227-228, "currently unknown": this statement surprised me, given that there is a plethora of studies showing within-population variation in clutch size to depend on environmental conditions, in particular the rate at which food can be gathered.

      Our question is: if an individual is “high quality”, why is this not selected for? We have rephrased this to improve clarity.

      (77) L.231: this seems no more than a special case of the environmental effect you mention above.

      We think this is a relevant special case, as it constitutes within-individual variation in reproduction that is mistaken for between-individual variation. This is a common problem in our field that we feel needs addressing. We only have between-individual variation in our study on quality, and by highlighting this we show that the apparent variation between individuals could arise fully (doubtful) or partly (perhaps likely) from terminal effects.

      (78) L235-236: but apparently depending on how experimental and natural variation was expressed? Please specify here.

      We are not sure what results the reviewer is referring to here, as we found the same effect (smaller clutch laying species are more severely affected by a change in clutch size) for both clutch size expressed as raw clutch size and standardised clutch size.

      (79) L.237: the concept of 'limits' is not very productive here, and it conflicts with the optimality approach you apply elsewhere. What you are saying here can also be interpreted as there being a non-linear relationship between brood size manipulation and parental survival, but you do not actually test for that. A way to do this would be to treat brood size reduction and enlargement separately. Trade-off curves are not generally expected to be linear, so this would also make more sense biologically than your current approach.

      We have replaced “limits” with “optima”. We believe our current approach of treating clutch size as a continuous variable, regardless of manipulation direction, is the best approach, as it allows us to directly compare with observational studies and between species that use different manipulations (now nicely illustrated by the reviewer’s suggested Figure S1). Also note that transforming clutch size to a proportion of the mean allows us to account for the severity in change in clutch size. We also do not believe that treating reductions and enlargements separately accounts for non-linearity, as either we are separating this into two linear relationships (one for enlargements and one for reductions) or we compare all enlargements/reductions to the control, as in Santos & Nakagawa 2012, which does not take into account the severity of the increase, which we would argue is worse for accounting for non-linearity. Furthermore, in the cases where the manipulation involved one offspring only, we also cannot account for non-linearity.

      (80) L.239: assuming birds are on average able to optimize their clutch size, one could argue that any manipulation, large or small, on average forces birds to raise a number of offspring that deviates from their natural optimum. At this point, it would be interesting to discuss in some detail studies with manipulation designs that included different levels of brood size reduction/enlargement.

      We agree with the reviewer that any manipulation changes an individual’s clutch size away from its own individual optimum. As we have argued, this also means that brood manipulations are not necessarily a good test of whether a trade-off occurs naturally in the wild, as there could be interactions with quality. We have now edited the text to state this explicitly (L299-300).

      (81) L.242-244: when you choose to maintain this statement, please add something along the lines of "assuming there is no trade-off between number and quality of offspring".

      As explained above, though we agree that the offspring may incur some of the cost themselves, we are not aware of any evidence suggesting this trade-off is also large enough to drive intra-specific variation in clutch size across species. Furthermore, in the context here, the trade-off between number and quality of offspring would not change our conclusion – that the fitness benefit of raising more offspring is offset by the cost on survival. We have added detail on the costs incurred by offspring earlier in our discussion (L309-315). The addition of Figure 5 should help interpret these data.

      (82) L.253: instead of reference 30 the paper by Tinbergen et al in Behaviour (1990) seems more appropriate.

      We believe our current citation is relevant here but we have also added the Tinbergen et al (1990) citation.

      (83) L.253-254: such trade-offs may perfectly explain variation in reproductive effort within species if we were able to estimate cost-benefit relations for individuals. In fact, reference 29 goes some way to achieve this, by explaining seasonal variation in reproductive effort.

      We are unaware of any quantitative evidence that any combination of trade-offs explains intra-specific variation in reproductive effort, especially as a general across-species trend.

      (84) L.255: how does one demonstrate "between species life-history trade-offs"? The 'trade-off' between reproductive rate and survival we observe between species is not necessarily causal, and hence may not really be a trade-off but due to other factors - demonstrating causality requires some form of experimental manipulation.

      Between-species trade-offs are well established in the field, stemming from GC Williams’ seminal paper in 1966 and reflected, for example, in r/K selection theory. It is possible to move from these correlations to testing for causation, and this is currently happening through the introduction of transgenes (genes from other species) that promote longevity into shorter-lived species (e.g., naked mole-rat genes into mice). As yet, it is unclear what the effects on reproduction are.

      (85) L.256: it is quite a big claim that this is a novel suggestion. In fact, it is a general finding in evolutionary theory that fitness landscapes tend to be rather flat at equilibrium.

      It is important to note that we simulate the effect size we found. The novel suggestion is that, because the resulting fitness landscape is relatively flat, no directional selection is observed. We did not intend to suggest that our interpretation of flat fitness landscapes is novel, and we have changed the phrasing of this sentence to avoid misinterpretation.

      (86) L.259: why bring up physiological 'costs' here, given that you focus on fitness costs? Do you perhaps mean fitness costs instead of physiological costs? Furthermore, here and in the remainder of this paragraph it would be useful to be more specific on whether you are considering natural or experimental variation.

      The cost of survival is a physiological cost incurred by the reduction of self-maintenance as a result of lower resource allocation. This is one arm of fitness; we feel it would be confusing here to talk about costs to fitness, as we do not assess costs to future reproduction (which formed a large part of the critique offered by the reviewer). We would like to highlight that the aim of this manuscript was to separate costs of reproduction from the effects of quality, and this is why we include observational and experimental studies in one analysis rather than analysing them separately. Our conclusion that we found no evidence that the survival cost of reproduction drives within-species variation in clutch size comes both from the positive correlation found in the observational studies and from the negligible fitness returns estimated in our simulations. We therefore do not believe it is helpful to separate observational and experimental conclusions throughout our manuscript, as the point is that they are inherently linked. We hope that the addition of Figure 5 makes this clearer.

      (87) L.262: The finding that naturally more productive individuals tend to also survive better one could say is by definition explained by variation in 'quality', how else would you define quality?

      We agree, and hence we believe quality is a good term to describe individuals who perform highly in two different traits. Note that we also state that the lack of evidence that trade-offs drive intra-specific variation in clutch size potentially points to an alternative explanation, including intra-specific variation driven by differences in individual quality.

      Supplementary information

      (88) Table S1: please provide details on how the treatment was coded - this information is needed to derive the estimates of the clutch size effect for the treatments separately.

      We have added this detail.

      (89) Table S2: please report the number of effect sizes included in each of these models.

      We have added this detail.

      (90) Table S4: references are not given. Mentioning species here would be useful. For example, Ashcroft (1979) studied puffins, which lay a single egg, making me wonder what is meant when mentioning "No clutch or brood size given" as the reason for exclusion. A few more words to explain why specific studies were excluded would be useful. For example, what does "Clutch size groups too large" mean? It surprises me that studies are excluded because "No standard deviation reported for survival" - as the exact distribution is known when sample size and proportion of survivors is known.

      We have updated this table for more clarity.

      (91) Fig.S1: please plot different panels with the same scale (separately for observational and experimental studies). You could add the individual data points to these plots - or at least indicate the sample size for the different categories (female, male, mixed).

      We have scaled all panels to have the same y axis and added sample sizes to the figure legend.

      (92) Fig.S3: please provide separate plots for experimental and observational studies, as it seems entirely plausible that the risk of publication bias is larger for observational studies - in particular those that did not also include a brood size manipulation. At the same time, one can wonder what a potential publication bias among observational studies would represent, given that apparently you did not attempt to collect all studies that reported the relevant information.

      We have coloured the points for experimental and observational studies. Note that a study is an independent effect size and, therefore, does not indicate whether multiple data (i.e., both experimental and observational studies) came from the same paper. As we detail in the paper and above in our reviewer responses, we searched for observational studies from species used in the experimental studies to allow direct comparison between observational and experimental datasets.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend improving the theoretical component of the analysis by providing a solid theoretical framework before drawing conclusions from it.

      This, at a minimum, requires a statistical model and most importantly a mechanistic model describing the assumed relationships.

      We thank the reviewer for highlighting that our aims and methodology are unclear in places. We have added detail to our model and simulation descriptions and have improved the description of our rationale. We also feel the failure of the journal to provide code and data to the reviewers has not helped their appreciation of our methodology and use of data.

      Because the field uses the same wording for different concepts and different wording for the same concept, a glossary is also necessary.

      We thank the reviewer for raising this issue. During the revision of this manuscript, we have simplified our terminology or given a definition, and we believe this is sufficient for readers to understand our terminology.

      Reviewer #3 (Recommendations For The Authors):

      • The files containing the data extracted from each study were not available, so it has not been possible to check how any of the points raised above apply to the species included in the study. The ms should include this file in the Supp. Info, as is standard good practice for a comparative analysis.

      We supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data is too large to include as a table in the main text and is not essential in understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      • For clarity, refer to 'the effect size of clutch size on survival' rather than simply 'effect size'. Figures 1 and 2 require cross-referencing with the main text to understand the y-axis.

      We have added detail to the figure legend to increase the interpretability of the figures.

      • Silhouettes in Figure 3 (or photos) would help readers without ornithological expertise to understand the taxonomic range of the species included in the analyses.

      We have added silhouettes into Figure 3.

      • Throughout the discussion: superscripts shouldn't be treated as words in a sentence so please add authors' names where appropriate.

      We have added author names and dates where required.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a new protocol for quantifying tRNA aminoacylation levels by deep sequencing. The improved methods for discrimination of aminoacyl-tRNAs from non-acylated tRNAs, more efficient splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction, and the use of an error-tolerating mapping algorithm to map the tRNA sequencing reads provide new tools for anyone interested in tRNA concentrations and functional states in different cells and organisms. The results and conclusions are solid with well-designed tests to optimize the protocol under different conditions.

      Public Reviews:

      We thank both reviewers for suggestions, feedback and improvements. We address these pointwise below.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript of Davidsen and Sullivan describes an improved tRNA-seq protocol to determine aminoacyl-tRNA levels. The improvements include: (i) optimizing the Whitfeld or oxidation reaction to select aminoacyl-tRNAs from oxidation-sensitive non-acylated tRNAs; (ii) using a splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction; (iii) using an error-tolerating mapping algorithm to map the tRNA sequencing reads that contain mismatches at modified nucleotides.

      Strengths:

      The two steps, oxidation and splint-assisted ligation, are yield-diminishing steps; thus the protocol of Davidsen and Sullivan is an important improvement over current protocols for the quantification of aminoacyl-tRNAs.

      Weaknesses:

      The oxidation and the selection of aminoacyl-tRNA is the first step in all protocols. Thereafter they differ on whether blunt ligation, hairpin (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim tRNA-seq, LOTTE tRNA-seq), or splint ligation is used and finally what detection method is applied (i-tRAP, tRNA microarrays). What is the correlation to those alternative approaches (e.g. i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264) etc.)? What is the correlation with other approaches with which this improved protocol shares some steps (DM-tRNA-seq, mim-tRNA-seq)?

      We appreciate the fair assessment and fully agree that our work would benefit from a large comparison between all known tRNA-seq methods. We did directly compare many elements of our method to those of other methods (e.g. ligation efficiency and barcode bias); however, as noted by the reviewer we did not perform a direct end-to-end comparison with all other methods. An ideal comparison would require running several different sample conditions and technical replicates through our protocol and repeating the process across a half dozen or so other methods as they are described. Unfortunately, this approach is unlikely to be feasible since each method uses different oligos, reagents and kits, and all would have to be acquired at substantial cost. Some methods also rely on other detection methods such as microarrays, qPCR, or Illumina sequencing, which would also make this goal all the more onerous. There are also different pipelines for data processing that, in some instances, make the final results hard to compare. In short, this would be a monumental and expensive task to do comprehensively. We also worry that, even if these experiments were conducted such that some variables were concluded to be superior, they could still be challengeable based on perceived or actual protocol differences from the prior art. In summary, we think that an overall comparison with each method would be ideal, but practical concerns limit us to optimizing and comparing the variables that we found to be most prone to introducing bias in the results.

      For methods that measure tRNA expression levels (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim-tRNA-seq, LOTTE tRNA-seq etc.) there are some fundamental problems regarding absolute quantification using NGS that preclude simple comparisons. These problems are well known in the field of microRNA (Fuchs et al. (2012) [PMID: 25942392]) and arise from several factors introduced during processing steps such as purification, ligation, reverse transcription and amplification. In the absence of a “true” quantitation benchmark, it would be difficult to make quantitative claims from any of these methods. Therefore, in our own work we benchmark tRNA expression levels for sample-to-sample reproducibility (i.e. precision), as further explained in the response to reviewer #2.

      For comparison to methods that measure tRNA charge we did have an opportunity to compare our results with those of another study. To this end, we have added a figure comparing the baseline charge found using our method and the one used in Evans et al. (Revised manuscript Figure 2—figure supplement 9). This comparison finds broadly similar results for tRNA charge, including similar trends for a subset of Glu, Ser and Pro codons that are notable for their lowered basal tRNA charge.

      Reviewer #2 (Public Review):

      Davidsen and Sullivan present an improved method for quantifying tRNA aminoacylation levels by deep sequencing. By combining recent advances in tRNA sequencing with lysine-based chemistry that is more gentle on RNA, splint oligo-based adapter ligation, and full alignment of tRNA reads, they generate an interesting new protocol. The lab protocol is complemented by a software tool that is openly available on Github. Many of the points highlighted in this protocol are not new but have been used in recent protocols such as Behrens et al. (2021) or McGlincy and Ingolia (2017). Nevertheless, a strength of this study is that the authors carefully test different conditions to optimize their protocol using a set of well-designed controls.

      The conclusions of the manuscript appear to be well supported by the data presented. However, there are a few points that need to be clarified.

      We appreciate the acknowledgement of the strength of our aminoacylation controls and agree that our method relies on many aspects of the mentioned prior work.

      (1) One point that remains unsatisfactory is a better benchmarking against the state of the art. It is currently impossible to estimate how much the results of this new protocol differ from alternative methods and in particular from Behrens et al. (2021). Here it will be helpful to perform experiments with samples similar to those used in the mim-tRNAseq study and not with H1299 cells.

      We fully agree that more rigorous benchmarking would be desirable. As also noted in the response to reviewer #1, a full end-to-end comparison of methods would be ideal but would be onerous and expensive in practice, so we focused on optimizing the steps we found to be most prone to introducing bias in the data.

      We agree that Behrens et al. (2021) has substantial methodological overlap with our work and was instrumental in our efforts; however, the focus of their manuscript was largely on quantification of tRNA abundance and modifications, rather than tRNA charge. In fact, tRNA charge was only determined for yeast in that study. Quantifying the abundance of short RNAs using NGS is very difficult (Fuchs et al. (2012) [PMID: 25942392]) and will likely require the use of a mixture of tRNAs as spike-in references for normalization (Bissels et al. (2009) [PMID: 19861428]). Behrens et al. (2021) did not use a spike-in tRNA reference, but instead correlated gene copy number with their measured tRNA abundance. They also compared to Northern blotting for two tRNA transcripts, showing a directionally similar result; however, no quantitative claims can be made about measurement accuracy. Until a good method of normalizing tRNA quantification is found, we believe that sample-to-sample reproducibility (i.e. precision) is the most useful objective to optimize, because this allows detection of differential expression. Towards that end, we quantified the precision of our method (Figure 4 and its two supplementary figures) with associated statistics, which can be used to estimate the number of samples required to detect significance during differential expression analysis. For tRNA charge, quantification is easier, which is why we present statistics on both accuracy and precision. In this case we can better compare results across methods, and so we have added a comparison of our results to the charge quantification from Evans et al. (2017) (Figure 2—figure supplement 9).

      (2) While the protocol aims to implement an improved method for quantification of tRNA aminoacylation, it can also be used for tRNA quantification and analysis of tRNA modifications. It will increase the impact of this study if the authors benchmark the outcomes of their protocol with other tRNA sequencing protocols with samples similar to these papers, which will be important for certain research teams that are unlikely to implement two different tRNA sequencing methods. Are there any possible adaptations that would allow the analysis of tRNA fragments?

      The first part of this comment, regarding comparison of methods, is addressed in our response to the prior comment and in the response to reviewer #1. In the specific case of tRNA modifications, the issue is similar to abundance quantification in that a “true” reference of modified tRNA is likely necessary for proper quantification, alongside simultaneous testing of each method.

      Regarding tRNA fragments, our method is not suitable for this use case. Our adapter ligation step depends on an intact tRNA structure with either a CCA or CC overhang on the 3’-end, and thus we almost exclusively get reads with CCA/CC ends and no reads from fragments. This specificity is good for charge quantification accuracy but limits the method's versatility. For a more versatile method we recommend Watkins et al. (2022) [PMID: 35513407].

      (3) Like Behrens et al. (2021), Davidsen and Sullivan use TGIRT-III RT for their analyses. The enzyme is not currently available in a form suitable for tRNA-seq. It would be very helpful to test different new RT enzymes that are commercially available. The example of Maxima RT - Figure 2 Supp 6 - shows significantly lower performance than the presented TGIRT-III RT data. In lines 296-298, the authors mention improvements to the protocol by using ornithine. Why are these improvements not included?

      We share the concern that the TGIRT-III enzyme is no longer commercially available. It became unavailable while we were preparing this manuscript, which is why almost all our figures are made using this enzyme. Others have discovered this too, and Lucas et al. (2023) [PMID: 37024678] tested several RT polymerases using TapeStation as a readout for readthrough. As they reported that Maxima has good performance, we decided to test it on a full run with replicates. The results are outlined in Figure 2—figure supplement 6, and for resubmission we have added a table to the appendix that compares the alignment statistics. Unfortunately, the readthrough of the Maxima polymerase on cytoplasmic tRNAs is not as high as for TGIRT-III; interestingly, however, it seems to perform better for mitochondrial tRNAs (Figure 2—figure supplement 6). In the initial submission we failed to evaluate whether this readthrough difference affected charge measurements. We have now addressed this by adding Figure 2—figure supplement 7, which shows that there are no differences in charge measurements between TGIRT-III and Maxima. Not surprisingly, there are substantial differences between polymerases in relative tRNA abundance (which affirms the discussion above on the difficulty of tRNA abundance quantification); however, the high sample-to-sample reproducibility remains intact with either polymerase. An exhaustive search for better polymerases is warranted but falls outside the scope of our work.

      Regarding the improvement we suggested, using ornithine instead of lysine as the cleavage catalyst: we learned about this possibility only late in the project and therefore only want to make readers aware that other options exist. We have revised the paragraph to make this clearer.

      (4) A technical concern: The samples are purified multiple times using a specific RNA purification kit. Did the authors test different methods to purify the RNA and does this influence the result of the method?

      In the past, we have relied exclusively on alcohol precipitation but during the development of this protocol we found it easier and more reproducible to use column-based purification when possible. However, as we have not made a direct comparison this remains anecdotal evidence. Nonetheless, to minimize any possible bias of column-based purification you will notice that we use columns with binding capacity 5x higher than the highest amount of RNA/DNA added to the column.

      (5) The study would benefit from an explicit step-by-step protocol, including the choice of adapters that are shown to work best in the protocol.

      This is a great point! We have included tables with all the oligos used (Supplementary file 1), a detailed step-by-step protocol with pictures of anticipated gel results (Supplementary file 2) and an overview of the RNA/DNA manipulations that makes clear where the adapter sequences are located (Supplementary file 3). For the data processing we provide a comprehensive example in the Github repository. All of this was included in our first submission of this manuscript (as well as on bioRxiv), but we suspect it was not readily accessible to the reviewers. We will make sure that these documents are available through eLife and have emphasized their existence in the main text of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To stratify this improvement, a comparison to the most common methods should be made. For example, how do the results of the improved protocol compare with i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264), or the other tRNA-seq approaches with which it shares steps (DM-tRNA-seq, mim-tRNA-seq)?

      Once again, we thank the reviewer for the good recommendations. The points about direct comparisons were discussed above.

      Reviewer #2 (Recommendations For The Authors):

      These are all great points; we address them below.

      Minor points:

      - Please use chemical conventions, e.g. for mcm⁵s²U and NaIO₄ with superscript or subscript.

      Fixed.

      - Figure 2F: Glu GAA is only 82% charged; can this be due to mcm⁵s²U (Figure 3 supp 2) leading to a misalignment? What happens to Ser-NNN? Why is mitochondrial tRNA so much less charged?

      Regarding the Glu-GAA charge at baseline, we do not think this is an artifact of the mcm⁵s²U modification, as the same effect would then be expected for Gln-CAA and Lys-AAA. The same low charge occurs in the data of Evans et al. (2017), who use a very different alignment strategy. Lastly, the charge titration and half-life experiments show no evidence of inaccuracy or bias for Glu-GAA.

      But the question remains: why is the charge of Glu-GAA so low? At this point our best guess is speculative. It may be related to the strong enrichment of Glu-GAA codons in the A site found by ribosome profiling of mouse embryonic stem cells (Ingolia et al. (2011) [PMID: 22056041]).

      - Spell out "clvg" or "dphs" in the figure legend of Figure 2 and others. Similar for other abbreviations in figures. They are not always explained in the legends.

      Fixed.

      - Figure 3 supp 2: Please use U instead of T in the anticodons. The labels are a bit confusing. Please clearly align to the tick (also for Figure 3C).

      Fixed.

      - Line 220-223. Which RT enzyme was used for Figure 3 supp 2? Does it make a difference?

      TGIRT-III was used. Only Figure 2—figure supplement 6 and Figure 2—figure supplement 7 (added for resubmission) show data with the Maxima polymerase. To address the second part of the question, we have added a comparison between TGIRT-III and Maxima for mcm⁵s²U modification detection (Figure 3—figure supplement 3). Interestingly, there is a polymerase-specific signature for mcm⁵s²U modifications; however, more work would be required to determine which polymerase is best suited for detection of this and other modifications.

      - Figure 4 supp 1 and Figure 4 supp 2 change order.

      Fixed.

      Typos:

      - Figure 1 and Figure 1-figure supplement 1: In the periodate the "-" is in a small box (at least in my PDF viewer). Can this box be removed?

      - Line 175: duplicated verb.

      - Line 348: "moved".

      Thanks for catching these. They have now been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Measurement of secreted amylase could be seen as direct evidence of sweating, however, how to determine the causal relationship between climbing behavior and sweating? Friction force may also be reduced when there is too much fingertip moisture.

      As the reviewer notes, measurement of secreted amylase can provide direct evidence of sweating, and we therefore performed an iodine and starch reaction. Upon observing the involvement of TRPV4 in mouse foot pad perspiration, we then considered which type of behavioral analysis would be suitable to evaluate this perspiration. We agree with the reviewer that friction force in the climbing test may be reduced by excessive sweating. However, we did not observe severe sweating in the absence of acetylcholine treatment. Accordingly, we interpreted the increased climbing test failure rate of TRPV4KO mice as reflecting reduced friction force associated with the lack of TRPV4 activity.

      (2) For the human skin immunostaining, did the author use the same TRPV4 antibody as used in the mouse staining? Did they validate the specificity of the antibody for the human TRPV4 channel? 

      We used different antibodies for human and mouse samples. Since commercially available anti-TRPV4 antibodies do not work well with mouse samples, we generated our own anti-TRPV4 antibody and validated its specificity.

      (3) In lines 116-117, the authors tried to determine "the functional interaction of TRPV4 and ANO1 is involved in temperature-dependent sweating", however, they only used the TRPV4 ko mice and did not show any evidence supporting the relationship between TRPV4 and ANO1. 

      As the reviewer pointed out, based on the data presented in the original submission we cannot conclude that an interaction between TRPV4 and ANO1 is involved in perspiration. However, we think that the data for TRPV4KO mice presented in Figure 3 of the original version do indicate that TRPV4 is involved in perspiration. The finding that menthol and its related compounds, which inhibit the function of both TRPV4 and ANO1 (see our publication in Scientific Reports 7: 43132, 2017), blocked perspiration in both wild-type and TRPV4KO mice (original Figure 3C, D) indicates the involvement of either TRPV4 or ANO1 in perspiration. In the revised version, we present results from additional iodine and starch reaction experiments using Ani9, a potent and specific ANO1 inhibitor. Ani9 drastically inhibited perspiration from mouse foot pads at both 25 °C and 35 °C. Based on these collective results, we concluded that both TRPV4 and ANO1, likely acting as a complex, are involved in perspiration. We present the new data with Ani9 in the revised Figure 3E, F.

      (4) Figure 3-4 is quite confusing. At 25˚C, no sweating difference was observed between TRPV4 and wt mice (Fig 3A-3D), suggesting both Ach-induced sweating and basal sweating are TRPV4-independent at 25˚C, however, the climbing test was done at 26-27 ˚C and the data showed a climbing deficit in TRPV4 ko mice. How to interpret the data is unclear. 

      Thank you for raising this point. In the iodine and starch reaction experiment at 25 °C without acetylcholine, which is the same condition as in the climbing test, we detected less perspiration for TRPV4KO mice, although the reduction was not significant. In a trial using additional mice, we detected significantly less perspiration in TRPV4KO mice under control conditions without acetylcholine at 25 °C, which is consistent with the results of the climbing test. We have added these new data to the revised Figure 3A, B.

      (5) Were there any gender differences associated with sweating in mice? In Figure 3, the mouse number for behavior tests should be at least 5. 

      The TRPV4KO mice reproduced poorly, and we were unable to obtain sufficient numbers of male and female mice to determine whether there were gender differences in sweating. However, following the reviewer’s suggestion, and as mentioned above, we increased the number of experiments to obtain the results shown in the revised Figure 3. We did not observe a significant difference in sweating with the larger sample size, which supports our conclusions.

      (6) 8- to 21-week-old mice were used in the immunostaining, the time span is too long. 

      Given the difficulty in obtaining sufficient numbers of TRPV4KO mice, we used a somewhat wider age distribution to obtain samples for immunostaining. However, we did not observe age-dependent differences in immunostaining. We reference this point in the revised manuscript.

      (7) The authors used homozygous TRPV4 ko mice for all experiments. What are control mice? Are they littermates of the TRPV4 ko mice? 

      We did not use littermates for our in vivo experiments because the TRPV4KO mice reproduced poorly and the litter sizes were small. However, we did backcross the KO mice to the commercially available wild-type mice more than ten times. As such, we expect that the wild-type and TRPV4KO mice will have similar genetic backgrounds. In addition, we have published multiple studies that have successfully used this method, which we think supports the reliability of our results for experiments involving mice.

      Reviewer #2 (Public Review):

      (1) The coexpression data needs additional controls. In the TRPV4 KO mice, there appears to be staining with the TRPV4 Ab in TRPV4 KO mice below the epidermis. This pattern appears similar to that of the location of the secretory coils of the sweat glands (Fig 1A). Is the co-staining the authors note later in Figure 1 also seen in TRPV4 KOs? This control should be shown, since the KO staining is not convincing that the Ab doesn't have off-target binding. 

      We thank the reviewer for raising these concerns about immunostaining. As the reviewer notes, in the low-power image the signals appeared weak, and punctate signals were present in the basal region of glandular cells. Although we did not identify immunohistochemical conditions that produced no signal at all, tissue sections from WT mice stained with the anti-TRPV4 antibody showed conspicuous apical signals in the glandular cells facing the lumen. Meanwhile, TRPV4KO tissues showed no signals in the apical region of the glandular cells, where the TRPV4-ANO1 interaction is expected to occur. Immunoblotting also confirmed the absence of any trace TRPV4 signal in TRPV4KO tissues.

      (2) Are there any other markers besides CGRP for dark cells in mice to support the conclusion that mouse secretory cells have clear cell and dark cell properties? 

      We did not stain with other dark cell markers. Based on previous studies describing the differences between clear and dark cells in mouse eccrine glands, we think that dark and clear cells cannot be clearly discriminated, as we described in lines 93-96 of the Results. We identified secretory cells using CK8 and dark cells with CGRP, a marker of dark cells in human eccrine glands (Zancanaro et al. 1999 J Anat). Our result showed that CGRP immunostaining could not discriminate between clear and dark cells, which is consistent with a previous report showing that mouse secretory cells were assumed to be undifferentiated and primitive based on electron microscopic observation (Kurosumi et al. 1970 Arch Histol Jap).

      (3) The authors utilize menthol (as a cooling stimulus) in several experiments. In the discussion, they interpret the effect of menthol as potentially disrupting TRPV4-ANO1 interactions independent of TRPM8. Yet, the role of TRPM8, such as in TRPM8 KO mice, is not evaluated in this study.

      We performed the iodine and starch reaction experiments with TRPM8KO mice. Sweat spots in TRPM8KO mice did not differ from those in WT mice (p=0.63, t-test), and menthol treatment following acetylcholine stimulation significantly reduced sweating in TRPM8KO mice to a similar extent as in WT mice. These results rule out the involvement of TRPM8 in the menthol-induced reduction in sweating. We have included these data in the revised Figure 3D.

      (4) Along those lines, the authors suggest that menthol inhibits eccrine function, which might lead to a cooling sensation. But isn't the cooling sensation of sweating from evaporative cooling? In which case, inhibiting eccrine function may actually impair cooling sensations.

      Menthol acts non-specifically: it activates TRPM8, TRPV3 and TRPA1, and inhibits TRPV1, TRPV4 and ANO1. We did not carry out a climbing test with menthol, in part because menthol-dependent TRPA1 activation decreased the propensity of the mice to climb. As the reviewer notes, TRPM8 activation following topical application of menthol may cause a cooling sensation elicited in sensory neurons beneath the skin. However, the comfortable cooling sensation could also be caused in part by decreased sweating. The relationship between a comfortable cooling sensation and reduced perspiration following menthol application may be difficult to determine, and we have mentioned this in the updated Discussion.

      (5) The climbing assay is interesting and compelling. The authors note performing this under certain temperature and humidity conditions. Presumably, there is an optimal level of skin moisture, where skin that is too dry has less traction, but skin that is too wet may also have less traction. It would bolster this section of the study to perform this assay under hot conditions (perhaps TRPV4 KO mice, with impaired perspiration, would outperform WT mice with too much sweating?), or with pharmacologic intervention using TRPV4 agonists or antagonists to more rigorously evaluate whether this model correlates to TRPV4 function in the setting of different levels of perspiration.

      We thank the reviewer for this suggestion. Upon detecting the involvement of the TRPV4/ANO1 interaction in perspiration, we considered different behavioral analyses that could demonstrate this involvement. As the reviewer suggested, there should be an optimal level of sweating. Therefore, we first set the room temperature at 26-27 °C and humidity at 35-50%. To our knowledge, this is the first demonstration of temperature-dependent sweating of mouse foot pads. In humans, palm sweating is often referred to as mental (emotional) sweating and is known to be regulated by sympathetic nerve activity. Here we tested whether foot pad sweating might be related to friction force, reasoning that sufficient sweating could increase the friction force and, in turn, the success rate in the climbing test. The test used a vinyl-covered slippery slope, with the surface material and slope angle selected after several trials. As the reviewer suggests, the success rates could be affected by multiple factors, and hot temperatures likely induce more sweating that could increase the success rates in the climbing test. Examining these temperature-dependent effects will require additional experiments that are beyond the scope of this study. Generally, sweating is regulated by sympathetic nerve activity that occurs in response to increased brain neuron excitation. However, here we raise for the first time the possibility that sweating might also be regulated by local temperature sensation mediated through TRPV4, which may fine-tune perspiration activity. We have updated the Discussion to reference this possibility.

      (6) There are other studies (PMID 33085914, PMID 31216445) that have examined the role of TRPV4 in regulating perspiration. The presence of TRPV4 in eccrine glands is not a novel finding. Moreover, these studies noted that TRPV4 was not critical in regulating sweating in human subjects. These prior studies are in contradiction to the mouse data and the correlation to human anhidrotic skin in the present study. Neither of these studies is cited or discussed by the authors, but they should be. 

      We thank the reviewer for referencing these other studies concerning the possible involvement of TRPV4 in perspiration in humans. These studies focused on the vasodilating effects of TRPV4 and concluded that TRPV4 is not involved in sweating in humans, in contrast to our data for mice and humans. Multiple factors could explain the apparent difference between their results and ours. For example, the study populations differed: we assessed patients with acquired idiopathic generalized anhidrosis (AIGA), whereas the previous studies involved healthy volunteers. We have updated the Discussion to note the difference between our results and those of the previous studies.

      Reviewer #3 (Public Review):

      (1) Figure 2: The calcium imaging-based approach shows average traces from 6 cells per genotype, but it was unclear if all acinar cells tested with this technique demonstrated TRPV4-mediated calcium influx, or if only a subset was presented.

      “n = 6” does not indicate the number of cells, but rather 6 independent experiments that each had over 20 ROIs of sweat glands. We have clarified this point in the updated figure legend.

      (2) Figure 4: The climbing behavioral test shows a significant reduction in climbing success rate in TRPV4-deficient mice. The authors ascribe this to a lack of hind paw 'traction' due to deficiencies in hind paw perspiration, but important controls and evidence that could rule out other potential confounds were not provided or cited. 

      As noted in our response to Comment 5 made by Reviewer #2, we spent considerable time identifying optimal conditions that would delineate success rates in the climbing experiments. We are confident that TRPV4KO mice had significantly lower success rates than WT mice, but there are various factors that could affect the experimental outcomes. We reference these factors in the updated Discussion.

      (3) In general, the results support the authors' claims that TRPV4 activity is a necessary component of sweat gland secretion, which may have important implications for controlling perspiration as well as secretion from other glands where TRPV4 may be expressed. 

      As described above, the results we obtained in the climbing test can be affected by various factors. However, based on the consistency of the results obtained for the climbing test and the iodine and starch reaction assay, we think that our interpretation is correct. In terms of the involvement of TRPV4/ANO1 interactions in fluid secretion, we previously reported that the TRPV4/ANO1 complex is involved in cerebrospinal fluid secretion in the mouse choroid plexus (FASEB J. 2014) and in saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 2018). Together, these findings suggest that this mechanism is common to water efflux from exocrine glands.

      Reviewer #1 (Recommendations For The Authors):

      (1) An exocrine gland-specific Trpv4 knockout mouse should be used. Because TRPV4 is also expressed by muscles, global TRPV4 knockout may affect TRPV4-dependent muscle strength and reduce the climbing ability of mice.

      As the reviewer suggests, mice with exocrine gland-specific TRPV4 knockout would be preferable to global TRPV4 knockout mice, given that TRPV4 is expressed in multiple tissues. We agree with this suggestion, but we do not currently have such mice in hand. However, as mentioned above, we have reported the involvement of the TRPV4/ANO1 interaction in cerebrospinal fluid secretion from the choroid plexus in mice (FASEB J. 28: 2238-2248, 2014), as well as in saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 32: 1841-1854, 2018), suggesting that the TRPV4/ANO1 interaction could be widely involved in exocrine gland functions that involve water movement. We have updated the Discussion to reference this point.

      (2) The authors showed calcium imaging data indicating that menthol inhibits TRPV4-dependent calcium influx. However, it is well known that menthol induces the sensation of cooling by activating TRPM8. More evidence, including patch-clamp recordings, should be obtained to verify the inhibitory effects of menthol on TRPV4 and ANO1. Moreover, Fig 3E-3F could only suggest that the menthol-induced cooling sensation may affect sweating, not that menthol inhibits TRPV4 and ANO1 channels.

      We agree that more evidence including patch-clamp recordings can verify the inhibitory effects of menthol on TRPV4 and ANO1. We did not include such experiments here since we previously showed that menthol and related agents indeed inhibit TRPV4- and ANO1-mediated currents (Sci. Rep. 7: 43132, 2017). We now cite this paper in the revised version.

      (3) Apart from the climbing test, are there any better models to assess sweating-related behaviors?

      When we detected the involvement of TRPV4/ANO1 interactions in perspiration, we considered different types of behavioral analyses that could be used to demonstrate TRPV4/ANO1-dependent perspiration. We think that the climbing experiment is the best test, particularly since foot pads are among the few regions of the mouse body that are not covered by fur and are thus amenable to evaluating perspiration with an iodine and starch test.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was confused by a section in the introduction on lines 59-60: How does Cl- efflux lead to the formation of a physical complex in cells with high intracellular Cl-? What is the physical complex? This seems like several disparate concepts combined together, which need to be clarified.

      We apologize for the incomplete descriptions of several of our previous works. We have amended the Introduction section in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) TRPV4 is expressed by multiple other cell types in the skin (keratinocytes, macrophages etc.) which may have an impact on peripheral sensory function. Is there evidence that TRPV4-deficient animals have relatively normal sensory acuity and/or proprioception? Such evidence would lend more credibility to the reported findings in the climbing test. 

      As the reviewer points out, TRPV4 is expressed by multiple other cell types in the skin. To date, we have found no differences in sensory function between TRPV4KO and WT mice. Whether TRPV4 is involved in proprioception remains unclear, based on both our own observations and the literature, although TRPV4 is clearly activated by mechanical stimuli. We previously compared the mechanical sensitivity of TRPV4 and Piezo1 in bladder epithelial cells and found that Piezo1 is much more sensitive than TRPV4 (J. Biol. Chem. 289: 16565-16575, 2014), which is consistent with the involvement of Piezo1, rather than TRPV4, in proprioception. Although TRPV4 is reported to be expressed in sensory neurons, we did not detect TRPV4-mediated responses in isolated rat and mouse DRG neurons, suggesting that TRPV4-positive sensory neurons are relatively rare.

      (2) The methods section refers to loading entire sweat glands with Fura-2 dye for calcium imaging, but the figure legend refers to sweat gland acinar cells. Resolving this ambiguity would help readers to interpret the data. 

      We apologize for this error and have made an appropriate correction in the revised manuscript.

      (3) Alternatively, could acute intraplantar injection of a TRPV4 antagonist (e.g. GSK205) in wild-type mice phenocopy the TRPV4-knockout mouse deficits, or could normal climbing behavior be restored in the TRPV4 knockout by adding artificial perspiration to their hindpaws?

      We thank the reviewer for raising this interesting possibility and suggesting the use of TRPV4 agonists or antagonists in the climbing tests. We agree that results of such experiments would support the involvement of TRPV4 in sweating. We attempted such experiments by injecting TRPV4 regulators into mouse hindpaws. However, the injections themselves appeared to impair climbing ability, perhaps in part due to pain associated with the injection. Similarly, menthol injection appeared to reduce climbing activity, likely through pain associated with TRPA1 activation. As such, we did not pursue these experiments.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and efforts of the Reviewer.

      In light of your data showing that the IgG response is similar with and without CIN, it would be good to drop "and induce a broad, vaccination-like anti-tumor IgG response". This suggests a direct connection between CIN and the IgG response. In my opinion, the shorter title is equally strong and more correct.

      We edited this phrase in the originally submitted title for accuracy:

      Chromosomal instability induced in cancer can enhance macrophage-initiated immune responses that include anti-tumor IgG

      I agree that inducing CIN through other means can be left for a different study, but in that case the abstract should more directly mention MSP1 inhibition since that is how CIN is always induced. Perhaps line 18: CIN is induced by MSP-1 inhibition in poorly immunogenic....

      Done as requested:

      “…Here, CIN is induced in poorly immunogenic B16F10 mouse melanoma cells using spindle assembly checkpoint MPS1 inhibitors…”


      The following is the authors’ response to the original reviews.

      eLife assessment

      This study highlights a valuable finding that chromosomal instability can change immune responses, in particular macrophage behaviours. The convincing results show that CD47 targeting and anti-Tyrp1 IgG can overcome changes in the immune landscape in tumors and prolong survival of tumor-bearing mice. These findings reveal an exciting new dimension of how chromosomal instability can influence immune responses against tumors.

      We thank the Editors for their enthusiasm and appreciation for this work. We also want to highlight our thanks for their careful reading, support, and patience while handling this manuscript. While this work provides useful insight into potential therapeutic implications of chromosomal instability in the macrophage immunotherapy field, we also hope it elucidates some novel basic science to further explore how chromosomal instability has such interesting effects on the immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Hayes et al. explored the potential of combining chromosomal instability with macrophage phagocytosis to enhance tumor clearance of B16-F10 melanoma. However, the manuscript suffers from substandard experimental design, some contradictory conclusions, and a lack of viable therapeutic effects.

      The authors suggest that early-stage chromosomal instability (CIN) is a vulnerability for tumorigenesis, CD47-SIRPa interactions prevent effective phagocytosis, and opsonization combined with inhibition of the CD47-SIRPa axis can amplify tumor clearance. While these interactions are important, the experimental methodology used to address them is lacking.

      Reviewer #1 (Recommendations For The Authors):

      First, early stages of the tumor are essentially being defined as before implantation. In all cases, the tumor cells were pre-treated with MPS1i or had a genetic knockout of CD47. This makes it difficult to see how this would translate clinically.

      We greatly appreciate the Reviewer’s interest in the topic and its potential, but our manuscript makes no claims of immediate clinical translation. To date, studies of chromosomal instability (CIN) have not described whether and how CIN can affect macrophage function. To our knowledge, this is the first study to begin such characterizations, using various MPS1i drugs to induce CIN. Many variations of the approach can be envisioned for future studies.

      Our Results include some key studies of cancer cells with wild-type levels of CD47, including in vivo tumor elimination (Fig.3E). Nonetheless, we do conduct some of our studies in a CD47 knockout context to remove this “brake” that generally impedes phagocytosis, with the goal of better understanding how CIN affects phagocytosis. As cited to some extent in our Introduction, there are many efforts in clinical trials to disrupt this macrophage checkpoint and others focused on macrophage immunotherapy. Whether CIN can be induced by clinically translatable drugs, and specifically in cancer cells, is beyond the scope of our studies.

      I would like to see the amount of CIN that occurs in WT B16F10 over the course of tumorigenesis (ie longer than 5 days). This is because I would assume that CIN would eventually occur in the WT B16F10 regardless of whether MPS1i is being given. And if that's the case, then the initiation of CIN at day 10 after implantation (for example) would still be considered "early stage" CIN. If the therapy is then initiated at this point, does the effect remain? Or put differently, how would the authors propose to induce the appropriate level of CIN in an established tumor? Why is pretreatment necessary?

      Untreated B16F10 cells do not accumulate micronuclei above basal levels over 12 days, unlike MPS1i-treated cells, as shown in a newly added panel in Fig. S1:

      Author response image 1.

      This helps support our decision to pre-treat cells with MPS1i to stimulate genomic instability and is described in the first section of Results:

      “…we saw >10-fold increases of micronuclei over the cell line’s low basal level (~1% of cells), and two other MPS1 inhibitors, AZ3146 and BAY12-17389, confirm such effects (Fig. S1A). Micronuclei-positive cells can persist up to 12 days after treatment (Fig. S1B), while control cells maintain the low basal levels. The results suggest that pre-treatment with MPS1i can simulate CIN in an experimental context even for 1-2 weeks, which may not typically occur at the same frequency during early tumor growth.”

      It is known that PD-1 expression inhibits tumor-associated macrophage phagocytosis (Nature, 2017). Does MSP1i (sic) treatment affect the population of PD-1+ tumor macrophages in vivo?

      We thank the Reviewer for bringing up an interesting point.

      Using the same tumor RNA-seq data that was used for Fig.1E, a heatmap of expression of PD-1 (gene Pdcd1) shows no consistent trend with MPS1i:

      Author response image 2.

      We also examined whether the secretome from CIN-afflicted cancer cells affects PD-1 expression in cultured macrophages, but our single-cell RNA-sequencing experiment detected no Pdcd1 reads in any of the macrophage clusters from Fig. 1H.

      Author response image 3.

      The Discussion section now includes a statement on this topic:

      “…B16F10 tumors are poorly immunogenic, do not respond to either anti-CD47 or anti-PD-1/PDL1 monotherapies, and show modest and variable cure rates (~20-40%; Dooling et al., 2023; Hayes et al., 2023) even when macrophages have been made maximally phagocytic according to notions above. We should note here that our whole-tumor RNA-seq data (Fig.1E) shows expression of PD-1 (gene Pdcd1) follows no consistent trend upon MPS1i treatment, and that Pdcd1 was not detected in our scRNA-seq data for macrophage cultures (Fig.1G) – motivating further study.”

      The authors must explain how the proposed therapy works since MPS1i increases tumor (cell) size, making it difficult for macrophages to phagocytose the tumor cells. It also reduces or suppresses Tyrp1 expression on the cancer cells, making it harder to opsonize. Since these were two main points for the rationale of this study, the authors need to reconcile them.

      We appreciate this comment and have re-organized this Results section to try to minimize confusion:

      CIN-afflicted, CD47-knockout tumoroids are eliminated by macrophages

      To assess functional effects of macrophage polarization, we focused on a 3D “immuno-tumoroid” model in which macrophage activity can work (or not) over many days against a solid proliferating mass of cancer cells in non-adherent round-bottom wells (Fig. 2A) (Dooling et al., 2023). We used CD47 knockout (KO) B16F10 cells, which removes the inhibitory effect of CD47 on phagocytosis, noting that KO does not perturb surface levels of Tyrp1, which is targetable for opsonization with anti-Tyrp1 (Fig. S2A). BMDMs were added to pre-assembled tumoroids at a 3:1 ratio, and we first assessed surface protein expression of macrophage polarization markers. Consistent with our whole-tumor bulk RNA-sequencing and single-cell RNA-sequencing of BMDM monocultures (Fig. 1E, 1I-J), BMDMs from immuno-tumoroids of MPS1i-treated B16F10 showed increased surface expression of M1-like markers MHCII and CD86 and decreased expression of M2-like markers CD163 and CD206 (Fig. 2B-C). Although these macrophages seemed poised for anticancer activity, the cancer cells showed decreased binding of anti-Tyrp1 (Fig. S2B) and ~20% larger size in flow cytometry (Fig. S2C). The latter likely reflects cytokinesis defects and polyploidy as acute effects of CIN induction (Chunduri & Storchová, 2019; Mallin et al., 2022). Such cancer cell changes might explain why, in standard 2D phagocytosis assays, BMDMs attached to rigid plastic engulf relatively few anti-Tyrp1-opsonized cancer cells pretreated with MPS1i compared with DMSO (Fig. S2D). In such cultures, BMDMs use their cytoskeleton to attach and spread, which competes with engulfment of large and poorly opsonized targets. Since tumors in vivo are not as rigid as plastic, our 3D immuno-tumoroids avoid attachment to plastic, and large numbers of macrophages can cluster and cooperate in engulfing cancer cells in a cohesive mass (Dooling et al., 2023).
We indeed find that CIN-afflicted tumoroids are eliminated by BMDMs regardless of anti-Tyrp1 opsonization (Fig. 2D-E), whereas anti-Tyrp1 is required for clearance of DMSO control tumoroids (Fig. 2D, S3B). Imaging also suggests that cancer-cell CIN stimulates macrophages to cluster (compare Day 4 in Fig. 2D) despite the lack of cancer cell opsonization and the larger cancer cell size; such clustering favors cooperative phagocytosis of tumoroids (Dooling et al., 2023). The 3D immuno-tumoroid results with induced CIN are thus consistent with a more pro-phagocytic M1-type polarization (Fig.1J and 2B,C).

      The authors used varying numbers of tumor cells for the in vivo portions of the study; the first half of the manuscript uses 500,000 cells, while the latter half uses 200,000 cells. Why?

      The reasons for the difference in numbers is now clarified in the Methods:

      For assessing immune infiltrates in early stages of tumor engraftment, when tumors are still small, we used a relatively high number of tumor cells (500,000 cells in Fig. 1D and Fig. 2F-G) to obtain sufficient cell numbers after dissociating the tumors, particularly for the slow-growing MPS1i-treated tumors. Many cells were lost during dissection, collagenase treatment, and filtering to remove clumps, yet we needed 100,000 or more viable cells for bulk RNA-seq suspensions and for flow cytometry measurements. For all other studies, 200,000 cancer cells were injected.

      The authors need to report the tumor volumes and the total number of cells isolated from the day five tumors to avoid grossly inflating the effect (i.e. Fig 2G and 4G).

      We have added relevant numbers in the Methods:

      For day 5 post-challenge measurements, 100,000 to 200,000 live cells were collected. For in vivo tumor infiltrate studies in re-challenged mice, 10 million live cells were collected.

      Also, regarding tumor sizes and cell numbers, we have previously published relevant measurements in assessments of tumor growth. Please see:

      Brandon H Hayes, Hui Zhu, Jason C Andrechak, Lawrence J Dooling, Dennis E Discher, Titrating CD47 by mismatch CRISPR-interference reveals incomplete repression can eliminate IgG-opsonized tumors but limits induction of antitumor IgG, PNAS Nexus, Volume 2, Issue 8, August 2023, pgad243, https://doi.org/10.1093/pnasnexus/pgad243

      Dooling, L.J., Andrechak, J.C., Hayes, B.H. et al. Cooperative phagocytosis of solid tumours by macrophages triggers durable anti-tumour responses. Nat. Biomed. Eng 7, 1081–1096 (2023). https://doi.org/10.1038/s41551-023-01031-3

      In the present study, similar tumor growth curves are provided for transparency, but the Kaplan-Meier curves are the key data in Figs. 3-4. Lastly, regarding reporting of the total number of cells harvested, we based our reporting on previously accepted measurements that also expressed numbers relative to total harvested cells. See:

      Cerezo-Wallis, D., Contreras-Alcalde, M., … Soengas, M.S., 2020. Midkine rewires the melanoma microenvironment toward a tolerogenic and immune-resistant state. Nat Med 26, 1865–1877. https://doi.org/10.1038/s41591-020-1073-3

      The figure titles need to be revised. For example, the title of Figure 1 claims that "MPS1i-induced chromosomal instability causes proliferation deficits in B16F10 tumors." However, the evidence provided is weak. The authors only present GSEA analysis of proliferation and no functional evidence of impairment. The authors need to characterize this proliferation deficit using in vitro studies and functional studies of macrophage polarization. I would suggest proliferation assays (crystal violet, MTT, Incucyte, etc) to measure the B16 growth over time with MPS1i treatment.

      We thank the Reviewer for pointing this out. In Fig.1 we have minimized information regarding proliferation because it is later quantified in Figs.2D,E, S3, and 3D-i:

      Fig.1F legend: Top downregulated hallmark gene sets in tumors comprised of MPS1i-treated B16F10 cells, showing downregulated DNA repair, cell cycle, and growth-related pathways, consistent with observations of slowed growth in culture and in vivo – as subsequently quantified.

      Then the authors could collect the tumor supernatant to culture with macrophages and determine polarization in vitro. I would also like to see functional studies of macrophage polarization (suppression assays, cytokine production, etc). Currently, the authors provide no functional studies.

      Fig.2B,C provides functional surface marker measurements of in vitro polarization toward anti-cancer M1 macrophages by MPS1i-pretreated tumor cells, consistent with gene expression in Fig.1G-J. Function is further shown as anti-cancer activity in Fig.2D,E, as now stated explicitly in the text:

      “…In our 3D tumoroid in vitro assays, we found that macrophages can suppress the growth of chromosomally unstable tumoroids and clear them, surprisingly both with and without anti-Tyrp1 (Fig. 2D-E), regardless of MPS1i concentration used for treatment. Such a result is consistent with M1-type polarization (Fig.1J and 2B,C), which tends to be more pro-phagocytic.”

      The authors claim that macrophages are the key effector cells, but they need to provide evidence for this claim.

      Other immune cells clearly contribute to the presented results because the IgG must eventually come from B cells. The text has been edited to state that 'macrophages are key initiating-effector cells'. Some evidence for this is the maximal survival of mice with WT B16 + Rev tumors in Fig.3E upon treatment with marrow macrophages plus macrophage-relevant SIRPα blockade and macrophage-relevant IgG (via FcR). T cells do not have SIRPα or FcR.

      They can deplete macrophages and T and B cells to determine whether the effect remains or is ablated. This is the only definitive way to make this claim.

      To determine whether T and B cells might also be key initiating-effector cells, new experiments were done with mice depleted of T and B cells (per Fig.S9, below). We compared the growth of MPS1i vs DMSO treatments in these mice to results in mice with T and B cells (which should replicate our previous results in Fig.3D-i). We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells compared to mice with T and B cells. We have added to the text our conclusion that: T and B cells are not key initiating-effector cells. Whereas B cells are effector cells at least in terms of eventually making anti-tumor IgG, our results show that macrophages are key initiating-effector cells because macrophages certainly affect the growth of (WT B16 + Rev tumors) when more are added (Fig.3E).

      Author response image 4.

      Growth of CIN-afflicted wild-type (WT) tumors in T- and B-cell deficient mice and T- and B-cell replete mice. Similar growth delays for MPS1i-pretreated B16F10 cells in T- and B-cell deficient NSG mice and immunocompetent C57BL/6 mice. Both types of mice have functional macrophages. Parallel studies in vivo were done with WT B16F10 ctrl cells cultured 24 h in 2.5 μM MPS1i (reversine) or DMSO, then washed 3x in growth media for 5 min each and allowed to recover in growth media for 48 h. 200,000 cells in 100 μL PBS were injected subcutaneously into right flanks, and the standard size limit was used to determine survival curves. The C57BL/6 experiments here were done (by co-author L.J.D.) independently of the similar experiments (by B.H.H.) shown in Fig.3D-i, which provides evidence of reproducibility.

      The Results section final paragraph describes all of this:

      Macrophages seem to be the key initiating-effector cells, based in part on the following findings. First, macrophages with both SIRPα blockade and FcR-engaging, tumor-targeting IgG maximize survival of mice with WT B16 + Rev tumors (Fig. 3E) – noting that macrophages but not T cells express SIRPα and FcRs. Despite the clear benefits of adding macrophages, to further assess whether T and B cells are key initiating-effector cells, new experiments were done with mice depleted of T and B cells. We compared the growth delay of MPS1i versus DMSO treatments in these mice to the delay in fully immunocompetent mice with T and B cells – with all studies done at the same time. We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells when compared to immunocompetent C57 mice (Fig.S9). We conclude therefore that T and B cells are not key initiating-effector cells. At later times, B cells are likely effector cells at least in terms of making anti-tumor IgG, and T cells in tumor re-challenges are also increased in number (Fig. 4G-ii). We further note that in our earlier collaborative study (Harding et al., 2017) WT B16 cells were pre-treated by genome-damaging irradiation before engraftment in C57 mice, and these cells grew minimally – similar to MPS1i treatment – while untreated WT B16 cells grew normally at a contralateral site in the same mouse. Such results indicate that T and B cells in C57BL/6 mice are not sufficiently stimulated by genome-damaged B16 cells to generically impact the growth of undamaged B16 cells.

      Reviewer #2 (Public Review):

      Harnessing macrophages to attack cancer is an immunotherapy strategy that has been steadily gaining interest. Whether macrophages alone can be powerful enough to permanently eliminate a tumor is a high-priority question. In addition, the factors that make some tumors more vulnerable to macrophage attack have not been completely defined. In this paper, the authors find that chromosomal instability (CIN) in cancer cells improves the effect of macrophage-targeted immunotherapies. Using several methods, they demonstrate that CIN tumors secrete factors that polarize macrophages toward a more tumoricidal fate. The most compelling experiment transfers conditioned media from MPS1-inhibited and control cancer cells to macrophages and then uses RNAseq to show that the MPS1-inhibited conditioned media shifts macrophages toward a more tumoricidal phenotype. In mice with MPS1-inhibited (CIN) B16 melanoma tumors, a combination of CD47 knockdown and anti-Tyrp1 IgG is sufficient for long-term survival in nearly all mice. This combination is a striking improvement over conditions without CIN.

      Like any interesting paper, this study leaves several unanswered questions. First, how do CIN tumors repolarize macrophages? The authors demonstrate that conditioned media is sufficient for this repolarization, implicating secreted factors, but the specific mechanism is unclear. In addition, the connection between the broad, vaccination-like IgG response and CIN is not completely delineated. The authors demonstrate that mice that successfully clear CIN tumors have a broad anti-tumor IgG response. This broad IgG response has previously been demonstrated for tumors that do not have CIN. It is not clear whether CIN specifically enhances the anti-tumor IgG response or whether the broad IgG response is similar to that against other tumors. Finally, CIN is always induced by MPS1 inhibition. To specifically attribute this phenotype to CIN, it would be most compelling to demonstrate that tumors with CIN unrelated to MPS1 inhibition are also able to repolarize macrophages.

      Overall, this is a thought-provoking study that will be of broad interest to many different fields including cancer biology, immunology and cell biology.

      We thank the Reviewer for their enthusiastic and positive comments on the manuscript.

      The main purpose of this study has been discovery-oriented and mechanistic, with implications for improving macrophage immunotherapies. More experimentation is needed to further understand how this positive immune response emerges. However, we could address whether or not CIN enhances the anti-tumor IgG response by quantitative comparison with our two other recent studies, and we conclude that it does not, per new edits in the Abstract and the Results. See the attached PPT for full details and comparison.

      Abstract:

      “CIN does not greatly affect the level of the induced response but does significantly increase survival.”

      “…these results demonstrate induction of a generally potent anti-cancer antibody response to CIN-afflicted B16F10 in a CD47 KO context. Importantly, comparing these sera results for CIN-afflicted tumors to our recent studies of the same tumor model without CIN (Dooling et al., 2022; Hayes et al., 2022), we find similar levels of IgG induction (e.g. ~100-fold above naive on average for IgG2a/c), similar increases in phagocytosis by sera opsonization (e.g. equivalent to anti-Tyrp1), and similar levels of suppressed tumoroid growth, including the variability.

      However, median survival increased (21 days) compared to their naïve counterparts (14 days), supporting the initial hypothesis of prolonged survival and consistent not only with past results indicating major benefits of a prime-&-boost approach with anti-Tyrp1 (Dooling et al., 2022) but also with the noted similarities in induced IgG levels.”

      Future studies could certainly focus on identifying which secreted factors might induce the M1-like polarization (for example, using ELISA assays for cytokine detection). This could be important because a main finding here is that we achieve a nearly 100% success rate in clearing tumors when we combine CD47 ablation and IgG opsonization with cancer cell CIN. Previous studies achieved only about 40% cures in mice with CD47 disruption and IgG opsonization alone, suggesting that CIN in this experimental context does improve the macrophage response.

      Lastly, we agree with the Reviewer that future studies should also address how CIN in general (not MPS1i-induced) affects tumor growth. The final paragraph of our Discussion at least cites support for consistent effects of M1-like polarization:

      “The effects of CIN and aneuploidy on macrophages certainly require further investigation. We did publish recently that M1-like polarization of BMDMs with IFNγ priming is sufficient to suppress growth of B16 tumoroids with anti-Tyrp1 opsonization more rapidly than unpolarized/unprimed macrophages and much more rapidly than M2-like polarization of BMDMs with IL4 (Extended Data Fig.5a in Dooling et al., 2023); hence, anti-cancer polarization contributes in this assay.

      While the secretome from MPS1i-treated cancer cells has been found to trigger…”

      Nonetheless, we can only speculate that a threshold of CIN is reached by a certain timepoint in tumor engraftment and growth. Natural CIN might not be enough, so we pursued a pharmacological approach consistent with ongoing pre-clinical studies (https://doi.org/10.1158/1535-7163.MCT-15-0500). Future studies should consider knockdown models that gradually accrue CIN in tumors, or more relevant drugs that are known to induce CIN through mechanisms not associated with the spindle. We believe, however, that these are larger questions in their own right and are beyond the scope of the foundational discoveries in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      None

      We again thank the Reviewer for their support and enthusiasm for the manuscript. We made some additional changes and added data to address questions posed by the other Reviewer, which we hope further strengthen the manuscript.

    1. Author Response:

      We sincerely value the insightful and constructive feedback provided by the reviewers, which has been instrumental in identifying areas of our manuscript that required further clarification or amendment. Below are our responses detailing each comment.

      Reviewer 1:

      (1) One major issue arises in Figure 4, the recording of VLPO Ca2+ activity. In Lines 211-215, they stated that they injected AAV2/9-DBH-GCaMP6m into the VLPO while activating LC NE neurons. As they claimed in line 157, DBH is a specific promoter for NE neurons. This implies an attempt to label NE neurons in the VLPO, which is problematic because NE neurons are not present in the VLPO. Because Ca2+ activity was nonetheless observed in their photometry recordings, this raises concerns about their viral infection strategy and suggests that the DBH promoter may label some non-NE neurons. Is the DBH promoter widely used? The authors should list references. Additionally, they should quantify the labeling efficiency of both DBH and TH-cre throughout the paper.

      (1) In Figure 5, we found that the VLPO receives noradrenergic projections from the LC, indicating that the recorded Ca2+ activity may come from the axon fibers of this projection. Similarly, Gunaydin et al. (2014) demonstrated that fiber photometry can be used to selectively record from neuronal projections.

      (2) Located on the inner membrane of synaptic vesicles in noradrenergic and adrenergic neurons, DBH (dopamine-beta-hydroxylase) is an enzyme that catalyzes the conversion of dopamine to norepinephrine and therefore plays an important role in noradrenergic neurotransmission. DBH is a marker of noradrenergic neurons. Zhou et al. (2020) showed, by immunolabeling for DBH, that their probe specifically labeled noradrenergic neurons. Recently, the DBH promoter has been used in several studies (e.g., Han et al., 2024; Lian et al., 2023). DBH-Cre mice are widely used to specifically label noradrenergic neurons (e.g., Li et al., 2023; Breton-Provencher et al., 2022; Liu et al., 2024). As the reviewer noted, it is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO. Therefore, we used the more specific DBH promoter. The LC is the main noradrenergic nucleus of the central nervous system. In our study, we injected rAAV-DBH-GCaMP6m-WPRE (Figures 2 and 8) and rAAV-DBH-EGFP-5'miR-30a-shRNA(GABAA receptor)-3'-miR30a-WPREs (Figure 9) into the LC. The results showed that the DBH promoter specifically labeled noradrenergic neurons in the LC, while labeling outside the LC was almost absent. As suggested, we will quantify the labeling efficiency of both DBH and TH-cre throughout the revised manuscript. This updated figure will provide a more rigorous analysis.

      (2) A similar issue arises with chemogenetic activation in Fig. 5 L-R, the authors used TH-cre and DIO-Gq virus to label VLPO neurons. Were they labelling VLPO NE or DA neurons for recording? The authors have to clarify this.

      As previously addressed in response to Comment #1, we acknowledge that it is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO. In the revised manuscript, we are considering more restricted AAV injections into the VLPO to verify terminal expression in the LC.

      (3) Another related question pertains to the specificity of LC NE downstream neurons in the VLPO. For example, do they preferentially modulate GABAergic or glutamatergic neurons?

      As suggested, we will add multi-label ISH of LC NE downstream neurons in the VLPO to identify the types of neurons they modulate.

      (4) In Figure 1A-D, in the measurement of the dose-dependent effect of midazolam on LORR, was only one batch of testing performed? If more than one batch of mice was used, error bars should be presented in 1B. Also, the rationale for testing TH expression levels after midazolam is not clear. Is the change in TH expression level related specifically to NE activation? If so, they should cite references.

      (1) As recommended, we will add error bars in the revised manuscript.

      (2) As the reviewer noted, the use of TH as a marker of NE activation is controversial, so in the revised manuscript we will directly measure central norepinephrine content.

      (5) Regarding the photometry recording of LC NE neurons during the entire process of midazolam injection in Fig. 2 and Fig. 4, it is unclear what time=0 stands for. If I understand correctly, the authors were comparing spontaneous activity during the four phases. Additionally, they only show traces lasting 20 s in Fig. 2F and Fig. 4L. How did the authors select data for analysis, and what criteria were used? The authors should also quantify the average Ca2+ activity and Ca2+ transient frequency during each stage instead of only quantifying Ca2+ peaks. In line 919, the legend for Figure 2D, they stated that it is the signal from the BLA; were recordings also made from the BLA?

      (1) In this study, we used fiber photometry, which records changes in the fluorescence of a calcium indicator. The fluorescence signal is usually divided into segments according to behavior, and each segment is aligned to a specific behavioral event defined as time=0. The mean calcium fluorescence in a 1 s or 1.5 s window before the event is taken as the baseline fluorescence (F0). ΔF/F0 is then calculated as the difference between the fluorescence during the event and F0, divided by the difference between F0 and the offset value; it therefore represents the change in calcium fluorescence when the event occurs. Results are commonly shown as heat maps and event-related peri-event plots (e.g., Cheng et al., 2022; Gan-Or et al., 2023; Wei et al., 2018). In Fig. 2, the time points of wakefulness, midazolam injection, LORR and RORR were each set as time=0, whereas in Fig. 4, RORR was set as time=0. The 20 s traces were selected based on the length of a complete Ca2+ signal. We will describe the Ca2+ recording experiments more specifically in the revised manuscript.
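      The normalization described above, ΔF/F0 = (F − F0)/(F0 − offset), with F0 the mean of a pre-event baseline window, can be sketched in Python as follows. This is a minimal illustration, not the analysis software used in the study: it assumes a uniformly sampled trace, a known sampling rate `fs` in Hz, and a single scalar offset value; the function names are hypothetical.

```python
def delta_f_over_f0(trace, fs, event_idx, baseline_s=1.0, offset=0.0):
    """Normalize a fluorescence trace to the mean of a pre-event baseline.

    trace      -- raw fluorescence samples, assumed uniformly sampled
    fs         -- sampling rate in Hz (assumed known)
    event_idx  -- sample index of the behavioural event (time = 0)
    baseline_s -- length of the pre-event baseline window (1 s or 1.5 s in the text)
    offset     -- offset value subtracted from F0 in the denominator
    """
    n_base = int(round(baseline_s * fs))
    start = event_idx - n_base
    if n_base <= 0 or start < 0:
        raise ValueError("baseline window must lie inside the trace")
    # F0: mean fluorescence in the window immediately before the event
    f0 = sum(trace[start:event_idx]) / n_base
    denom = f0 - offset
    if denom == 0:
        raise ZeroDivisionError("F0 equals the offset value")
    # (F - F0) / (F0 - offset) for every sample in the trace
    return [(f - f0) / denom for f in trace]


def peri_event_times(n_samples, fs, event_idx):
    """Time of each sample in seconds relative to the event (time = 0)."""
    return [(i - event_idx) / fs for i in range(n_samples)]


# Toy trace (hypothetical values) sampled at 4 Hz, with the event at sample 4
trace = [10.0, 10.0, 10.0, 10.0, 15.0, 20.0]
print(delta_f_over_f0(trace, fs=4, event_idx=4))
print(peri_event_times(len(trace), fs=4, event_idx=4))
```

      Peri-event plots and heat maps are then built by stacking such normalized segments, one per event, on the shared time axis.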

      (2) Regarding the BLA, we sincerely apologize for this error: the signal we recorded was from the LC rather than the BLA. We will carefully check for and correct similar problems in the revised manuscript.

      Reviewer 2:

      In figure legends, abbreviations used in the figures should be defined wherever possible, for example "LORR" in Figure 1.

      As suggested, we will define the abbreviations used in the figures wherever possible in the revised manuscript.

      References

      Gunaydin LA, Grosenick L, Finkelstein JC, et al. Natural neural projection dynamics underlying social behavior. Cell. 2014;157(7):1535-1551. doi:10.1016/j.cell.2014.05.017

      Zhou N, Huo F, Yue Y, Yin C. Specific Fluorescent Probe Based on "Protect-Deprotect" To Visualize the Norepinephrine Signaling Pathway and Drug Intervention Tracers. J Am Chem Soc. 2020;142(41):17751-17755. doi:10.1021/jacs.0c08956

      Han S, Jiang B, Ren J, et al. Impaired Lactate Release in Dorsal CA1 Astrocytes Contributed to Nociceptive Sensitization and Comorbid Memory Deficits in Rodents. Anesthesiology. 2024;140(3):538-557. doi:10.1097/ALN.0000000000004756

      Lian X, Xu Q, Wang Y, et al. Noradrenergic pathway from the locus coeruleus to heart is implicated in modulating SUDEP. iScience. 2023;26(4):106284. Published 2023 Feb 27. doi:10.1016/j.isci.2023.106284

      Li C, Sun T, Zhang Y, et al. A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice. Neuron. 2023;111(17):2727-2741.e7. doi:10.1016/j.neuron.2023.05.023

      Breton-Provencher V, Drummond GT, Feng J, Li Y, Sur M. Spatiotemporal dynamics of noradrenaline during learned behaviour. Nature. 2022;606(7915):732-738. doi:10.1038/s41586-022-04782-2

      Liu Q, Luo X, Liang Z, et al. Coordination between circadian neural circuit and intracellular molecular clock ensures rhythmic activation of adult neural stem cells. Proc Natl Acad Sci U S A. 2024;121(8):e2318030121. doi:10.1073/pnas.2318030121

      Cheng J, Ma X, Li C, et al. Diet-induced inflammation in the anterior paraventricular thalamus induces compulsive sucrose-seeking. Nat Neurosci. 2022;25(8):1009-1013. doi:10.1038/s41593-022-01129-y

      Gan-Or B, London M. Cortical circuits modulate mouse social vocalizations. Sci Adv. 2023;9(39):eade6992. doi:10.1126/sciadv.ade6992

      Wei YC, Wang SR, Jiao ZL, et al. Medial preoptic area in mice is capable of mediating sexually dimorphic behaviors regardless of gender. Nat Commun. 2018;9(1):279. Published 2018 Jan 18. doi:10.1038/s41467-017-02648-0

    1. Author response:

      Reviewer 1:

      A limitation of the paper is that the biological mechanisms by which intracellular mechanics is modulated (e.g. among cell types) remain unexplored and are only briefly discussed. Yet this limitation is greatly offset by the rigor of the approach.

      We thank the reviewer for the valuable feedback. The question of which biological mechanisms are responsible for the different mechanical properties is indeed highly important and interesting. We agree with the reviewer, and consider it important enough to require a dedicated research focus of its own, which is far beyond the scope of this article. By introducing the concept of the mechanical fingerprint, this work provides a framework for systematically investigating both the biological mechanisms and the functional relevance of intracellular mechanical properties in future studies. In the revised manuscript, we will expand the discussion.

      Reviewer 2:

      The most problematic part of the method is the inhibition of actin polymerization with cytochalasin B. The data show that the viscoelastic parameters, as well as the active energy parameters, are unaffected by cytochalasin B. It is reasonable to expect that elasticity will decrease and fluidity will increase upon application of such a drug. The stiffness-reducing effect was observed only when CB was used together with nocodazole, most likely because of phagocytosis of the bead, which is governed by microtubules. Other actin-depolymerizing drugs, such as latrunculin A, would be needed to test actin's role in mechanical fingerprints. If actin's role can only be explained through accompanying microtubule inhibition, this is not a convenient system for directly testing the mechano-adaptation process.

      We thank the reviewer for their time and instructive feedback. Our finding that actin depolymerization has no effect on intracellular mechanics may appear unfamiliar, as many rheological studies of the cell cortex highlight the importance of actin for the mechanical properties of the whole cell. However, because the actin network is reported to be very sparse away from the cortex, the mechanical properties may plausibly be dominated by other structures in the cytoplasm. Indeed, our findings are consistent with other studies that see no strong effect of actin depolymerization on interphase intracellular mechanics (e.g. https://doi.org/10.1016/j.bpj.2023.04.011 or https://doi.org/10.1038/s41567-021-01368-z). Still, we fully agree with the reviewer that this is an important point. In a revised version, we aim to investigate the effect of other actin-depolymerizing drugs and will try to perform immunostaining to visualize and further illuminate the potential compensation mechanism between actin and MTs.

      Depolymerization of MT with nocodazole did not reduce the solid-like property A. Adding discussion and comparison with other papers in the literature using nocodazole will be helpful in understanding why.

      Again, we agree with the reviewer and propose to further study this point by performing additional immunostainings and by elaborating on the discussion, also including the results of other studies.

      Reviewer 3:

      The importance of the mechanical fingerprint is diluted due to some missing controls needed for biological relevance.

      We thank the reviewer for his valuable time and feedback. This comment is in line with the point already raised by reviewer 1 and highlights the important question of how intracellular mechanical properties relate to actual cell function. We fully agree with the reviewers that at this point we can only report differences and cannot claim a biological function that depends on the fingerprint. Although the alignment between function and mechanical fingerprints supports the hypothesis that the biological system tunes its mechanical properties for a specific function, we do not want to make any claim in this direction at the current stage of our research. Answering these intriguing questions requires carefully designed control experiments, as pointed out by the reviewer, but this direction is beyond the scope of this manuscript. Here, we establish the tools we will use in future studies to address these highly relevant questions. We therefore propose to discuss these important future directions in a revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) that compares the sleep-associated traits of genetic mutants with those produced by a panel of small molecules, to help identify affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss of function in late-onset AD genes universally results in nighttime sleep loss, consistent with the well-supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. Mutation of psen1, an early-onset AD gene that the authors find is principally responsible for the generation of Aβ40 and Aβ42 in zebrafish, also slightly increases activity and slightly decreases sleep at night. Conversely, psen2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, and betamethasone as a potential therapeutic that reverses psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, is the nighttime sleep loss observed in larval zebrafish knockouts of all the late-onset AD genes a valid proxy for AD risk?

      We cannot say, but it is an interesting question. We selected the four late-onset Alzheimer’s risk genes (APOE, CD2AP, CLU, SORL1) based on human genetics data and brain expression in zebrafish larvae, not based on their likelihood to modify sleep behaviour, which we could have tried by searching for overlaps with GWAS of sleep phenotypes, for example. Consequently, we find it remarkable that all four of these genes caused a night-time sleep phenotype when mutated. We also find it reassuring that knockout of appa/appb and psen2 did not cause a night-time sleep phenotype, which largely excludes the possibility that the phenotype is a technical artefact (e.g. caused by the F0 knockout method) or a property of every gene expressed in the larval brain.

      Having said that, it could still be a coincidence, rather than a special property of genes associated with late-onset AD. In addition to testing additional late-onset Alzheimer’s risk genes, the ideal way to answer this question would be to test in parallel a random set of genes expressed in the brain at this stage of development. From this random set, one could estimate the proportion of genes that cause a night-time sleep phenotype when mutated. One could then use that information to test whether late-onset Alzheimer’s risk genes are indeed enriched for genes that cause a night-time sleep phenotype when mutated.
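      The enrichment test proposed above could be run, for example, as a one-sided binomial test that treats the proportion of random brain-expressed genes giving a night-time sleep phenotype as the background rate. The sketch below is illustrative only: the function name is hypothetical, the background rates are placeholders rather than measured values, and treating the background rate as exactly known ignores its sampling uncertainty (a Fisher's exact test comparing the two gene sets directly would account for this). With only four genes, even a 4/4 result is significant at 0.05 only if the background rate is below roughly 0.47.

```python
from math import comb


def enrichment_p_value(hits, n_tested, background_rate):
    """One-sided binomial test: P(X >= hits) for X ~ Binomial(n_tested, background_rate).

    hits            -- number of tested genes whose knockout gives the phenotype
    n_tested        -- number of risk genes tested
    background_rate -- proportion of random brain-expressed genes giving the phenotype
    """
    if not 0 <= hits <= n_tested:
        raise ValueError("hits must be between 0 and n_tested")
    if not 0.0 <= background_rate <= 1.0:
        raise ValueError("background_rate must be a probability")
    return sum(
        comb(n_tested, i) * background_rate**i * (1 - background_rate) ** (n_tested - i)
        for i in range(hits, n_tested + 1)
    )


# 4 of 4 late-onset risk gene knockouts showed a night-time sleep phenotype (from the text).
# The background rates are hypothetical placeholders, not measured values.
for rate in (0.1, 0.25, 0.5):
    print(f"background {rate}: p = {enrichment_p_value(4, 4, rate):.4g}")
```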

      For those mutants that cause nighttime sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes?

      To attempt to answer these questions, we used ZOLTAR to generate predictions for all the knockout behavioural fingerprints presented in the study, in the same way as for sorl1 in Fig. 5 and Fig. 5–suppl. 1. Here are the indications, targets, and KEGG pathways which are shared by the largest number of knockouts:

      – Four indications are shared by 4/7 knockouts: “mydriasis” (dilated pupils, significant for psen1, apoea/apoeb, cd2ap, clu); “fragile X syndrome” (psen1, apoea/apoeb, cd2ap, sorl1), “insomnia” (psen2, apoea/apoeb, cd2ap, sorl1); “malignant essential hypertension” (appa/appb, psen1, apoea/apoeb, cd2ap).

      – Two targets are shared by 5/7 knockouts: “glycogen synthase kinase-3 alpha” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “neuronal acetylcholine receptor beta-2” (appa/appb, psen1, apoea/apoeb, cd2ap, clu).

      – Two KEGG pathways are shared by 5/7 knockouts: “cholinergic synapse” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “nitrogen metabolism” (appa/appb, psen1, psen2, cd2ap, clu).

      As a reminder, we hypothesised that loss of Sorl1 affected serotonin signalling because the following annotations were significant: indication “depression”, target “serotonin transporter”, and KEGG pathway “serotonergic synapse”. All three are also significant for psen2 knockouts, but for no other knockout. ZOLTAR therefore does not predict serotonin signalling to be a major theme common to all mutants with a night-time sleep loss phenotype.

      While perhaps not surprising, we find it reassuring that insomnia appears among the indications shared by the largest number of knockouts. apoea/apoeb, cd2ap and sorl1 also happen to be the knockouts with the largest loss of night-time sleep.

      Particularly interesting is cholinergic signalling appearing in the most common targets and KEGG pathways. Acetylcholine signalling is a major theme in research on Alzheimer’s disease. For example, the first four drugs ever approved by the FDA to treat Alzheimer’s disease were acetylcholinesterase inhibitors, which increase acetylcholine signalling by preventing its breakdown by acetylcholinesterase. These drugs are generally considered only to treat symptoms and not modify disease course, but this view has been called into question (Munoz-Torrero, 2008; Relkin, 2007). If, as ZOLTAR suggests, mutations in several Alzheimer’s risk genes affect cholinergic signalling early in development, this would point to a potential causal role of cholinergic disruption in Alzheimer’s disease.

      There is also literature on the involvement of glycogen synthase kinase-3 in AD (Lauretti et al., 2020). We plan to explore these predictions further in a future study.

      Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors.

      Yes, absolutely. The behavioural dataset we used (Rihel et al., 2010) did not include stimuli other than day/night light transitions, but the “SauronX” platform and dataset (Myers-Turnbull et al., 2022) seem particularly well suited for this. To provide some context, we and collaborators have occasionally used the dataset by Rihel et al. (2010) to generate hypotheses or find candidate drugs that reverse a behavioural phenotype measured in the sleep/wake assay (Ashlin et al., 2018; Hoffman et al., 2016). The present work was an opportunity to enable wider and more intuitive use of this dataset through the ZOLTAR app, which has already proven successful. Future versions of ZOLTAR will seek to incorporate larger drug datasets using more types of measurements.

      Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes.

      5-HT receptor type 4a is another candidate as it was shown to interact with sorting nexin 27, a subunit of retromer (Joubert et al., 2004). We see that antibodies against human 5-HT receptor type 2 and 4a exist; whether they would work in zebrafish remains to be tested, and in our experience, the availability of antibodies suitable for immunohistochemistry in the zebrafish is a serious experimental roadblock.

      Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that nighttime sleep is disrupted in mutants for multiple late-onset AD-related genes.

      - Provides potential mechanistic insights into how AD-related genes might impact sleep and identifies a few drugs that modify the identified phenotypes.

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

      Thank you for these excellent suggestions, please see our answers above.

      Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, for example by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both the behavior and the cellular phenotype. If the connection to serotonin in sorl1 mutants were more complete, for example, the overarching idea would be more compelling.

      Thank you for this cogent criticism.

      On the first point, we were careful not to claim that betamethasone normalises the molecular/cellular mechanism that causes the psen2 behavioural phenotype. Having said that, yes, to a certain extent that would be the hope of the approach. As you say, not every compound that normalises the behavioural fingerprint will normalise the underlying mechanism, but the converse seems true: every compound that normalises the underlying mechanism should also normalise the behavioural fingerprint. We think this logic makes the “behaviour-first” approach innovative and interesting. The idea is to first discover compounds that normalise the behavioural phenotype, and only subsequently test whether they also normalise the molecular mechanism, akin to testing whether a drug resolves the symptoms before testing whether it actually modifies disease course. While in practice testing thousands of drugs in sufficient sample sizes and replicates on a mutant line is challenging, the dataset queried through ZOLTAR provides a potential shortcut by shortlisting, in silico, compounds that have the opposite effect on behaviour.

      You mention a “reduction in movement bouts”, but note that the number of behavioural parameters tested is key to our argument. To take the two extremes, say the only behavioural parameter we measured in psen2 knockout larvae was time active during the day: then, yes, any stimulant used at the right concentration could probably normalise the phenotype. In this situation, claiming that the stimulant is likely to also normalise the underlying mechanism, or even that it is a genuine “phenotypic rescue”, would not be convincing. Conversely, if we were measuring thousands of behavioural parameters under various stimuli, such as swimming speed, position in the well, bout usage, tail movements, and eye angles, it would seem almost impossible for a compound to rescue most parameters without also normalising the underlying mechanism. The present approach is somewhere in between: ZOLTAR uses six behavioural parameters for prediction (e.g. Fig. 6a), but all 17 parameters calculated by FramebyFrame can be used to assess rescue during a subsequent experiment (Fig. 6c). For both, splitting each parameter into day and night increases the resolution of the approach, which partly answers your criticism. For example, betamethasone rescued the day-time hypoactivity without causing night-time hyperactivity, so we are not making the “straw man argument” described above of using any broad stimulant to rescue the hypoactivity phenotype.

      Furthermore, for diseases where the behavioural defect is the primary concern, such as autism or bipolar disorder, perhaps this behaviour-first approach is all that is needed, and whether or not the compound precisely rescues the underlying mechanism is somewhat secondary. The use of lithium to prevent manic episodes in bipolar disorder is a good example. It was initially tested because mania was thought to be caused by excess uric acid and lithium can dissolve uric acid (Mitchell and Hadzi-Pavlovic, 2000). The theory is now discredited, but lithium continues to be used without a precise understanding of its mode of action. In this example, behavioural rescue alone, with tolerable secondary effects, is sufficient to be beneficial to patients, and whether it modulates the correct causal pathway is secondary.

      On the second point, we agree that first testing ZOLTAR on a mutant for which we have a fairly good understanding of the mechanism causing the behavioural phenotype could have been a productive approach. Note, however, that examples already exist in the literature. First, Hoffman et al. (2016) found that drugs generating behavioural fingerprints that positively correlate with the cntnap2a/cntnap2b double knockout fingerprint are enriched for NMDA and GABA receptor antagonists. In experiments analogous to our citalopram treatment (Fig. 5c,d), cntnap2a/cntnap2b knockout larvae were found to be overly sensitive to the NMDA receptor antagonist MK-801 and the GABAA receptor antagonist pentylenetetrazol (PTZ). Among other drugs tested, zolpidem, a GABAA receptor agonist, caused opposite effects on wild-type and cntnap2a/cntnap2b knockout larvae. Knockout larvae also had fewer GABAergic neurons in the forebrain. Second, Ashlin et al. (2018) found that the fingerprint of pitpnc1a knockout larvae clustered with anti-inflammatory compounds. Flumethasone, an anti-inflammatory corticosteroid, caused a smaller increase in activity when added to knockout larvae than when added to wild-type larvae. While these studies did not use precisely the same analysis that ZOLTAR runs, they used the same rationale and behavioural dataset to make these predictions (Rihel et al., 2010), which shows that approaches like ZOLTAR can point to causal processes.

      Related to your next point, we may reduce the discussion on sorl1 and serotonin and add some of the present arguments instead, depending on the results from testing a second SSRI (see next point).

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers came from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      We mostly agree with this criticism. One could interpret the larger spread of the data for sorl1 larvae treated with 10 µM citalopram as evidence that the knockout larvae do indeed react differently to the drug at this dose. However, the result indeed does not survive removing the top 5 (p = 0.87) or top 3 (p = 0.18) sorl1 larvae.
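      The robustness check above (re-testing after dropping the most extreme sorl1 larvae) can be sketched as follows. The statistical test behind the reported p-values is not stated in this reply, so the sketch assumes a two-sided permutation test on the difference in mean Euclidean distance; the function names are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def permutation_p(a, b, n_perm=10000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly 0.
    return (hits + 1) / (n_perm + 1)


def drop_top_k(values, k):
    """Remove the k largest values, e.g. the larvae furthest from controls."""
    return sorted(values)[: len(values) - k]


def robustness(knockout, control, ks=(0, 3, 5), **kwargs):
    """Recompute the p-value after dropping the top-k knockout larvae."""
    return {k: permutation_p(drop_top_k(knockout, k), control, **kwargs) for k in ks}
```

      A permutation test is chosen here only because it needs no distributional assumptions; the original analysis may have used a different test.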

      Given that the HCR did not reveal anything striking, we agree with you that too much of our argument relies on this result being robust. As you and reviewer #3 suggest, we plan on repeating this experiment with a different selective serotonin reuptake inhibitor (SSRI). If the other SSRI also shows a differential effect, this should strengthen the claim that ZOLTAR correctly predicted serotonin signalling as being affected by the loss of Sorl1, even if we did not discover the molecular mechanism.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities, or an explanation of why the second hypothesis was not tested, would be helpful.

      There are no strong scientific reasons why this hypothesis was not tested. The lead author (F Kroll) moved to a different lab and country so the project was finalised at that time. We do not plan on testing this hypothesis at this stage. However, we will adapt the wording to make it clear this is one possible alternative hypothesis which could be tested in the future, rather than the only alternative.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      For the first part of this point, please see below our answer to Reviewer #3, point (2) c.

      Regarding the F0 strategy potentially adding variability, it is an interesting question which we tested in a larger dataset of behavioural recordings from F0 and stable knockouts for the same genes (unpublished). In summary, the F0 knockout method did not increase clutch-to-clutch or larva-to-larva variability in the assay. F0 knockout experiments found many more significant parameters and larger effect sizes than stable knockout experiments, but this difference could largely be explained by the larger sample sizes of F0 knockout experiments. In fact, larger sample sizes within individual clutches appear to be a major advantage of the F0 knockout approach over in-crosses of heterozygous knockout animals, as they increase the sensitivity of the assay without causing substantial variability. We plan to report this analysis in more detail in a separate paper, as we think it would dilute the focus of the present work.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to make clear that any findings could be the opposite of expectation in AD.

      On “there is no reason to expect similarity […]”, we disagree. Knockout of appa/appb and knockout of psen1 will both result in loss of Aβ (appa/appb encode Aβ and Psen1 cleaves Appa/Appb to release Aβ, cf. Fig. 3e). Consequently, a phenotype caused by the loss of Aβ, or possibly other Appa/Appb cleavage products, should logically be found in both appa/appb and psen1 knockouts.

      On “it is well known that the upregulation of APP is the driver of Alzheimer’s, not downregulation”, we of course agree. Among others, the examples of Down syndrome, APP duplication (Sleegers et al., 2006), or mouse models overexpressing human APP show definitively that overexpression of APP is sufficient to cause AD. Having said that, we would not be so quick to dismiss APP knockout as irrelevant to understanding Alzheimer’s disease. Loss of soluble Aβ due to aggregation could contribute to pathology (Espay et al., 2023). Without getting too much into this intricate debate, links between levels of Aβ and risk of disease are often counter-intuitive too. For example, out of 138 PSEN1 mutations screened in vitro, 104 reduced total Aβ production and 11 even seemingly abolished the production of both Aβ40 and Aβ42 (Sun et al., 2017). In short, loss of soluble Aβ occurs both in AD and in our appa/appb knockout larvae, but the ideal approach would be to study zebrafish larvae with an in-frame deletion in the Aβ sequence within appa/appb.

      We will adapt the language to address your point. We would not want to imply, for example, that the absence of a night-time sleep phenotype for appa/appb is contradictory to the body of literature showing links between Aβ and sleep, including in zebrafish (Özcan et al., 2020). As you say, our experiment tested loss of App, including Aβ, while the literature typically reports on overexpression of APP, as in APP/PSEN1-overexpressing mice (Jagirdar et al., 2021).

      Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed.

      A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      We agree that some parameters do not tell us anything intuitive about the biology of the animal. It would be easy to speculate. For example, the “activity slope” parameter may indicate how quickly the animal becomes tired over the course of the day. On the other hand, fractal dimension describes the “roughness/smoothness” of the larva’s activity trace (Fig. 2–suppl. 1a); but it is not obvious how to translate this into information about the physiology of the animal. We do not see this as an issue though. While some parameters do provide intuitive information about the animal’s behaviour (e.g. sleep duration or sunset startle as a measure of startle response), the benefit of having a large number of behavioural parameters is to compare behavioural fingerprints and assess rescue of the behavioural phenotype by small molecules (Fig. 6c). For this purpose, the more parameters the better. The “MoSeq” approach from Wiltschko et al., 2020 is a good example from the literature that inspired our own Fig. 6c. While some of the “behavioural syllables” may be intuitive (e.g. running or grooming), it is probably pointless to try to explain the ‘meaning’ of the “small left turn in place with head motion” syllable (Wiltschko et al., 2020). Nonetheless, this syllable was useful to assess whether a drug specifically treats the behavioural phenotype under study without causing too many side effects. Unfortunately, ZOLTAR has to reduce the FramebyFrame fingerprint (17 parameters) to just six parameters to compare it to the behavioural dataset from Rihel et al., 2010, but here, more parameters would almost certainly translate into better predictions too, regardless of their intuitiveness.

      It is true however that we do not give much information on how some of the less intuitive parameters, such as activity slope or fractal dimension, are calculated or what they describe about the dataset (e.g. roughness/smoothness for fractal dimension). We will improve this in our revised version.

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      We unfortunately do not have those lines. We investigated the possibility of importing a psen2 knockout line from abroad, but shipping live animals is becoming increasingly cost- and time-prohibitive. However, we observed the same pigmentation phenotype in psen2 knockouts as reported by Jiang et al., 2018, which at least partially confirms that they phenocopy a stable loss-of-function mutant.

      b. psen2KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      We disagree that there should be significant doubt about these mutants being truly functionally null, given the high mutation rate and presence of the expected pigmentation phenotype (Jiang et al., 2018; Fig. 3f and Fig. 3–suppl. 2). The psen2 F0 knockouts were virtually 100% mutated at three exons across the gene (mutation rates were locus 1: 100 ± 0%; locus 2: 99.99 ± 0.06%; locus 3: 99.85 ± 0.24%). Additionally, two of the three mutated exons had particularly high rates of frameshift mutations (locus 1: 97 ± 5%; locus 2: 88 ± 17% frameshift mutation rate). It is virtually impossible that a functional protein is translated given this burden of frameshift mutations. Phenotypically, in addition to the pigmentation defect, double psen1/psen2 F0 knockout larvae had curved tails, the same phenotype as that caused by a high dose of the γ-secretase inhibitor DAPT (Yang et al., 2008). These double F0 knockouts were lethal, while knockout of psen1 or psen2 alone did not cause obvious morphological defects. Most larvae must therefore have been psen2 null mutants in this experiment, otherwise functional Psen2 would have prevented early lethality.

      Translation of zebrafish psen2 can start at downstream start codons if the first exon has a frameshift mutation, generating a seemingly functional Psen2 missing the N-terminus (Jiang et al., 2020). Zebrafish homozygous for this early frameshift mutation had normal pigmentation, showing that pigmentation is a reliable readout of Psen2 function even in fish carrying a psen2 mutation. This mechanism is not a concern here, as the alternative start codons are still upstream of two of the three mutated exons (the alternative start codons discovered by Jiang et al., 2020 are in exons 2 and 3, but we targeted exons 3, 4, and 6).

      We understand that the zebrafish community may be cautious about F0 phenotyping compared to stably generated mutants. As mentioned to Reviewer 2, we are planning to assemble a paper that expressly examines F0s vs. stable mutants to allay some of these concerns. We would also suggest that our current manuscript, which combines CRISPR-F0 rapid screening with in silico pharmacological predictions, ultimately represents a first step in characterizing the functions of genes.

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      Correct, there is great clutch-to-clutch variability in this behavioural assay. This is not specific to our experiments. Even within the same strain, wild-type larvae from different clutches (i.e. non-siblings) behave differently (Joo et al., 2021). This is why it is essential to compare behavioural phenotypes within individual clutches (i.e., from a single pair of parents, one male and one female), as we explain in Methods (section Behavioural video-tracking) and in the documentation of the FramebyFrame package. We often see two different experimental designs in the literature: comparing non-sibling wild-type and mutant larvae, or pooling different clutches which include all genotypes (e.g., pooling multiple clutches from heterozygous in-crosses or pooling wild-type clutches before injecting them). The first experimental design can cause false positive findings, as the clutch-to-clutch variability we and others (Joo et al., 2021) observe gets interpreted as a behavioural phenotype. The second experimental design should not cause false positives but will decrease the sensitivity of the assay by increasing the spread within genotypes. In both cases, the clutch-to-clutch variability is hidden, either by interpreting it as a phenotype (first case) or by adding it to animal-to-animal variability (second case). Our experimental design is technically more challenging as it requires obtaining large clutches from unique pairs of parents. However, this approach is better as it clearly separates the different sources of variability (clutch-to-clutch or animal-to-animal). As for every experiment, yes, a larger number of replicates would be better, but we do not plan to assay additional clutches at this time. Our work heavily focuses on the sorl1 and psen2 knockout behavioural phenotypes. The key aspects of these phenotypes were effectively tested in four clutches, as sorl1 knockouts were also tested in the citalopram experiment (Fig. 5), and psen2 knockouts were also tested in the small molecule rescue experiment (Fig. 6 and Fig. 6–suppl. 1). In the citalopram experiment, one H2O-treated sorl1 knockout clutch (n = 10) replicates fairly well the baseline recordings in Fig. 4–suppl. 5; the other does not, but it had an especially small sample size (n = 6).
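      The within-clutch design described above amounts to expressing each larva's behaviour relative to control siblings from the same clutch, so that clutch-to-clutch differences in baseline cannot masquerade as a phenotype. A minimal sketch, assuming a hypothetical record layout (clutch, group, value) and Z-scoring against same-clutch controls; it illustrates the principle and is not the FramebyFrame package's code.

```python
from collections import defaultdict
from statistics import mean, stdev


def zscore_within_clutch(records, control_label="control"):
    """Z-score each larva's value against the controls of its own clutch.

    records: list of dicts with keys 'clutch', 'group' and 'value'.
    Returns copies of the records with an added 'z' key.
    """
    controls = defaultdict(list)
    for r in records:
        if r["group"] == control_label:
            controls[r["clutch"]].append(r["value"])
    # Mean and sample SD of the controls, computed separately per clutch.
    ref = {clutch: (mean(v), stdev(v)) for clutch, v in controls.items()}
    return [
        {**r, "z": (r["value"] - ref[r["clutch"]][0]) / ref[r["clutch"]][1]}
        for r in records
    ]


# Two hypothetical clutches with different baselines but the same knockout shift.
data = [
    {"clutch": "A", "group": "control", "value": 9},
    {"clutch": "A", "group": "control", "value": 11},
    {"clutch": "A", "group": "control", "value": 13},
    {"clutch": "A", "group": "knockout", "value": 15},
    {"clutch": "B", "group": "control", "value": 19},
    {"clutch": "B", "group": "control", "value": 21},
    {"clutch": "B", "group": "control", "value": 23},
    {"clutch": "B", "group": "knockout", "value": 25},
]
z = [r["z"] for r in zscore_within_clutch(data) if r["group"] == "knockout"]
```

      Both knockouts come out at +2 SD of their own clutch's controls. Pooling the two clutches' controls instead would inflate the control SD and blur that shared shift; in this toy case the clutch A knockout would even fall below the pooled control mean.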

      We also plan to test another SSRI on sorl1 knockouts, so this point will be addressed.

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early-life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      This is a fair criticism but we do not make this claim, at least not from expression. The reviewer is probably referring to the following quote:

      “[…] most of these were expressed in the brain of 5–6-dpf zebrafish larvae, suggesting they play a role in early brain development or function,”

      which does not mention future risk of Alzheimer’s disease. We do suggest that these genes have a function in development. After all, every gene that plays a role in brain development must be expressed during development, so this wording seems reasonable. As noted, the primary goal was to check that the genes we selected were indeed expressed in zebrafish larvae before performing knockout experiments. Our discussion does raise the hypothesis that mutations in Alzheimer’s risk genes impact brain development and sleep early in life, but this argument primarily relies on our observation that knockout of late-onset Alzheimer’s risk genes causes sleep phenotypes in 7-day-old zebrafish larvae and on previous work showing brain structural differences in infants and children at high genetic risk of Alzheimer’s disease (Dean et al., 2014; Quiroz et al., 2015), not solely on gene expression early in life.

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      This is a fair criticism. We agree it is a concern, especially in the case of psen2, as we claim that day-time sleep is increased even though zebrafish are diurnal. We do not rely heavily on the day-time inactivity being sleep (neither the ZOLTAR predictions nor the small molecule rescue depend on whether the parameter is called sleep or inactivity), but our choice of labelling may be misleading. We will try to test this claim by plotting the distribution of the inactive period durations. If psen2 knockout larvae indeed sleep more during the day than controls, we would predict that inactive periods longer than 1 minute increase disproportionately compared with shorter inactive periods.
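      The planned analysis of inactive-period durations could look like the sketch below. It assumes, hypothetically, that each larva's trace is available as a sequence of per-frame active/inactive calls at a known frame rate; it extracts the length of every run of inactivity and counts how many runs are longer than one minute. Comparing those counts between psen2 knockouts and controls would be the test proposed above.

```python
from itertools import groupby


def inactive_period_lengths(active, fps):
    """Durations (seconds) of consecutive runs of inactive frames.

    A run still ongoing at the end of the recording is included as-is,
    so its true length may be underestimated.
    """
    return [
        sum(1 for _ in run) / fps
        for is_active, run in groupby(active)
        if not is_active
    ]


def count_short_long(periods, threshold_s=60.0):
    """Split inactive periods into those up to, and longer than, the threshold."""
    long_ = sum(1 for p in periods if p > threshold_s)
    return len(periods) - long_, long_


# Hypothetical 1-frame-per-second trace: a 2 s pause and a 70 s pause.
trace = [True, False, False, True] + [False] * 70 + [True]
periods = inactive_period_lengths(trace, fps=1)
short, long_ = count_short_long(periods)
```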

      To address the question “are psen2 KO responsive to startling stimuli like controls when awake/when quiescent”, we can look at the behaviour of psen2 knockout larvae that were awake (i.e., moved in the preceding one minute) or ‘asleep’ (i.e., did not move in the preceding one minute) at the light transitions and count the proportion of psen2 knockout and control larvae that displayed a startle response. If most psen2 knockouts react to the light transition, it should at least rule out the concern that they are very unhealthy, as the reviewer suggests. This criticism seems challenging to address definitively by experiment, though. A possible approach could be to use a closed-loop system which, after one minute of inactivity, triggers a stimulus that is sufficient to startle an awake larva but not an asleep larva. If psen2 knockout larvae indeed sleep more during the day, the stimulus should usually not be sufficient to startle them. Note that calibrating this stimulus is also not straightforward. We do not plan to test this, but our analysis of the light transitions may provide a decent proxy.
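      The light-transition analysis proposed above can be sketched under stated assumptions: traces are hypothetical per-second active/inactive calls, a larva is classed as 'awake' if it moved in the 60 s before the transition and 'asleep' otherwise, and a startle is any movement within a short response window after the transition (the 1 s window is an arbitrary placeholder). The output is the proportion of larvae that startled in each state, which would then be compared between psen2 knockouts and controls.

```python
def state_and_response(active, t, lookback=60, window=1):
    """Classify a larva at transition index t and report whether it startled."""
    prior = active[max(0, t - lookback):t]
    state = "awake" if any(prior) else "asleep"
    startled = any(active[t:t + window])
    return state, startled


def startle_proportions(traces, t, **kwargs):
    """Fraction of larvae that startled, separately for awake and asleep larvae."""
    counts = {"awake": [0, 0], "asleep": [0, 0]}  # [n_startled, n_total]
    for trace in traces:
        state, startled = state_and_response(trace, t, **kwargs)
        counts[state][0] += int(startled)
        counts[state][1] += 1
    return {s: (c[0] / c[1] if c[1] else None) for s, c in counts.items()}


def make_trace(moves, length=120):
    """Hypothetical helper: per-second trace that is active only at `moves`."""
    return [i in moves for i in range(length)]


t = 60  # index of the light transition
traces = [
    make_trace({10, 60}),  # awake before, startles
    make_trace({60}),      # asleep before, startles
    make_trace(set()),     # asleep before, no response
]
props = startle_proportions(traces, t)
```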

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just 1 drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

      Regarding the comment, “is it not just possible that the drug acts in parallel to the true disrupted pathway”, we think not, assuming we understand your question correctly. Key to our argument is the fact that sorl1 knockout larvae react differently to the drug than control larvae. As an example, take night-time sleep bout length, which was not affected by knockout of sorl1 (Fig. 4–suppl. 5). For the sake of the argument, say only dopamine signalling (the “true disrupted pathway”) was affected in sorl1 knockouts while serotonin signalling was intact. Assuming that citalopram specifically alters serotonin signalling, treatment should then cause the same increase in sleep bout length in both knockouts and controls, as serotonin signalling is intact in both. This is not what we see, however. Citalopram caused a greater increase in sleep bout length in sorl1 knockouts than in scrambled-injected larvae. In other words, the effect is non-additive, in the sense that citalopram did not add the same number of Z-scores to sorl1 knockouts and controls. We think this shows that serotonin signalling is somehow different in sorl1 knockouts. Nonetheless, we would concede that the experiment does not necessarily say much about the importance of the serotonin disruption caused by loss of Sorl1. It could be, for example, that the most salient consequence of loss of Sorl1 is cholinergic disruption (see reply to Reviewer #1 above) and that serotonin signalling is a minor theme.
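      The non-additivity argument can be made concrete as a difference-in-differences: the drug effect in knockouts minus the drug effect in controls, each effect being the shift in mean Z-score between drug- and H2O-treated larvae of that genotype. Under a purely additive model (serotonin signalling intact in both genotypes) this interaction is expected to be about zero. The numbers below are toy values, not data from the study, and a real analysis would attach a confidence interval or test to the interaction.

```python
from statistics import mean


def drug_effect(drug, water):
    """Shift in mean Z-score caused by the drug within one genotype."""
    return mean(drug) - mean(water)


def interaction(ko_drug, ko_water, ctrl_drug, ctrl_water):
    """Difference-in-differences; ~0 if the drug shifts both genotypes equally."""
    return drug_effect(ko_drug, ko_water) - drug_effect(ctrl_drug, ctrl_water)


# Toy sleep-bout-length Z-scores (hypothetical values); no baseline phenotype.
ctrl_water, ctrl_drug = [0.0, 0.0], [1.0, 1.0]
ko_water = [0.0, 0.0]
additive = interaction([1.0, 1.0], ko_water, ctrl_drug, ctrl_water)
non_additive = interaction([2.0, 2.0], ko_water, ctrl_drug, ctrl_water)
```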

      Furthermore, we agree with you and Reviewer #2 that the conclusions are overly confident. We will repeat this experiment with another SSRI as you suggest. Your suggestions to further test the serotonin system in the sorl1 knockouts are excellent as well, however we do not plan to pursue them at this stage.

      References:

      Ashlin TG, Blunsom NJ, Ghosh M, Cockcroft S, Rihel J. 2018. Pitpnc1a Regulates Zebrafish Sleep and Wake Behavior through Modulation of Insulin-like Growth Factor Signaling. Cell Rep 24:1389–1396. doi:10.1016/j.celrep.2018.07.012

      Chen D, Wang X, Huang T, Jia J. 2022. Sleep and Late-Onset Alzheimer’s Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 13. doi:10.3389/fgene.2022.794202

      Cirrito JR, Disabato BM, Restivo JL, Verges DK, Goebel WD, Sathyan A, Hayreh D, D’Angelo G, Benzinger T, Yoon H, Kim J, Morris JC, Mintun MA, Sheline YI. 2011. Serotonin signaling is associated with lower amyloid-β levels and plaques in transgenic mice and humans. Proc Natl Acad Sci U S A 108:14968–14973. doi:10.1073/pnas.1107411108

      Dean DC, Jerskey BA, Chen K, Protas H, Thiyyagura P, Roontiva A, O’Muircheartaigh J, Dirks H, Waskiewicz N, Lehman K, Siniard AL, Turk MN, Hua X, Madsen SK, Thompson PM, Fleisher AS, Huentelman MJ, Deoni SCL, Reiman EM. 2014. Brain Differences in Infants at Differential Genetic Risk for Late-Onset Alzheimer Disease A Cross-sectional Imaging Study. JAMA Neurol 71:11–22. doi:10.1001/jamaneurol.2013.4544

      Eriksen JL, Sagi SA, Smith TE, Weggen S, Das P, McLendon DC, Ozols VV, Jessing KW, Zavitz KH, Koo EH, Golde TE. 2003. NSAIDs and enantiomers of flurbiprofen target γ-secretase and lower Aβ42 in vivo. J Clin Invest 112:440–449. doi:10.1172/JCI18162

      Espay AJ, Herrup K, Kepp KP, Daly T. 2023. The proteinopenia hypothesis: Loss of Aβ42 and the onset of Alzheimer’s Disease. Ageing Res Rev 92:102112. doi:10.1016/j.arr.2023.102112

      Hoffman EJ, Turner KJ, Fernandez JM, Cifuentes D, Ghosh M, Ijaz S, Jain RA, Kubo F, Bill BR, Baier H, Granato M, Barresi MJF, Wilson SW, Rihel J, State MW, Giraldez AJ. 2016. Estrogens Suppress a Behavioral Phenotype in Zebrafish Mutants of the Autism Risk Gene, CNTNAP2. Neuron 89:725–733. doi:10.1016/j.neuron.2015.12.039

      in ’t Veld BA, Ruitenberg A, Hofman A, Launer LJ, van Duijn CM, Stijnen T, Breteler MMB, Stricker BHC. 2001. Nonsteroidal Antiinflammatory Drugs and the Risk of Alzheimer’s Disease. N Engl J Med 345:1515–1521. doi:10.1056/NEJMoa010178

      Jagirdar R, Fu C-H, Park J, Corbett BF, Seibt FM, Beierlein M, Chin J. 2021. Restoring activity in the thalamic reticular nucleus improves sleep architecture and reduces Aβ accumulation in mice. Sci Transl Med 13:eabh4284. doi:10.1126/scitranslmed.abh4284

      Jiang H, Newman M, Lardelli M. 2018. The zebrafish orthologue of familial Alzheimer’s disease gene PRESENILIN 2 is required for normal adult melanotic skin pigmentation. PLOS ONE 13:e0206155. doi:10.1371/journal.pone.0206155

      Jiang H, Pederson SM, Newman M, Dong Y, Barthelson K, Lardelli M. 2020. Transcriptome analysis indicates dominant effects on ribosome and mitochondrial function of a premature termination codon mutation in the zebrafish gene psen2. PloS One 15:e0232559. doi:10.1371/journal.pone.0232559

      Joo W, Vivian MD, Graham BJ, Soucy ER, Thyme SB. 2021. A Customizable Low-Cost System for Massively Parallel Zebrafish Behavioral Phenotyping. Front Behav Neurosci 14.

      Joubert L, Hanson B, Barthet G, Sebben M, Claeysen S, Hong W, Marin P, Dumuis A, Bockaert J. 2004. New sorting nexin (SNX27) and NHERF specifically interact with the 5-HT4a receptor splice variant: roles in receptor targeting. J Cell Sci 117:5367–5379. doi:10.1242/jcs.01379

      Lauretti E, Dincer O, Praticò D. 2020. Glycogen synthase kinase-3 signaling in Alzheimer’s disease. Biochim Biophys Acta Mol Cell Res 1867:118664. doi:10.1016/j.bbamcr.2020.118664

      Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. 2021. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol 89:177–181. doi:10.1002/ana.25910

      Mitchell PB, Hadzi-Pavlovic D. 2000. Lithium treatment for bipolar disorder. Bull World Health Organ 78:515–517.

      Munoz-Torrero D. 2008. Acetylcholinesterase Inhibitors as Disease-Modifying Therapies for Alzheimer’s Disease. Curr Med Chem 15:2433–2455. doi:10.2174/092986708785909067

      Muto V, Koshmanova E, Ghaemmaghami P, Jaspar M, Meyer C, Elansary M, Van Egroo M, Chylinski D, Berthomier C, Brandewinder M, Mouraux C, Schmidt C, Hammad G, Coppieters W, Ahariz N, Degueldre C, Luxen A, Salmon E, Phillips C, Archer SN, Yengo L, Byrne E, Collette F, Georges M, Dijk D-J, Maquet P, Visscher PM, Vandewalle G. 2021. Alzheimer’s disease genetic risk and sleep phenotypes in healthy young men: association with more slow waves and daytime sleepiness. Sleep 44. doi:10.1093/sleep/zsaa137

      Myers-Turnbull D, Taylor JC, Helsell C, McCarroll MN, Ki CS, Tummino TA, Ravikumar S, Kinser R, Gendelev L, Alexander R, Keiser MJ, Kokel D. 2022. Simultaneous analysis of neuroactive compounds in zebrafish. doi:10.1101/2020.01.01.891432

      Özcan GG, Lim S, Leighton PL, Allison WT, Rihel J. 2020. Sleep is bi-directionally modified by amyloid beta oligomers. eLife 9:e53995. doi:10.7554/eLife.53995

      Quiroz YT, Schultz AP, Chen K, Protas HD, Brickhouse M, Fleisher AS, Langbaum JB, Thiyyagura P, Fagan AM, Shah AR, Muniz M, Arboleda-Velasquez JF, Munoz C, Garcia G, Acosta-Baena N, Giraldo M, Tirado V, Ramírez DL, Tariot PN, Dickerson BC, Sperling RA, Lopera F, Reiman EM. 2015. Brain Imaging and Blood Biomarker Abnormalities in Children With Autosomal Dominant Alzheimer Disease: A Cross-Sectional Study. JAMA Neurol 72:912–919. doi:10.1001/jamaneurol.2015.1099

      Relkin NR. 2007. Beyond symptomatic therapy: a re-examination of acetylcholinesterase inhibitors in Alzheimer’s disease. Expert Rev Neurother 7:735–748. doi:10.1586/14737175.7.6.735

      Rihel J, Prober DA, Arvanites A, Lam K, Zimmerman S, Jang S, Haggarty SJ, Kokel D, Rubin LL, Peterson RT, Schier AF. 2010. Zebrafish Behavioral Profiling Links Drugs to Biological Targets and Rest/Wake Regulation. Science 327:348–351. doi:10.1126/science.1183090

      Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, Wauters J, Del-Favero J, Cruts M, van Duijn CM, Van Broeckhoven C. 2006. APP duplication is sufficient to cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain J Neurol 129:2977–2983. doi:10.1093/brain/awl203

      Sun L, Zhou R, Yang G, Shi Y. 2017. Analysis of 138 pathogenic mutations in presenilin-1 on the in vitro production of Aβ42 and Aβ40 peptides by γ-secretase. Proc Natl Acad Sci 114:E476–E485. doi:10.1073/pnas.1618657114

      Weggen S, Rogers M, Eriksen J. 2007. NSAIDs: small molecules for prevention of Alzheimer’s disease or precursors for future drug development? Trends Pharmacol Sci 28:536–543. doi:10.1016/j.tips.2007.09.004

      Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Datta SR. 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. doi:10.1038/s41593-020-00706-3

      Yang T, Arslanova D, Gu Y, Augelli-Szafran C, Xia W. 2008. Quantification of gamma-secretase modulation differentiates inhibitor compound selectivity between two substrates Notch and amyloid precursor protein. Mol Brain 1:15. doi:10.1186/1756-6606-1-15

    1. Author response:

      We thank the reviewers for their efforts. They have pointed out several shortcomings and made very helpful suggestions. Below, we briefly address the weak points that the reviewers brought up and outline what improvements we intend to make for the revised paper in response.

      Reviewer #1:

      The interpretation of CNN results, especially the number of layers in the final model and its relationship with the processing of visual words in the human brain, needs to be further strengthened.

      The results of our experimentation with the number of layers and the number of units in each layer can be found in the supplementary information. In the revised version, we will bring some of these results into the main text and discuss them more thoroughly.

      Reviewer #2:

      As has been shown over many decades, many potential computational algorithms, with varied model architectures, can perform the task of text recognition from an image. However, there is no evidence presented here that this particular algorithm has comparable performance to human behavior (i.e. similar accuracy with a comparable pattern of mistakes). This is a fundamental prerequisite before attempting to meaningfully correlate these layer activations to human neural activations. Therefore, it is unlikely that correlating these derived layer weights to neural activity provides meaningful novel insights into neural computation beyond what is seen using traditional experimental methods.

      We very much agree with the reviewer that a qualitative analysis of whether the model can explain experimental effects needs to happen before a quantitative analysis, such as evaluating model-brain correlation scores. In fact, this is one of the key points we wished to make.

      This starts with the observation that "traditional" models of reading (i.e., those that do not rely on deep learning) cannot explain some very basic human behavioral results, such as humans being able to recognize a word regardless of exact letter shape, size, and (up to a point) rotation. This is not so much a failure on the part of traditional models as it is a difference in focus. There are models of vision that focus on these low-level things, currently dominated by deep learning, but these are rarely evaluated in the context of reading, which has its own literature and well-known experimental effects. We believe the current version of the manuscript makes insufficiently clear what the goals of our modeling effort are exactly, which is something we will attempt to correct in the revision.

      Since our model only covers the first phase of reading, with a special focus on letter shape detection, we sought to compare it with neuroimaging data that can provide "snapshots" of the state of the brain during these early phases, rather than comparing it with behavioral results that occur at the very end. However, we very much make this comparison in the spirit hinted at by the reviewer. The different MEG components have a distinct "behavior" to them in the way they respond to different experimental conditions (Figure 2), and the model needs to replicate this behavior (Figure 4). Only then do we move on to a quantitative analysis.

      One example of a substantial discrepancy between this model and neural activations is that, while incorporating frequency weighting into the training data is shown to slightly increase neural correlation with the model, Figure 7 shows that no layer of the model appears directly sensitive to word frequency. This is in stark contrast to the strong neural sensitivity to word frequency seen in EEG (e.g. Dambacher et al 2006 Brain Research), fMRI (e.g. Kronbichler et al 2004 NeuroImage), MEG (e.g. Huizeling et al 2021 Neurobio. Lang.), and intracranial (e.g. Woolnough et al 2022 J. Neurosci.) recordings. Figure 7 also demonstrates that the late stages of the model show a strong negative correlation with font size, whereas later stages of neural visual word processing are typically insensitive to differences in visual features, instead showing sensitivity to lexical factors.

      We are glad the reviewer brought up the topic of frequency balancing, as it is a good example of the importance of the qualitative analysis. As the reviewer points out, frequency balancing during training only had a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing had a large impact. It is true that the model, even with frequency balancing, captures only letter- and bigram-frequency effects and not the word-frequency effects to which the N400 is known to be sensitive. This could mean that N400 word-frequency effects are driven by mechanisms that our current model lacks, such as top-down effects from systems further up the processing pipeline.

      We agree with the reviewer that the late-stage sensitivity of the model to font size must be seen as a flaw. Of course, we say as much when we discuss this result in the paper. Important context for this flaw is that the main aim of the model is to reproduce the experimental effects of Vartiainen et al. (2011), which does not include manipulation of word length. The experimental contrasts in Figure 7 are meant to explore a bit beyond the boundaries of that particular study, but were never considered "failure points". When presenting a model, it's important to show its limitations too.

      Another example of the mismatch between this model and the visual cortex is the lack of feedback connections in the model. Within the visual cortex, there are extensive feedback connections, with later processing stages providing recursive feedback to earlier stages. This is especially evident in reading, where feedback from lexical-level processes feeds back to letter-level processes (e.g. Heilbron et al 2020 Nature Comms.). This feedback is especially relevant for the reading of words in noisy conditions, as tested in the current manuscript, as lexical knowledge enhances letter representation in the visual cortex (the word superiority effect). This results in neural activity in multiple cortical areas varying over time, changing selectivity within a region at different measured time points (e.g. Woolnough et al 2021 Nature Human Behav.), which in the current study is simplified down to three discrete time windows, each attributed to different spatial locations.

      In this study, we make a start in showing how deep learning techniques could be beneficial to enhance models of reading by showing how even a simple CNN, after a few enhancements, can account for several experimental MEG effects that we see in reading tasks, but are outside the focus of traditional models of reading. We never intended to claim that our model offers a complete view of all the processes involved. This is why we have dedicated a section in the Discussion to the various ways in which our simple CNN is incomplete as a model of reading. In this section we hint at the usage of recurrent connections, but the reviewer does an excellent job of highlighting the importance of top-down connections even in models focusing on early visual processes, which we are very happy to include in this section.

      The presented model needs substantial further development to be able to replicate, both behaviorally and neurally, many of the well-characterized phenomena seen in human behavior and neural recordings that are fundamental hallmarks of human visual word processing. Until that point, it is unclear what novel contributions can be gleaned from correlating low-dimensional model weights from these computational models with human neural data.

      The CNN model we present in this study is a small piece in a bigger effort to employ deep learning techniques to further enhance already existing models of reading. For our revision, we plan to expand on the question of where to go from here and outline our vision on how these techniques could help us better model the phenomena the reviewer speaks of. We agree with the reviewer that there is a long way to go, and we are excited to be a part of it.

      Reviewer #3:

      The paper is rather qualitative in nature. In particular, the authors show that some resemblance exists between the behavior of some layers and some parts of the brain, but it is hard to quantitatively understand how strong the resemblances are in each layer, and the exact impact of experimental settings such as the frequency balancing (which seems to only have a very moderate effect according to Figure 5).

      The strong focus on a qualitative evaluation of the model is intentional. The ability of the model to reproduce experimental effects (Figure 4) is a prerequisite for any subsequent quantitative metrics (such as correlation) to be valid. The introduction of frequency balancing is a good example of this. As the reviewer points out, frequency balancing during training has only a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing has a large impact.

      That said, the reviewer is right to highlight the value of quantitative analysis. An important limitation of the "traditional" models of reading that do not employ deep learning is that they operate in unrealistically simplified environments (e.g. input as predefined line segments, words of a fixed length), which makes a quantitative comparison with brain data problematic. The main benefit that deep learning brings may very well be the increase in scale that makes more direct comparisons with brain data possible. In our revision we will attempt to capitalize on this benefit more. The reviewer has provided some helpful suggestions for doing so in their recommendations.

      The experiments only consider a rather outdated vision model (VGG).

      VGG was designed to use a minimal number of operations (convolution-and-pooling, fully-connected linear steps, ReLU activations, and batch normalization) and rely mostly on scale to solve the classification task. This makes VGG a good place to start our explorations and see how far a basic CNN can take us in terms of explaining experimental MEG effects in visual word recognition. However, we agree with the reviewer that it is easy to envision more advanced models that could potentially explain more. For our revision, we plan to expand on the question of where to go from here and outline our vision on what types of models would be worth investigating and how one may go about doing that in a way that provides insights beyond higher correlation values.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This structural and biochemical study of the mouse homolog of acidic mammalian chitinase (AMCase) enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments. The methods and analysis of data are solid, providing several lines of evidence to support the development of mechanistic hypotheses. While the findings and interpretation will be valuable to those studying AMCase in mice, the broader significance, including extension of the results to other species such as humans, remains unclear.

      Public Reviews:

      Reviewer #1 (Public Review):

      General comments:

      This paper investigates the pH-specific enzymatic activity of mouse acidic mammalian chitinase (AMCase) and aims to elucidate the mechanisms underlying its function. The authors employ a comprehensive approach, including hydrolysis assays, X-ray crystallography, theoretical calculations of pKa values, and molecular dynamics simulations to observe the behavior of mouse AMCase and explore the structural features influencing its pH-dependent activity.

      The study's key findings include determining kinetic parameters (kcat and Km) under a broad range of pH conditions, spanning from strong acid to neutral. The results reveal pH-dependent changes in enzymatic activity, suggesting that mouse AMCase employs different mechanisms for protonation of the catalytic glutamic acid residue and the two neighboring aspartic acids in the catalytic motif under distinct pH conditions.

      The novelty of this research lies in the observation of structural rearrangements and the identification of pH-dependent mechanisms in mouse AMCase, offering a unique perspective on its enzymatic activity compared to other enzymes. By investigating the distinct protonation mechanisms and their relationship to pH, the authors reveal the adaptive nature of mouse AMCase, highlighting its ability to adjust its catalytic behavior in response to varying pH conditions. These insights contribute to our understanding of the pH-specific enzymatic activity of mouse AMCase and provide valuable information about its adaptation to different physiological conditions.

      Overall, the study enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments.

      Reviewer #2 (Public Review):

      Summary:

      In this study of the mouse homolog of acidic mammalian chitinase, the overall goal is to provide a mechanistic explanation for the unusual observation of two pH optima for the enzyme. The study includes biochemical assays to establish kinetic parameters at different solution pH, structural studies of enzyme/substrate complexes, and theoretical analysis of amino acid side chain pKas and molecular dynamics.

      Strengths:

      The biochemical assays are rigorous and nicely complemented by the structural and computational analysis. The mechanistic proposal that results from the study is well rationalized by the observations in the study.

      Weaknesses:

      The overall significance of the work could be made clearer. Additional details could be provided about the limitations of prior biochemical studies of mAMCase that warranted the kinetic analysis. The mouse enzyme seems unique in terms of its behavior at high and low pH, so it remains unclear how the work will enhance broader understanding of this enzyme class. It was also not clear whether the findings can be used for therapeutic purposes, as detailed in the abstract, if the human enzyme works differently.

      We have edited the paper to address these concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Regarding the pH profiles of mouse AMCase, previous studies have reported its activity at pH 2.0 and within the pH range of 3-7. In this paper, the authors conducted kinetic measurements and showed that pH 6.5 is optimal for kcat/Km. The authors emphasize the significance of mouse AMCase's activity in the neutral region, particularly at pH 6.5, for understanding its physiological relevance in humans. To provide a comprehensive overview, it would be valuable for the authors to summarize the findings from previous and current studies, discuss their implications for future pulmonary therapy in humans, and cite relevant literature. Additionally, the authors should highlight their research's specific contributions and novel findings, such as the determination of kinetic parameters (kcat and Km) under different pH conditions. Explaining why these observations were lacking in previous studies, and underscoring the importance of the present findings in addressing those knowledge gaps, will help readers understand the significance of the study and its impact on the field of enzymology.

      We thank the reviewer for this comment. In keeping with the knowledge gaps addressed directly by this paper, we have not augmented the discussion of future pulmonary therapy in humans. We have summarized the present findings at the end of the introduction as follows:

      “We measured the mAMCase hydrolysis of chitin, which revealed a significant increase in activity under more acidic conditions compared to neutral or basic conditions. To understand the relationship between catalytic residue protonation state and pH-dependent enzyme activity, we calculated the theoretical pKa of the active site residues and performed molecular dynamics (MD) simulations of mAMCase at various pHs. We also directly observed conformational and chemical features of mAMCase from pH 4.74 to 5.60 by solving X-ray crystal structures of mAMCase in complex with oligomeric GlcNAcn across this range.”

      (2) Regarding the implications of the pKa values and Asp138 orientation for the pH optima, it would be valuable for the authors to discuss the variations in optimal activity by pH among GH-18 chitinases and investigate the underlying factors contributing to these differences. In particular, exploring the role of Asp138 orientation in chitotriosidase, another mammalian chitinase, would provide important insights. Chitotriosidase is known to be inactive at pH 2.0, and it would be interesting to investigate whether the observed orientation of Asp138 towards Glu140 in mouse AMCase for pH 2.0 activity is lacking in chitotriosidase.

      There are similar rotations of the two acidic residues in the literature on Chit1. The variety of crystal pH conditions and the lack of a straightforward mechanism for pKa shifts in AMCase make it difficult to draw a comparison to why Chit1 is inactive at low pH, but this is an interesting area for future study. See a fuller discussion in: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760363/

      Furthermore, considering the lower activity of human AMCase at pH 2.0, it would be worthwhile to examine whether the Asp138 orientation towards Glu140, as observed in mouse AMCase, is also absent in human AMCase. Exploring this aspect will help determine if the orientation of Asp138 plays a critical role in pH-dependent activity in human AMCase.

      The situation for hAMCase is similar to that for Chit1: the rotations observed here for mAMCase are also present. The question is not whether Asp138 can rotate, but rather the relevant energetic penalties, as we discuss in the manuscript.

      (3) In a previous study by Okawa et al. (Loss and gain of human acidic mammalian chitinase activity by nonsynonymous SNPs. Mol Biol Evol 33, 3183-3193, 2016), it was reported that specific amino acid substitutions (N45D, D47N, and R61M) encoded by nonsynonymous single nucleotide polymorphisms (nsSNPs) in the N-terminal region of human AMCase had distinct effects on its chitinolytic activity. Introducing these three residues (N45D, D47N, and R61M) could activate human AMCase. This activation significantly shifted the optimal pH from 4-5 to 2.0.

      Considering the significant impact of these amino acid substitutions on the pH-dependent activity of human AMCase, the authors should discuss this point in the manuscript's discussion section. Incorporating the findings and relating them to the current study's observations on pH optima and Asp138 orientation can provide a comprehensive understanding of the factors influencing pH-dependent activity in AMCase.

      We added a citation and discuss how the mutations identified by this study could potentially shift the pKa of key catalytic residues:

      “Okawa et al. identified how primate AMCase lost activity by integration of specific, potentially pKa-shifting, mutations relative to the mouse counterpart [42b].”

      (4) To further strengthen the discussion, the authors could explore the ancestral insectivorous nature of placental mammals and the differences in chitinase activity between herbivorous and omnivorous species. Incorporating these aspects would add depth and relevance to the overall discussion of AMCase. AMCase is an enzyme known for its role in digesting insect chitin in the stomachs of various insectivorous and omnivorous animals, including bats, mice, chickens, pigs, pangolins, common marmosets, and crab-eating monkeys 1-7. However, in certain animals, such as dogs (carnivores) and cattle (herbivores), AMCase expression and activity are significantly low, leading to impaired chitin digestion 8. These observations suggest a connection between dietary habits and the expression and activity of the AMCase gene, ultimately influencing chitin digestibility across different animal species 8.

      (1) Strobel et al. (2013). Insectivorous bats digest chitin in the stomach using acidic mammalian chitinase. PloS one 8, e72770.

      (2) Ohno et al. (2016). Acidic mammalian chitinase is a proteases-resistant glycosidase in mouse digestive system. Sci Rep 6, 37756.

      (3) Tabata et al. (2017). Gastric and intestinal proteases resistance of chicken acidic chitinase nominates chitin-containing organisms for alternative whole edible diets for poultry. Sci Rep 7, 6662.

      (4) Tabata et al. (2017). Protease resistance of porcine acidic mammalian chitinase under gastrointestinal conditions implies that chitin-containing organisms can be sustainable dietary resources. Sci Rep 7, 12963.

      (5) Ma et al. (2018). Acidic mammalian chitinase gene is highly expressed in the special oxyntic glands of Manis javanica. FEBS Open Bio 8, 1247-1255.

      (6) Tabata et al. (2019). High expression of acidic chitinase and chitin digestibility in the stomach of common marmoset (Callithrix jacchus), an insectivorous nonhuman primate. Sci Rep 9, 159.

      (7) Uehara et al. (2021). Robust chitinolytic activity of crab-eating monkey (Macaca fascicularis) acidic chitinase under a broad pH and temperature range. Sci Rep 11, 15470.

      (8) Tabata et al. (2018). Chitin digestibility is dependent on feeding behaviors, which determine acidic chitinase mRNA levels in mammalian and poultry stomachs. Sci Rep 8, 1461.

      This overall point is covered by our brief discussion on diet differences:

      “However, hAMCase is likely too destabilized at low pH for an increase in _k_cat to be observed. hAMCase may be under less pressure to maintain high activity at low pH due to humans’ noninsect-based diet, which contains less chitin than the diets of other mammals with primarily insect-based diets [42].”

      (5) It is important for the authors to clearly state the limitations of their simulations and emphasize the need for experimental validation or additional supporting evidence. This will provide transparency and enable readers to understand the boundaries of the study's findings. A comprehensive discussion of limitations would contribute to a more robust interpretation of the results.

      We added a sentence to the discussion:

      “Our simulations have important limitations that could be overcome by quantum mechanical simulations that allow for changes in protonation state and improved consideration of polarizability.”

      Minor comments:

      (1) Regarding the naming of AMCase, it is important to describe it accurately, based on its acidic isoelectric point rather than its enzymatic activity under acidic conditions, as in the original paper (Reference #14: Boot, R. G. et al. Identification of a novel acidic mammalian chitinase distinct from chitotriosidase. J. Biol. Chem. 276, 6770-6778 (2001)).

      We have made this modification.

      (2) In the introduction, providing more context regarding the terminology of acidic mammalian chitinase (AMCase) would be beneficial. While AMCase was initially discovered in mice and humans, subsequent research has revealed its presence in various vertebrates, including birds, fish, and other species. Therefore, it would be appropriate to include the alternative enzyme name, Chia (chitinase, acidic), in the introduction to reflect its broader distribution across different organisms. This clarification would enhance the readers' understanding of the enzyme's taxonomy and facilitate further exploration of its functional significance in diverse biological systems.

      We have made this modification.

      (3) The authors mention that AMCase is active in tissues with neutral pHs, such as the lung. However, it is important to consider that the pH in the lung is lower, around 5, due to the presence of dissolved CO2 that forms carbonic acid. The lung microenvironment is known to vary, and specific regions or conditions within the lung may have slightly different pH levels. By addressing the pH conditions in the lungs and their relationship to AMCase's activity, the authors can enhance our understanding of the enzyme's function within its physiological context. A thorough discussion of the specific pH conditions in the lung and their implications for AMCase's activity would provide valuable insights into the enzyme's role in lung pathophysiology.

      To keep the focus on the insights we have made, we have elected not to expand this discussion.

      (4) It would be helpful for the authors to provide more information about the substrate or products of AMCase. The basic X-ray crystal structures used in this study are GlcNAc2 or GlcNAc3, known products of AMCase. Including details about the specific ligands involved in the enzymatic reactions would enhance the understanding of the study's focus.

      We are unclear about what this means, and since it is a minor comment, we have elected not to change the discussion of substrates here.

      (5) The authors should critically evaluate the inclusion of the term "chitin-binding" in the Abstract and Introduction. If substantial evidence or discussion regarding the specific chitin-binding properties of the enzyme or its relevance to the immune response is not included, removing or modifying that statement might be appropriate.

      We are unclear about what this means, and since it is a minor comment, we have elected not to change the discussion of “chitin-binding” here.

      (6) The authors developed an endpoint assay to measure the activity of mouse AMCase across a broad pH range, allowing for direct measurement of kinetic parameters. The authors should provide a more detailed description of the methods used, including any specific modifications made to the previous assay, to ensure reproducibility and facilitate further research in the field. It is important to clearly show the novelty of their endpoint assay compared to previous methods employed in other reports. The authors should also explain how their modified endpoint assay differs from existing assays and highlight its advancements or improvements. This will help readers understand the unique features and contributions of the assay in the context of previous methods.

      We have already included a detailed method description and figures. See also our previous paper by Barad, which includes other related assays.

      (7) The authors suggest that mouse AMCase may be subject to product inhibition, potentially due to its transglycosylation activity, which can affect the Michaelis-Menten model predictions at high substrate concentrations. However, the specific impact of transglycosylation on the kinetic parameters was unclear to this reviewer. It would be helpful for the authors to provide a more detailed explanation, clarifying how transglycosylation activity influences the kinetic behavior of AMCase and its implications for the observed results.

      The experiments to conclusively demonstrate this are beyond our current capabilities.

      (8) In the Abstract, the authors state, "We also solved high resolution crystal structures of mAMCase in complex with chitin, where we identified extensive conformational ligand heterogeneity." This reviewer suggests replacing "chitin" with "oligomeric GlcNAcn" throughout the text, specifically about biochemical experiments. It is important to accurately describe the experimental conditions and ligands used in the study.

      We have made these changes throughout the manuscript.

      (9) In the introduction, the authors mention "a polymer of β(1-4)-linked N-acetyl-D-glucosamine (GlcNAc)". In this case, the letter "N" should be italicized to conform to the proper notation for the monosaccharide abbreviation.

      Corrected (and hopefully this would also have been caught by the copy editor!).

      (10) In the introduction, the authors state, "In the absence of AMCase, chitin accumulates in the airways, leading to epithelial stress, chronic activation of type 2 immunity, and age-related pulmonary fibrosis5,6". It is recommended to clarify that "AMCase" refers to "acidic mammalian chitinase (AMCase)" in this context, as it is the first mention of the enzyme in the introduction.

      We moved that section so that it flows better and is introduced with the full name.

      (11) In the introduction, the authors state, "Mitigating the negative effects of high chitin levels is particularly important for mammalian lung and gastrointestinal health." This reviewer requests further clarification on the connection between chitin and gastrointestinal health. Please provide an explanation or reference to support this statement.

      We have modified this sentence to:

      “Chitin levels can be potentially important for mammalian lung and gastrointestinal health.”

      (12) In the introduction, the authors mention that "Acidic Mammalian Chitinase (AMCase) was originally discovered in the stomach and named for its high enzymatic activity under acidic conditions." It is recommended to include Reference #14 (Boot et al. J. Biol. Chem. 276, 6770-6778, 2001) as it provides the first report on mouse and human AMCase, contributing to the understanding of the enzyme.

      However, it is worth noting that while this paragraph primarily focuses on human tissues, Reference #14 primarily discusses mouse AMCase but also reports on human AMCase. Additionally, References #8 and #9 mainly discuss mouse AMCase. This creates confusion in the description of human and mouse AMCase within the paragraph.

      Considering that this paper aims to focus on the unique features of mouse AMCase, it is suggested that the authors provide a more specific and balanced description of both human and mouse AMCase throughout the main text.

      We have clarified the origin of the name AMCase, and the text now distinguishes the two orthologs as hAMCase and mAMCase.

      (13) Figure 1A in the Introduction section has been previously presented in several papers. The authors should consider moving this figure to the Results section and present an alternative figure based on their experimental results to enhance the novelty and impact of the study.

      We have considered this option, but prefer the original placement.

      (14) In the Results section, the authors mentioned, "Prior studies have focused on relative mAMCase activity at different pH18,20, limiting the ability to define its enzymological properties precisely and quantitatively across conditions of interest." It would be beneficial for the authors to include reference #14, the first report showing the pH profile of mouse AMCase, to support their statement.

      We have added this reference.

      (15) Regarding the statement, "To overcome the pH-dependent fluorescent properties of 4MU-chitobioside, we reverted the assay into an endpoint assay, which allowed us to measure substrate breakdown across different pH (Supplemental Figure 1A)", the authors should provide a more detailed description of the improvements made to measure AMCase activity. Additionally, it would be helpful to include a thorough explanation of the figure legend for Supplementary Figure 1A to provide clarity to readers.

      We have included a detailed method description and figures already. See also our previous paper by Barad which includes other, related, assays.

      (16) Figure 1B shows that the authors used the AMCase catalytic domain. It would benefit the authors to explain the rationale behind this choice in the figure legend or the main text.

      This point is addressed in the text:

      “Previous structural studies on AMCase have focused on interactions between inhibitors like methylallosamidin and the catalytic domain of the protein.”

      (17) For Figures 1C-E, it is recommended that the authors include error bars in their results to represent the variability or uncertainty of the data. In Figure 1E, the authors should clarify the units of the Y-axis (e.g., sec⁻¹ µM⁻¹). Additionally, in Figure 1F, the authors should explain how the catalytic acidity is shown.

      We have added error bars and axis labels. Figure 1F is conceptual, so we are leaving it as is.

      (18) The authors stated, "These observations raise the possibility that mAMCase, unlike other AMCase homologs, may have evolved an unusual mechanism to accommodate multiple physiological conditions." It would be helpful for the authors to compare and discuss the pH-dependent AMCase activity of mouse AMCase with other AMCase homologs to support this statement.

      That is an excellent idea for future comparative studies, but beyond the scope of what we are examining in this paper.

      (19) The authors should explain Supplemental Figures 1B and C in the Results or Methods sections to provide context for these figures.

      We are unsure what this comment refers to, and since it is a minor comment, we have elected not to change these sections.

      (20) Supplemental Figure 3 is missing any description. It would be important for the authors to include a mention of this figure in the main text before Supplemental Figure 4 to guide the readers.

      The full legend is now included, and the reference to Supplemental Figure 4 was mislabeled.

      (21) For Supplemental Figure 4, the authors should explain the shape of the symbol used in the figure. Additionally, they should explain "apo" and "holoenzyme" in the context of this figure.

      It is unclear to us what "shape" means in this context; perhaps the confusion arises because these are violin plots showing distributions.

      (22) Table 1 requires a more detailed explanation of its contents. Additionally, Tables 2 and 3 need to be included. The authors should include these missing tables in the revised version and explain their contents appropriately.

      Table 1 is the standard crystallographic table - there isn’t much more detailed explanation that can be offered. Tables 2 and 3 were not transferred properly by BioRxiv but were included in the review packet as requested a day after submission.

      (23) In Figure 4, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers. Additionally, it is recommended to use D136, D138, and E140 instead of D1, D2, and E to label the respective parts. The authors should also explain the meaning of the symbol used in the figure.

      Since it is a minor comment, we have elected not to change these figures.

      (24) In Figure 5, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      (25) Similarly, in Figure 6, all panels should be enlarged to enhance the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      Reviewer #2 (Recommendations For The Authors):

      In general, I did not identify many detailed or technical concerns with the work. A few items for the authors to consider are listed below.

      (1) The interpretation of the crystallographic datasets seems complicated by the heterogeneity in the substrate component. It might be nice to see more critical analysis of the approach here. Are there other explanations or possible models that were considered? Do other structures of chitinases or other polysaccharide hydrolases exhibit the same phenomenon?

      In writing the manuscript we have tried to take a very critical approach to this, and it is quite likely that other structures contain density with similar heterogeneity that has simply been left unmodeled.

      (2) It would be ideal to include more experimental validation of the proposed mechanism. Much of the manuscript includes theoretical validations (pKa estimation, dynamics, etc) - but it would be optimal to make an enzyme variant or do an experiment with a substrate analog.

      Yes, we agree that follow-on experiments are needed to fully test the mechanism, and those will be the subject of future work.

      (3) For an uninitiated reviewer, I think the major issue with this study is that the broader significance of the work and how it fits into the context of other work on these enzymes is not clear. It would be helpful to be more specific about what we know of mechanism from work on other enzymes to help the reader understand the motivation for this study.

      We have added a few additional references, guided by Reviewer 1's comments, that should help in this respect.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript by Wu et al., the authors present the high-resolution cryoEM structures of the WT Kv1.2 voltage-gated potassium channel. Along with this structure, the authors have solved several structures of mutants or experimental conditions relevant to the slow inactivation process that these channels undergo and which is not yet completely understood. 

      One of the main findings is the determination of the structure of a mutant (W366F) that is thought to correspond to the slow inactivated state. These experiments confirm results from equivalent mutants in channels other than Kv1.2, which indicate that inactivation is associated with an enlarged selectivity filter. 

      Another interesting structure is the complex of Kv1.2 with the pore-blocking toxin Dendrotoxin 1. The results show that the mechanism of the block is different from similar toxins, in which a lysine residue penetrates the pore deep enough to empty most external potassium binding sites. 

      The quality of the structural data presented in this manuscript is very high and allows for the unambiguous assignment of side chains. The conclusions are supported by the data. This is an important contribution that should further our understanding of voltage-dependent potassium channel gating. Specific comments are appended below. 

      (1) In the main text's reference to Figure 2d, residues W18' and S22' are mentioned but are not labeled in the insets. 

      Now labeled in Fig. 2D.

      (2) On page 8 there is a discussion of how the two remaining K+ ions in binding sites S3 and S4 prevent K+ permeation in molecular dynamics. However, in Shaker, inactivated W434F channels can sporadically allow K+ permeation with normal single-channel conductance but very reduced open times and open probability at not very high voltages. 

      Addressed in the Discussion, lines 480-490.

      (3) The structures of WT in the absence of K+ show a narrower selectivity filter, however, Figure 4 does not convey this finding. In fact, the structure in Figure 4B is constructed at such an angle that it looks as if the carbonyl distances are increased, perhaps this should be fixed. Also, it is not clear how the distances between carbonyls given in the text on page 12 are measured. Is it between adjacent or kitty-corner subunits? 

      We decided to remove mention of carbonyl distances, because at our resolutions the atoms are not resolved.

      (4) It would be really interesting to know the authors' opinions on the driving forces behind slow inactivation. For example, potassium flux seems to be necessary for channels to inactivate, which might indicate a local conformational change is the trigger for the main twisting events proposed here. 

      We cite Sauer et al. (2011) for the idea that the intact selectivity filter is a strained conformation, and its relaxation yields the wide vestibule seen in NaK2K and Kv channels.  Lines 434-439.

      Reviewer #2 (Public Review): 

      There are four Kv1.2 channel structures reported: the open state, the C-type inactivated state, a dendrotoxin-bound state, and a structure in Na+. 

      A high-resolution crystal structure of the open state for a chimeric Kv1.2 channel was reported in 2007 and there is no new information provided by the cryoEM structure reported in this study. 

      The cryo-EM structure of the C-type inactivated state of the Kv1.2 channel was determined for a channel with the W to F substitution in the pore helix. A cryo-EM structure of the Shaker channel and a crystal structure of a chimeric Kv1.2 channel with an equivalent W to F mutation were reported in 2022. Cryo-EM structures of the C-type inactivated Kv1.3 channel are also available. All these previous structures have provided a relatively consistent structural view of the C-type inactivated state and there is no significant new information that is provided by the structure reported in this study. 

      A structure of the Kv1.2 channel blocked by dendrotoxin is reported. A crystal structure of charybdotoxin and the chimeric Kv1.2 channel was reported in 2013. Density for dendrotoxin could not be clearly resolved due to symmetry issues and so the definitive information from the structure is that dendrotoxin binds, similarly to charybdotoxin, at the mouth of the pore. A potential new finding is that there is a deeper penetration of the blocking Lys residue in dendrotoxin compared to charybdotoxin. It will however be necessary to use approaches to break the symmetry and resolve the electron density for the dendrotoxin molecule to support this claim and to make this structure significant.  

      We have now succeeded in breaking the symmetry and present in Fig. 3 a C1 structure of the toxin-channel complex. In the improved map we now see that our previous conclusion was wrong: the penetration of Lys5 cannot be much deeper than that seen in CTx and ShK structures. However for some reason the pattern of ion-site occupancies in the blocked state is different in this structure than in the others. Fig. 3, Fig. 4E; text lines 559-568.

      The final structure reported is the structure of the Kv1.2 channel in K+ free conditions and with Na+ present. The structure of the KcsA channel by the MacKinnon group in 2001 showed a constricted filter, and since then it has been falsely assumed by the K channel community that the lowering of K concentration leads to a constriction of the selectivity filter. There have been structural studies on the MthK and the NaK2K channels showing a lack of constriction in the selectivity filter in the absence of K+. These results have been generally ignored and the misconception of filter constriction/collapse in the absence of K+ still persists. The structure of the Kv1.2 channel in Na+ provides a clear example that loss of K+ does not necessarily lead to filter constriction. 

      We are grateful to the reviewer for pointing out this serious omission. We now cite other work, including from the Y. Jiang and C. Nichols labs, showing examples of outer pore expansion and destabilization: p. 4, lines 90-104; lines 421-439.

      The structure in Na+ is significant while the other structures are either merely reproductions of previous reports or are not resolved well enough to make any substantial claims. 

      We now state more clearly the confirmatory nature of our Kv1.2 open structure (lines 71-74) and the similarities of the inactivated-channel structures (lines 193-196).

      Reviewer #3 (Public Review): 

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a large quantity of structural work on the Kv1.2 channel, and the authors should be commended on the breadth of the studies. The structural studies seem well-executed (this is hard to fully evaluate because the current manuscript is missing a data collection and refinement statistics table). The findings are mostly confirmatory, but they do add to the body of work on this and related channels. Notably, the authors present structures of DTX-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (with presumably sodium ions bound within the selectivity filter). These two structures add new information, but the studies seem somewhat underdeveloped; they would be strengthened by accompanying functional studies and further structural analyses. Overall, the manuscript is well-written and a nice addition to the field. 

      The data collection and refinement table has been added (Fig. 4 supplement 3.)

      We agree and regret the lack of functional studies. We have not been able to carry them out because work in our laboratory is winding down and the lab soon will be closing.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is not obvious from the data shown how well the side chain positions in the inactivated state are defined by the electron density. These figures should be redone. Maybe the use of stereo would be useful. This will be particularly useful for the reader to decide if the small changes in, for example, the positioning of the carbonyl oxygens are believable. 

      Figure 2 – figure supplement 4 shows the stereo views.

      (2) The authors note the changes observed (though small) in the VSD which were not observed in other structures. The relevance of this observation is not described. Do these changes arise due to the different environments of detergents versus nanodisc etc. in the different structures?

      We’ve now inserted a note about variety of environments and how this might be a cause of the difference: lines 280-285.  

      Are there changes in the pore-VSD interface in the inactivated and the open channel structures and if yes, then do mutations at these residues affect inactivation?

      There is surprisingly little movement at the S4-S5 interface residues identified by Bassetto et al. (2022) as having effects on inactivation. Lines 262-267.

      (3) For the structures in Na+, it is important to provide analytical data showing the biochemical behavior of the channel. This is also true for the wild type and the W to F mutant channel. Size exclusion profiles should be included. 

      The SEC profile (noisy, but showing a clear peak) of the channel in Na+ is now shown in Fig. 4 supplement 1. Low expression of the W366F mutant produced even worse SEC results, but we include a representative micrograph of W366F in Na+ to show the monodisperse protein prep (Figure 5 – figure supplement 1).

      Reviewer #3 (Recommendations For The Authors): 

      Portions of text from the manuscript are indicated by quotations. 

      Introduction: "One goal of the current study was to examine the structure of the native Kv1.2 channel." 

      Comment, minor points: The authors refer to the Kv1.2 construct used for the structural studies as "native Kv1.2". I found this somewhat confusing because the word "native" suggests derived from a native source. The phrasing above also gives the impression that the structure by Wu et al is the first structure of Kv1.2. The Kv1.2 construct is essentially identical to the one used by Long et al in 2005 to determine the initial structure of Kv1.2 (PDB 2A79). The authors discuss a subsequent paddle-chimera Kv1.2-2.1 structure from 2007 (PDB 2R9R) in the introduction, but it would be prudent to mention the 2005 one of Kv1.2 as well. The open structure determined by Wu et al. is an improvement on the 2A79 structure in that the 2A79 structure was modeled as a poly-alanine model within the voltage sensor domain. Nevertheless, the Kv1.2-2.1 structure (2R9R) is highly similar to the 2A79 structure of Kv1.2. The 2007 structure indicated that Kv1.2-2.1 recapitulates structural features of Kv1.2. It is therefore not surprising that the open structure presented here is highly similar to that of both PDB 2A79 (Kv1.2) and PDB 2R9R (Kv1.2-2.1).  

      We failed to point out the high quality of the original Long et al. 2005 structure and its comparisons with the chimeric structure in Long et al. 2007. We now have tried to correct this: lines 70-74.

      Comment: The cryo-EM analyses suggest that a large percentage (most?) of the particles are missing the beta subunit. This should be commented on somewhere.      

      Now noted on lines 120-132: we pooled particles with and without beta subunits. 

      Regarding ions in the selectivity filter, one-dimensional plots of the density would strengthen the analysis.

      Now included in Fig. 4.

      Also, one should mention caveats associated with identifying ions in cryo-EM maps and the added difficulty/uncertainty when the density is located along a symmetry axis (C4 axis, due to the possible build-up of noise). C1 reconstructions, showing density within the filter, if possible, would strengthen the analyses.

      You are correct. However, local resolution is highest in the selectivity filter region, and since the CTF-based filtering is constant over the whole structure, I think the SNR will be good on axis. 

      Comment: The section on channel inactivation could be simplified by stating that the structure is highly similar to W17'F structures of other Kv channels. (And then discussing possible differences).  

      We now note, “overall conformational difference is identical…” p. 7, lines 193-196.

      "Salt bridges involving the S4 Arg and Lys residues are shifted slightly (Figure 2-figure supplement 3A-D). Arg300 (R3) is in close proximity to Glu226 on the S2 helix for the open channel, while R3 is closer to Glu183 in the S2 helix. The Glu226 side chain adopts a visible interaction with R4 in the inactivated state." 

      Comment: The density for these acidic amino acids seems weak, especially in the inactivated state. It seems like a stretch to make much of their possible conformational changes. 

      We’ve included stereo pairs in Fig. 2 – figure supplement 4.

      "By adding 100 nM α-DTx to detergent solubilized Kv1.2 protein we obtained a cryo-EM structure at 2.8 Å resolution of the complex." 

      Comment: 100 nM might be lower than the Kv concentration. The current methods are ambiguous on the concentration of Kv channel used for the DTx sample. From the methods, it seems possible that 100 nM DTX is a sub-stoichiometric amount relative to the channel. Regardless, the cryo-EM data seems to suggest that a large percentage of particles do not have DTx bound. This surely complicates the interpretation of density within the filter (which has partly been ascribed to a lysine side chain from DTx).

      The reviewer correctly points out a potentially serious problem. It turns out that the 100 nM figure we quoted was incorrect, and the actual concentration of toxin, >400 nM, was substantially greater than the protein concentration. This is confirmed by the small fraction (<1%) of 3D class particles that do not show the toxin density (lines 303-306).

      Comment: The methods on atomic structure building/refinement (Protein model building, refinement, and structural analysis) are sparse. A table is needed showing data collection and refinement statistics for each of the structures. This data should also provide average B factors for the ions in the filter. An example can be found in PMID 36224384. 

      Data collection and statistics are now in Fig. 4 – figure supplement 3.

      "In the selectivity filter of the toxin-bound channel (Figure 3E) a continuous density is seen to extend downward from the external site IS0 through to the boundary between IS1 and IS2. This density is well modeled by an extended Lys side chain from the bound toxin, with the terminal amine coordinated by the carbonyls of G27”.

      Comment: While there seems to be extra density in site IS0 from the figures, the density ascribed to lysine in the filter doesn't seem that distinct from those of ions in the open structure. 1-dimensional density plots and some degree of caution may be prudent. Could there, for example, be a mixture of toxin-bound and free channels in the dataset?

      Could the lysine penetrate to different depths? If the toxin binds with nM affinity, why are any channels missing the toxin? Have the authors modeled an atomic structure of the entire toxin bound to the channel to evaluate how plausible the proposed binding of the lysine is? Can the toxin be docked onto Kv1.2 with the deep positioning of the lysine and not clash with the extracellular surface of Kv1.2? 

      We also were concerned about these issues. We have been able to obtain a C1 reconstruction of the toxin-channel complex. In building the atomic model we found that indeed the Lys5 side chain could not penetrate as far as we had thought, and appears to be coordinated by the first carbonyl pair. Fig. 3; text lines 331-332. 

      "Toxin binding shrinks the distances between opposing carbonyl oxygens in the selectivity filter, forming a narrower tunnel into which the Lys side chain fits (Figure 3F). The second and fourth carbonyl oxygen distances are substantially reduced from 4.7 Å and 4.6 Å in an open state to 3.7 Å and 3.9 Å, respectively (Figure 4E). In a superposition of Kv1.2 open-state and α-DTX-bound P-loop structures, there is also an upward shift of the first three carbonyl groups by 0.7~1.0 Å (Figure 4F). " 

      Comment: I suspect the authors intend to refer to Figure 3F rather than 4. I would be cautious here. The refined positions of the carbonyl oxygens are almost certainly affected by the presence or absence of ions in the atomic model during refinement. The density and the resolution of the map may not be able to distinguish small changes to the positions of the carbonyl oxygens (and these differences/uncertainties are compounded by the C4 symmetry). 

      "On the other hand, the terminal amine of lysine in α-DTX is deeply wedged at the second set of carbonyls, narrowing both IS1 and IS2 while displacing ions from the sites (Figure 3-figure supplement 2A). CTX does not cause narrowing of the selectivity filter or displacements of the carbonyls (Figure 3-figure supplement 2B). "

      Comment: Again, caution would be prudent here.  

      We are very grateful to the reviewer for pointing out these problems. We have removed these statements that are weakly supported at our resolution level.

      "Shaker channels are able to conduct Na+ in the absence of K+ (Melishchuk et al., 1998)." 

      Comment: How about the Kv1.2 channel? Is Kv1.2 able to conduct Na+ in the absence of K+ ? This would certainly be relevant for interpreting the conformation of the filter and the density ascribed to Na+ for the structure in sodium.  

      We agree wholeheartedly, but unfortunately we are no longer capable of doing the measurements as our lab will soon close.

      "Ion densities are seen in the IS1, IS3, and IS4 ion binding sites, but the selectivity filter shows a general narrowing as would be expected for binding of sodium ions. The second, third, and fourth carbonyl oxygen distances are reduced from 4.7 Å, 4.7 Å, and 4.6 Å in the open state to 4.4 Å, 3.9 Å, and 4.5 Å, respectively. The rest of the channel structure is very little perturbed. " 

      Comment: The density for IS4 seems weak. To me, it looks like IS1 and IS3 are occupied, whereas IS2 and IS4 are much weaker. 1-dimensional density plots would be helpful. I would suggest caution in commenting too strongly on the "general narrowing" since the resolution of the maps, the local density, and the atomic structure refinement would be consistent with coordinate errors of 0.5 Å or more - and would be compounded (~ doubled) by measuring between symmetry-related atoms.  

      We present 1D plots in Fig. 4E. We no longer comment on “narrowing”.

      "Finally, the snake toxin a-Dendrotoxin (DTx) studied here is seen to block Kv1.2 by insertion of a lysine residue into the pore." 

      Comment: Discussion (and references) should be given regarding what was known prior to this study on the mode of inhibition by DTx. 

      Discussion and references now added, lines 287-301.

      "On the other hand, a lengthy molecular-dynamics simulation of deactivation in the Kv1.2-2.1..." 

      Comment: I don't think mentioning this personal communication adds to the manuscript. 

      Actually, the original “personal communication” reference was there because the situation is complicated. Movie S3 accompanying the Jensen et al. paper shows deactivation and dewetting of the channel during a 250 µs simulation. In the movie there are ions visible in the selectivity filter for the first 50 µs, but after that the SF appears empty. Puzzled by this, we contacted Dr. Jensen, who explained that the movie was in error: ions remain in the SF throughout the entire 250 µs. We now cite Jensen (2012) along with the personal communication.

      "The difference between the open and inactivated Kv1.2 structures, like the difference in Kv1.2-2.1 (Reddi et al., 2022) and Shaker (Tan et al., 2022) can be imagined as resulting from a two-step process." 

      Comment: Confusing phrasing because the authors mean to compare their structure to inactivated structures of Kv1.2-2.1 and shaker. 

      Fixed, lines 220-222.

      "Molecular dynamics simulations by Tan et al. based on the Shaker-W17'F structure show that IS3 and IS4 are simultaneously occupied by K+ ions in the inactivated state." 

      Comment: I think that the word "show" is too strong. Perhaps "suggest" 

      The MD result seems to us to be unequivocal, that most of the time the two sites are occupied by ions.

      References are needed for the following statements:  

      -  "as well as the charge-transfer center phenylalanine"

      Now citing Tao et al. 2010, line 156.

      - "total gating charge movement in Shaker channels is larger, about 13 elementary charges per channel" 

      Now citing the review by Islas, 2015 (lines 166-169).

      "The selectivity filter of potassium channels consists of an array of four copies of the extended loop (the P-loop) formed by a highly conserved sequence, in this case, TTVGYGD. Two residues anchor the outer half of the selectivity filter and are particularly important in inactivation mechanisms (Figure 2B, right panels). Normally, the tyrosine Y28' (Y377 in Kv1.2) is constrained by hydrogen bonds to residues in the pore helix and helix S6 and is key to the conformation of the selectivity filter. The final aspartate of the P-loop, D30' (D379 in Kv1.2) is normally located near the extracellular surface and has a side chain that also participates in H-bonds with W17' (W366 in Kv1.2) on the pore helix." 

      Citations added (Pless 2013, Sauer 2011) lines 211-214.

      - "During normal conduction, ion binding sites in the selectivity filter are usually occupied by K+ and water molecules in alternation." 

      Added Morais-Cabral et al. 2001, p. 17, lines 463-465.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses, but the evidence is incomplete, as additional biochemical and functional experiments are needed to unambiguously assign MDA5 as a bona fide sensor of triphosphate RNA in this model. This also leaves the title as overstating its case.

      We thank the editorial team for these positive comments and for the constructive suggestions to improve our manuscript. Following the referees' suggestions and valuable comments, we have added substantial new data and analyses to substantiate our claims, and we have carefully revised the manuscript, including the title, to better reflect our conclusions. We hope the revised manuscript addresses your and the reviewers' concerns satisfactorily and is now suitable for publication in eLife.

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I, and the adversarial strategy by which the 5'ppp-RNA virus SCRV degrades MDA5 through m6A modification in different species, further indicating that m6A is a conserved process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts.

      We agree that virus-host coevolution complicates the derivation of universal conclusions. To address this challenge, we have incorporated additional experiments and data based on the reviewers' suggestions. These experiments were carried out across diverse models, including two distinct vertebrate species (M. miiuy and G. gallus), two different viruses (SCRV and VSV), and the corresponding in vitro-synthesized 5’ppp-RNA probes. We believe that these additional data strengthen the evidence supporting the immune replacement role of MDA5 in recognizing 5'ppp-RNA in RIG-I-deficient species (Figure 1C-1E, Figure 2O and 2P, Figure 4). Moreover, we have added references in both the Introduction and Discussion, including a study showing that MDA5 in T. belangeri, a mammal lacking RIG-I, can detect RNA viruses that act as RIG-I agonists (doi: 10.1073/pnas.1604939113), which further supports our conclusion. Lastly, we have carefully revised the manuscript, including the title, to ensure that it is consistent with our results.

      Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as the teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy-derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      The key message of this paper, i.e. MDA5 can sense 5'-triphosphorylated RNA and thereby compensate for the loss of RIG-I, is novel and interesting, yet there is insufficient evidence provided to prove this hypothesis. Most importantly, it is crucial to test the capacity of in vitro synthesized 5'-triphosphorylated RNA to induce an IFN response in MDA5-sufficient and -deficient cells. In addition, a number of important controls are missing, as detailed below.

      To further support the notion that MDA5 is capable of detecting 5'ppp-RNA in species lacking RIG-I, we conducted additional experiments. First, we isolated RNA from SCRV and VSV. We then synthesized in vitro 5'ppp-RNA probes corresponding to the genome termini of SCRV and VSV. These RNAs were treated with calf intestinal phosphatase (CIAP) to generate dephosphorylated derivatives. Next, we tested the ability of each RNA to induce IRF3 dimerization and an IFN response in MKC (M. miiuy kidney cell line) and DF-1 (G. gallus fibroblast cell line) cells, and found that the immune-activating ability of SCRV and VSV RNA depends on its triphosphate structure (Figure 1C-1E, Figure 4C and 4J). In addition, knockdown of MDA5 inhibited the immune response mediated by SCRV RNA (Figure 2P and 2Q). Finally, we added the essential experimental controls (Figure 4B and 4I). We believe these additional data substantially strengthen the support for our hypothesis.

      The authors describe an interaction between MDA5 and STING which, if true, is very interesting. However, the functional implications of this interaction are not further investigated in the manuscript. Is STING required to relay signaling downstream of MDA5?

      To better explore the role of STING in MDA5 signal transduction, we constructed a STING expression plasmid and synthesized an siRNA targeting STING. Co-expression of STING and MDA5 significantly enhanced the MDA5-mediated IFN-1 response during SCRV infection (Figure 2N). Conversely, silencing STING expression restored the MDA5-mediated IFN-1 response (Figure 2O). These findings provide important evidence that STING is critically involved in the MDA5-mediated immune signaling cascade in response to 5'ppp-RNA viruses.

      The second part of the paper is quite distinct from the first part. The fact that MDA5 is an interferon-stimulated gene is not mentioned and complicates the analyses (i.e. is there truly more m6A modification of MDA5 on a per molecule basis, or is there simply more total MDA5 and therefore more total m6A modification of MDA5).

      For the analysis in Figure 5E and 5F, we first normalized the m6A-IP signal to the corresponding input, and then set the control group (IgG group in 5E and Mock group in 5F) to a value of 1. Because MDA5 expression differs between the input samples of mock- and SCRV-infected cells, normalizing to input means that our analysis reflects the m6A level per MDA5 molecule rather than the total amount of MDA5. To improve clarity, we have updated the y-axis labels in Figure 5E and 5F.
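      To illustrate why dividing by input separates a per-molecule change in m6A from a change in total MDA5 abundance, here is a minimal sketch in Python. The function name and all numbers are hypothetical and are not data from this study; it assumes the IP and input signals are already expressed as relative quantities (for example, from qPCR).

```python
def relative_m6a_level(ip, input_, ip_ctrl, input_ctrl):
    """Hypothetical helper: m6A-IP signal divided by input, scaled so the control equals 1."""
    enrichment = ip / input_                   # m6A-IP relative to total MDA5 transcript in this sample
    control_enrichment = ip_ctrl / input_ctrl  # same ratio in the control (IgG or mock) sample
    return enrichment / control_enrichment

# Made-up relative quantities for illustration only (not data from this study).
mock_ip, mock_input = 2.0, 10.0
scrv_ip, scrv_input = 12.0, 30.0  # infection raises total MDA5 3-fold (input 10 -> 30)

naive_fold = scrv_ip / mock_ip  # raw IP fold change, inflated by the higher MDA5 abundance
mock = relative_m6a_level(mock_ip, mock_input, mock_ip, mock_input)  # control is 1 by construction
scrv = relative_m6a_level(scrv_ip, scrv_input, mock_ip, mock_input)  # per-transcript change
print(naive_fold, mock, scrv)  # 6.0 1.0 2.0
```

      With these illustrative numbers, the raw IP signal rises 6-fold but the input-normalized m6A level rises only 2-fold; the remainder reflects the larger amount of MDA5 transcript, which is the distinction between per-molecule and total modification raised by the reviewer.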

      Finally, it should be pointed out that several figures require additional labels, markings, or information in the figure itself or in the accompanying legend to increase the overall clarity of the manuscript. There are frequently details missing from figures that make them difficult to interpret and not self-explanatory. These details are sometimes not even found in the legend, only in the materials and methods section. The manuscript also requires extensive language editing by the editorial team or the authors.

      We acknowledge the valuable feedback from the reviewer and have made significant improvements to our manuscript based on the recommendations provided in the "Recommendation for the authors" section. Furthermore, we have conducted a thorough review of the entire article, resulting in substantial enhancements to the format, clarity, and overall readability of our manuscript.

      Reviewer #3 (Public Review):

      Summary: In this manuscript, the authors investigated the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in a teleost fish called Miiuy croaker. They claimed that MDA5 can replace RIG-I in sensing 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) in the absence of RIG-I in Miiuy croaker. The recognition of 5'ppp-RNA by MDA5 was also observed in the chicken (Gallus gallus), a bird species that lacks RIG-I. Additionally, they reported that the function of MDA5 can be impaired through m6A-mediated methylation and degradation of MDA5 mRNA by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker under SCRV infection. This impairment weakens the innate antiviral immunity of fish and promotes the immune evasion of SCRV.

      Strengths: These findings provide insights into the adaptation and functional diversity of innate antiviral activity in vertebrates.

      Weaknesses: However, there are some major and minor concerns that need to be further addressed. Addressing these concerns will help the authors improve the quality of their manuscript. One significant issue with the manuscript is that the authors claim to be investigating the role of MDA5 as a substitute for RIG-I in recognizing 5'ppp-RNA, but their study extends beyond this specific scenario. Based on my understanding, it appears that sections 2.2, 2.3, 2.5, 2.6, and 2.7 do not strictly adhere to this particular scenario. Instead, these sections tend to investigate the functional involvement of Miiuy croaker MDA5 in the innate immune response to viral infection. Furthermore, the majority of the data is focused on Miiuy croaker MDA5, with only a limited and insufficient study on chicken MDA5. Consequently, the authors cannot make broad claims that their research represents events in all RIG-I deficient species, considering the limited scope of the species studied.

      We agree with the reviewer that functional analysis of MDA5 in M. miiuy may not adequately represent all species lacking RIG-I. To address this concern, we have incorporated additional experimental data from different model systems, including two vertebrate species (M. miiuy and G. gallus), two viruses (SCRV and VSV), and two corresponding synthesized 5’ppp-RNA probes. Although the functional characterization of G. gallus MDA5 remains limited compared with that of M. miiuy, our current findings support two key observations. First, the triphosphate structure of VSV is pivotal in activating the innate immune response against the virus in G. gallus (Figure 1D and 4J). Second, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). Thus, although we cannot definitively establish that MDA5 acts as an immune surrogate in all RIG-I-deficient species, our data further support this hypothesis. Moreover, we now state our conclusions more cautiously throughout the manuscript.

      The current title of the article does not align well with its actual content. It is recommended that the focus of the research be redirected to the recognition function and molecular mechanism of MDA5 in the absence of RIG-I concerning 5'ppp-RNA. This can be achieved through bolstering experimental analysis in the fields of biochemistry and molecular biology, as well as enhancing theoretical research on the molecular evolution of MDA5. It is advisable to decrease or eliminate content related to m6A modification.

      Following the reviewer's recommendations, we have revised the title to emphasize that our main research focus is a teleost fish devoid of RIG-I. Furthermore, we have conducted additional molecular experiments to further elucidate the 5'ppp-RNA recognition function of MDA5 in RIG-I-deficient species. In an attempt to analyze the potential molecular evolution of MDA5 resulting from RIG-I deficiency, we collected MDA5 coding sequences from diverse vertebrates. However, due to multiple independent loss events of RIG-I in fish, fish with or without RIG-I genes in the phylogenetic tree cannot be effectively clustered separately, making it extremely difficult to perform this aspect of analysis. Consequently, we have regrettably opted to forgo the molecular evolution analysis of MDA5.

      Our study aims to reveal an antagonistic interaction between a fish immune receptor and an RNA virus. In fish that have lost RIG-I, MDA5 has evolved the ability to recognize a 5’ppp-RNA virus and mediate an IFN response to resist SCRV infection. Conversely, the m6A methylation mechanism provides SCRV with a means to weaken the immune capacity of MDA5. We therefore believe that the m6A section describes an important facet of the arms race between the virus and its host and should be retained.

      Additionally, the main body of the writing contains several aspects that lack rigor and tend to exaggerate, necessitating significant improvement.

      We appreciate the reviewer’s comment and have improved the manuscript to address the points raised in the “Recommendations for the authors”. We have added experiments to strengthen the support for our conclusions, and we now state those conclusions more cautiously.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The evidential foundation within the Result 1 section appears somewhat tenuous.

      Firstly, the author derives conclusions regarding the phenomenon of RIG-I loss in lower vertebrates by referencing external literature and conducting bioinformatics analyses. It is pertinent to inquire whether the author considered fortifying these findings through additional WB/PCR experiments, particularly for evaluating RIG-I expression levels across diverse vertebrates, encompassing both lower and higher orders.

      First, most of the species we analyzed are model species with high-quality genome sequences available in public databases. Second, RIG-I protein sequences (at least some domain sequences) are relatively conserved among vertebrates. Homology comparison is therefore a reliable way to assess whether RIG-I is present in these species, and we do not intend to conduct additional PCR/WB experiments to confirm this.

      Additionally, following the identification of RIG-I loss, the author postulates MDA5 as a substitute for RIG-I, grounding this speculation in the analysis of MDA5 and LGP2 protein structures. It is imperative to address whether the author could enhance the manuscript by supplying expression data for MDA5 and LGP2 across different vertebrates and elucidating further why MDA5 is posited as the compensatory mechanism for RIG-I loss.

      Like MDA5, LGP2 is an interferon-stimulated gene, so both are likely to be highly responsive to viral infection. We therefore think that comparing the expression of these two genes would be of limited value for evaluating their functions. In mammals, the regulation of RIG-I and MDA5 by LGP2 is complex and not fully understood. To evaluate the potential function of LGP2 in M. miiuy, we constructed an LGP2 plasmid and synthesized an siRNA targeting LGP2. Our results indicate that mmiLGP2 can enhance the antiviral immune response mediated by mmiMDA5 (Figure 1H and 1I), suggesting that mmiLGP2 plays a regulatory role in RLR signaling rather than acting as a compensatory receptor for RIG-I.

      Also, is it conceivable that other receptors contribute to this compensatory effect in lower vertebrates?

      Short, blunt-ended double-stranded RNA bearing a 5’ triphosphate, as found in the panhandle structure of negative-strand viral genomes, is the ligand of RIG-I. We focused on which receptors might compensate for RIG-I loss in immune recognition, and MDA5, as the protein most structurally similar to RIG-I, first attracted our attention. IFIT proteins have also been reported to recognize triphosphorylated single-stranded RNA (doi: 10.1038/nature11783). However, the viral models we used, SCRV and VSV, both have negative-strand genomes that fit the ligand criteria of RIG-I rather than those of IFIT proteins. We therefore excluded IFIT proteins from the scope of this study.

      (2) The article exclusively employs a singular type of 5'PPP-RNA virus and one specific lower vertebrate species, thereby potentially compromising the robustness of the assertion that this phenomenon is prevalent in lower vertebrates. To bolster this claim, could the author consider incorporating data from an alternative 5'PPP-RNA virus and a different lower vertebrate species?

      To address this concern, we have incorporated additional experimental data from different model systems, including two vertebrate species (M. miiuy and G. gallus) and two viruses (SCRV and VSV). Although the functional characterization of G. gallus MDA5 remains limited compared with that of M. miiuy, our current findings support two key observations. First, the triphosphate structure of VSV is pivotal in activating the innate immune response against the virus in G. gallus (Figure 1D and 4J). Second, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). Together, these results further support the conservation of this immune compensation mechanism.

      (3) A nuanced consideration of the statement in Result 5 is warranted. Examination of the results under SCRV infection conditions suggests dynamic fluctuations in MDA5 expression levels, challenging the veracity of the statement implying "increased expression", which contradicts the proposed working model of this article.

      Because MDA5 acts as a receptor with a recognition role in the early stages of viral infection, its expression rises rapidly early in SCRV infection. Later in infection, MDA5 expression may gradually decline again owing to host negative-feedback mechanisms that prevent excessive inflammation. Nevertheless, MDA5 expression was significantly higher in the SCRV-infected group than in the uninfected group, so we believe the term "increased expression" is appropriate. Moreover, although the m6A mechanism can weaken MDA5 function, it does not prevent the overall increase in MDA5 expression, which is consistent with the working model in this article.

      Additionally, the alterations in m6A levels in miiuy croaker under SCRV infection conditions warrant clarification. Could the author employ m6A dot blotting to supplement the findings related to total m6A levels?

      Our previous studies (doi: 10.4049/jimmunol.2200618) have suggested that the total m6A level is increased after SCRV infection in miiuy croaker. We cited this conclusion in the discussion of our manuscript.

      (4) It would be beneficial if the editors could assist the author in enhancing the language of the manuscript.

      We have carefully checked the entire manuscript and revised it with the Grammarly tool, and we believe that its grammar, format, and readability have been greatly improved.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1

      (1) Figure 1B - some clarification needs to be added about this figure in the text. It is unclear what the main point is that the authors would like to convey.

      We wish to emphasize that some species with RIG-I, such as zebrafish, have also experienced RIG-I loss events, but underwent whole-genome duplication before the loss and thus preserved a copy of RIG-I. This indicates that RIG-I loss events are common in vertebrates and are not random. We have elaborated on this point in the Results and Discussion.

      (2) Figure 1C - is not very informative other than showing Mm MDA5 and LGP2 side-by-side. It would be more useful to show a comparison of human RIG-I/MDA5 alongside Mm and Gg MDA5. Are there any conserved/shared key residues between hRIG-I/hMDA5 versus mmMDA5?

      Homologous proteins often adopt the same or similar structures and functions. We have added human RIG-I domain information to this figure (Figure 1F). Comparing the domain organization of human RIG-I with that of M. miiuy MDA5 and LGP2 shows that M. miiuy MDA5 is structurally similar to human RIG-I, making it the most likely candidate to compensate for the missing RIG-I. In contrast, M. miiuy LGP2 lacks the CARD domain, which is crucial for signal transduction, so we focused on M. miiuy MDA5. In addition, we collected MDA5 and RIG-I protein sequences from various vertebrates to identify residues that might have evolved to enable M. miiuy MDA5 to recognize 5'ppp-RNA. Unfortunately, this comparison did not reveal any candidate residues.

      Figure 2

      (1) Figure 2B - It would be important to demonstrate MDA5-Flag expression by immunoblot and compare MDA5-Flag overexpression to endogenous MDA5 expression using the anti-MDA5 antibody from panel 2A. If IF is used, more cells need to be visible in the field.

      We transfected the MDA5 plasmid into MKC cells and detected MDA5 using the anti-MDA5 antibody. MDA5 protein levels increased markedly, indicating that the antibody specifically recognizes MDA5. In addition, we retained the original immunofluorescence images to better show the subcellular localization of MDA5.

      (2) Figure 2C - The 1:1 stoichiometry of MDA5:MAVS (in the absence of any stimulus) is quite surprising. How does the interaction between MDA5 and MAVS change upon stimulation with an RNA ligand (SCRV, poly(I:C))?

      We do not think the data indicate a 1:1 stoichiometry between MDA5 and MAVS. The apparent ratio of proteins recovered in a Co-IP depends on many experimental factors. First, the MDA5 construct in this study carries a 3×Flag tag, whereas MAVS carries only a single Myc tag, which makes MDA5-Flag easier to detect. In addition, Co-IP results are affected by other factors, such as the type of antibody and the recovery efficiency, making it difficult to estimate the actual ratio of MDA5 to MAVS. For these reasons, and because changes in the strength of the MDA5-MAVS interaction after infection lie outside the main focus of this study, we did not pursue this point further.

      (3) Figure 2D - The interaction between MDA5 and STING is a very interesting finding but is not elaborated on in the paper (even though the interaction between MDA5 and STING is mentioned in the abstract). The manuscript would be strengthened if the interaction between MDA5 and STING is further investigated. For example, does the IFN response that is reported in panels 2E to 2H require the presence of STING? Does mmMDA5 signal via STING in response to a DNA ligand?

      We appreciate the referee's suggestion to study the interplay between MDA5 and STING. We found that co-expression of STING and MDA5 enhances the MDA5-mediated IFN-1 response during SCRV infection, whereas knocking down STING restores the MDA5-mediated IFN-1 response (Figure 2N and 2O). This indicates that STING plays an important signaling role in the MDA5 immune response to RNA viruses. We recognize the importance of the cGAS-STING pathway in sensing exogenous DNA, and whether MDA5 signals via STING in response to DNA ligands is an interesting and meaningful question. However, it lies outside the scope of this article, so we did not pursue it further.

      (4) Figures 2F and 2H - the authors demonstrate that SCRV induces a type I IFN response in an MDA5-dependent manner. While SCRV is a single-stranded negative-sense RNA virus that contains 5'ppp-RNA, it cannot be excluded that MDA5 is activated here in response to a double-stranded RNA intermediate of viral origin or even a host-derived RNA whose expression or modification is altered during infection. To demonstrate in an unambiguous manner that MDA5 senses 5'ppp-RNA, it is crucial to use the in vitro synthesized 5'ppp-RNA (and its dephosphorylated derivative as a control) from Fig. 4 in these experiments.

      We transfected in vitro-synthesized 5'ppp-SCRV and 5'ppp-VSV (and their dephosphorylated derivatives) into MKC cells and DF-1 cells, respectively. The 5’ppp-RNAs significantly promoted the formation of IRF3 dimers, whereas their dephosphorylated derivatives did not (Figure 4C and 4J). In addition, we extracted viral RNA from SCRV and VSV and dephosphorylated it with calf intestinal phosphatase (CIAP). When these RNAs were transfected into MKC and DF-1 cells, the immune response induced by the viral RNA was much stronger than that induced by its dephosphorylated form (Figure 1C-1E). These results indicate that the immune response activated by SCRV and VSV indeed depends on their triphosphate structure. Finally, IRF3 dimerization and IFN induction activated by SCRV RNA were inhibited by si-MDA5 (Figure 2P and 2Q), further demonstrating the involvement of MDA5 in the immune response mediated by 5’ppp-RNA ligands.

      (5) In mice and humans, MDA5 is known to collaborate with LGP2 to jointly induce an IFN response. Does M. miiuy express LGP2? If so, it would be informative to include a siRNA targeting LGP2 in the experiments in panel F. In mammals, LGP2 potentiates the response via MDA5 while it may inhibit RIG-I activation.

      M. miiuy does express LGP2. We constructed an LGP2 plasmid and synthesized si-LGP2 to investigate the impact of LGP2 on MDA5-mediated immune processes (Figure 1G-1I). The results showed that, as in mammals, LGP2 enhances the MDA5-mediated IFN response during SCRV infection.

      (6) Minor comment - Is the poly(I:C) used in this figure high or low molecular weight poly(I:C)? HMW poly(I:C) preferentially stimulates MDA5, while LMW poly(I:C) preferentially stimulates RIG-I.

      We used poly(I:C)-HMW as a positive control for MDA5 activation and have updated Figure 2 and its legend accordingly.

      Figure 3

      (1) Figure 3F/G - The normalization in this Figure is difficult to interpret. It would be better to split Figure 3G into 4 separate graphs and include the mock-infected cells alongside the infected samples (as done in Figure 2).

      To better demonstrate the function of the MDA5 RD domain in M. miiuy, we revised the experimental design (Figure 3F). We measured the induction of antiviral factors by overexpression of MDA5 and MDA5-△RD under poly(I:C)-HMW stimulation. This indicates that the RD domain of M. miiuy MDA5 retains its conserved function in recognizing poly(I:C)-HMW, and it serves as a positive control for recognition of SCRV by the RD domain.

      Figure 4

      (1) Figure 4B - A number of important controls are missing. Was the immunoprecipitation of RNA successful? This could be shown by running a fraction of the immunoprecipitated material on an RNA gel and/or by showing that the input RNA was depleted after IP. In addition, a control IP (Streptavidin beads without biotinylated RNA) is missing to ensure that MDA5 does not stick non-specifically to the Streptavidin resin.

      We appreciate the referee's suggestions. We reran this experiment with an added control using non-biotinylated RNA, and MDA5 did not bind nonspecifically to the beads (Figure 4B). In addition, as the referee suggested, we measured RNA before and after immunoprecipitation; biotin-labeled RNA, but not non-biotin-labeled RNA, was captured by the beads, indicating that the RNA pull-down was successful. However, we did not consider these data essential to the final figure and therefore have not shown them.

      (2) Figure 4B - It is unclear why there is such a large molecular weight difference between endogenous MDA5 and MDA5-Flag (110 kDa versus 130/140 kDa). Why is there less MDA5-Flag retrieved than endogenous MDA5?

      After careful analysis, we believe the large difference in molecular weight between endogenous MDA5 and MDA5-Flag may have three causes. First, MDA5-Flag carries a 3×Flag tag. Second, as shown in the primer table, we cloned MDA5 between the NotI and XbaI sites of pcDNA3.1, which lie toward the downstream end of the multiple cloning site. The Flag tag is therefore separated from the MDA5 start codon by vector sequence, which is also translated and adds to the molecular weight of the exogenous MDA5 protein. Finally, to facilitate amplification, the F-terminal MDA5 primer includes a small portion of the 3'UTR sequence (excluding the stop codon). Together, these factors may explain the difference in molecular weight. In addition, we performed a new RNA pull-down experiment that includes the important missing controls (Figure 4B).

      (3) Minor point: Figure 4B - please clarify in the figure whether RNA or protein is immunoprecipitated and via which tags.

      We performed a new RNA pull-down experiment (Figure 4B) and have clearly labeled this information in the figure.

      (4) Figure 4E - the fraction of MDA5 that binds 5'ppp-RNA seems incredibly minor. And why is this experiment done using 5'OH-RNA as a competitor, rather than simply incubating MDA5 and 5'OH-RNA together and demonstrating that these do not form a complex?

      The proportion of MDA5 bound to 5’ppp-RNA is influenced by many conditions, including the concentration and purity of the probe and the purified protein. The ratio of RNA probe to MDA5 protein in the EMSA can also strongly affect the results. It is therefore not possible to determine the actual binding affinity between MDA5 and RNA accurately from this assay. In the EMSA design, both the cold probe (5’ppp-RNA) and the mutated cold probes (5’OH-RNA and 5’pppGG-RNA) are crucial for demonstrating specific binding between MDA5 and 5’ppp-RNA, because they exclude false positives arising from factors such as biotin present in the purified MDA5 protein itself.

      (5) Figure 4B/4C/4F - These experiments would be strengthened by including an MDA5 mutant that cannot bind to RNA. These mutants are well-described in mammals. If these residues are conserved, it is straightforward to generate this mutant.

      As shown in Figure 3, M. miiuy MDA5 has an RD domain that can recognize SCRV. We constructed 6×His-tagged MDA5-△RD mutant plasmids and purified the proteins for EMSA experiments (Figure 4E). The results further indicate that MDA5, but not MDA5-△RD, binds to 5’ppp-SCRV (Figure 4G), confirming the crucial role of the RD domain in recognizing 5'ppp-RNA viruses.

      (6) Minor point: Figure 4E: please clarify in which lanes MDA5 has been added.

      Thank you for the suggestion. We synthesized new 5'ppp-RNA probes (5’ppp-SCRV and its dephosphorylated derivative), reran this experiment, and have added this information to the figure (Figure 4F).

      Figure 5

      (1) Figure 5C - As MDA5 is an interferon-stimulated gene (as shown in panel G/H/I)) the increased MDA5 expression could simply explain the increase in the amount of m6A-MDA5 that is immunoprecipitated after infection. Could this figure be improved by doing a fold change between input vs m6A-IP OR uninfected vs SCRV-infected conditions? This would reveal whether the modification of MDA5 with m6A is really increased after infection.

      As shown in Figure 5F, our data indicate that the proportion of m6A-modified MDA5 does increase after SCRV infection, rather than merely reflecting the increased expression of MDA5 itself.

      (2) Figure. 5E/F - The y-axis is unclear: relative MDA5 m6A levels. Relative to what? Input? Mock infected?

      For the experiments in Figure 5E/F, we first normalized the m6A-IP signal to the input, and then set the control group (IgG group in 5E and Mock group in 5F) to 1. We have relabeled the y-axis more clearly (Figure 5E and 5F).

      (3) General comment - It is not mentioned in the text that MDA5 is an interferon-stimulated gene. This would account for the increase in expression (qPCR) after viral infection or poly(I:C) transfection, hence there is no novelty in this finding. In addition, the authors suggest that MDA5 increases at the protein level (by immunoblot) but the increase on these blots is not convincing (figure 5H/5I).

      We agree that the increase in MDA5 expression after viral infection is expected, as MDA5 is an interferon-stimulated gene. We present these data to further validate the m6A sequencing transcriptome data and to demonstrate that, although m6A modification interferes with MDA5 expression during viral infection, it cannot prevent the increase in MDA5 mRNA levels. In addition, we reran the experiment, and the results showed that MDA5 protein expression can indeed be specifically induced by SCRV and poly(I:C)-HMW.

      Figure 6

      (1) Figure 6E - What was the MOI of the virus used in this experiment? It is not mentioned in the figure legend.

      The MOI was 5; we have added this information to the figure legend.

      Figure 7

      (1) Figure 7J - This graphic is somewhat misleading and should be altered to better reflect the conclusions that are drawn in the manuscript. The graphic suggests that MAVS and STING interact, but this is not demonstrated in the paper. In addition, the paper does not demonstrate whether MAVS or STING (or both) are needed downstream of MDA5 to relay signalling. Finally, please draw an arrow from type I IFNs to increased expression of MDA5 to illustrate that MDA5 is an ISG.

      Thank you for the referee's suggestion. We have revised the schematic to more accurately match the conclusions of the manuscript (Figure 7J). First, we have separated the STING protein from the MAVS protein. Second, arrows now indicate that MDA5 is an IFN-stimulated gene. Finally, we have added experiments demonstrating the importance of MITA (also known as STING) in MDA5-activated IFN signaling. In addition, MAVS binding to MDA5 and promoting its signal transduction is highly conserved and has been well documented even in RIG-I-deficient fish (10.1016/j.dci.2021.104235). Therefore, in Figure 7J, we still depict MAVS binding to MDA5 and acting as a downstream signal transducer of MDA5.

      Discussion

      (1) There is very little discussion about METTL and YTHDF proteins in the discussion despite the fact that the last 2 figures are entirely devoted to these proteins.

      Based on the referee's suggestion, we have added relevant content about METTL and YTHDF proteins in the discussion. In addition, the basic mechanism and function of METTL and YTHDF proteins were briefly described in the introduction.

      Reviewer #3 (Recommendations For The Authors):

      Please refer to the specific suggestions and recommendations. They include proposals for experimental additions, improved methodologies, and suggestions to resolve writing-related concerns.

      Major concerns

      (1) I suggest changing the article title to "Functional Replacement of RIG-I with MDA5 in Fish Miiuy Croaker", or a similar title, to make it more focused and closely aligned with the content of the article.

      Following the reviewer's recommendations, we have revised the title to emphasize that our primary research subject is a teleost fish that lacks RIG-I. In addition, we have changed “5’ppp-RNA” to “5’ppp-RNA virus” to emphasize the interaction between the virus and the receptor. We believe that the revised title better reflects the content of the article.

      (2) Due to the inherent limitations in genome sequencing, assembly, and annotation for the Miiuy croaker, comprehensive annotation of immune-related genes remains incomplete. To address this critical gap, it is recommended that authors establish experimental protocols, such as Fluorescence In Situ Hybridization (FISH), to confirm the absence of RIG-I in the Miiuy croaker. They should simultaneously employ MDA5 probes as a positive control for validation purposes.

      The miiuy croaker has good chromosome-level genomic resources (doi: 10.1016/j.aaf.2021.06.001). In addition, studies have shown that RIG-I is absent in the order Perciformes (doi: 10.1016/j.fsirep.2021.100012); because the miiuy croaker belongs to Perciformes, it has indeed lost the RIG-I gene. Therefore, we do not intend to use FISH to demonstrate this.

      (3) Similarly, it is recommended that the authors first provide evidence of the presence of 5'ppp at the 5' terminus of the genome RNA of SCRV, as demonstrated in the study by Goubau et al. (doi: 10.1038/nature13590, Supplementary figure 1). This evidence is crucial before drawing conclusions about the compensatory role of MDA5 in recognizing 5'ppp RNA viruses, using SCRV as the viral model.

      As suggested by the referee, we extracted SCRV RNA from SCRV virus particles and assessed the 5’-phosphate dependence of its stimulatory activity. Calf intestinal phosphatase (CIAP) treatment substantially reduced the stimulatory activity of SCRV RNA in M. miiuy MKC cells (Figure 1C and 1E). Similar results were obtained by transfecting VSV RNA isolated from VSV particles into G. gallus DF-1 cells (Figure 1D). These results confirm that both SCRV and VSV RNAs carry 5’-triphosphate features and indicate that birds and fish lacking RIG-I have other receptors that can recognize 5’ppp-RNA.

      (4) The 62-nucleotide (nt) 5'ppp-RNA utilized in this study was obtained from Vesicular Stomatitis Virus (VSV). In order to provide direct evidence, it is necessary to include a 62-nt 5'ppp-RNA that is directly derived from SCRV itself.

      We adopted this suggestion and synthesized a 67-nucleotide 5’ppp-SCRV RNA probe. We found that 5’ppp-SCRV activates dimerization of IRF3 and binds to MDA5 of M. miiuy in a 5’-triphosphate-dependent manner (Figure 4A-4F).

      (5) Given that RNAs with uncapped diphosphate (PP) groups at the 5′ end also activate RIG-I, similar to RNAs with 5′-PPP moieties, and the 5′-terminal nucleotide must remain unmethylated at its 2′-O position to allow RNA recognition by RIG-I, it is necessary for the authors to conduct additional experiments to supplement and validate these two distinguishing features of RIG-I in RNA recognition. This will provide more reliable evidence for the replacement of RIG-I by MDA5 in RNA recognition.

      Thank you for the reviewer's professional suggestions. We agree that testing MDA5 binding to 5’pp-RNA and 2′-O-methylated RNA could further demonstrate the alternative function of MDA5. However, we think that the use of 5’ppp-RNA and its dephosphorylated derivatives fully demonstrates that the MDA5 of M. miiuy and G. gallus has evolved to recognize the 5’-triphosphate structure, as human RIG-I does. Therefore, we do not intend to conduct additional experiments.

      (6) In section 2.3, the authors assert that Miiuy croaker recognizes SCRV through its RD domain. This claim is supported by their data showing that cells overexpressed with the MDA5 ΔRD mutant lost the ability to inhibit SCRV replication. As a result, the authors draw the conclusion that "these findings provide evidence that MDA5 may recognize 5'-triphosphate-dependent RNA (5'ppp-RNA) through its RD domain." However, to strengthen their argument, the authors should first demonstrate that during SCRV infection, MDA5-mediated antiviral immune response is indeed initiated by recognizing the 5'ppp part of the SCRV RNA, rather than the double-strand part (which can exist in ssRNA virus) of the viral RNA, as this is naturally a ligand for MDA5. Additionally, the authors should treat the isolated SCRV RNA with CIP to remove the phosphate group and examine the binding of MDA5 with SCRV RNA before and after treatment. They should also transfect CIP-treated or untreated SCRV RNA into MDA5 knockdown and wild-type MKC cells to investigate the induction of antiviral signaling and levels of viral replication. Finally, the authors should verify the binding ability of the mutants with isolated SCRV RNA, with or without CIP treatment, to determine which domain of MDA5 is responsible for SCRV 5'ppp-RNA recognition.

      We understand the reviewer's concern that MDA5 may recognize SCRV by binding to double-stranded regions of its RNA. Following the reviewer's suggestion, we extracted SCRV RNA and obtained its dephosphorylated form by calf intestinal phosphatase (CIAP) treatment. We then transfected both RNAs into MDA5-knockdown and wild-type MKC cells and measured IRF3 dimerization and the IFN response. The results indicate that SCRV RNA activates immunity in a triphosphate-dependent manner and that MDA5 knockdown prevents this immune activation (Figure 1C and 1E, Figure 2P and 2Q). Finally, we synthesized a 5'ppp-SCRV RNA probe and demonstrated that MDA5 binds to 5'ppp-SCRV through the RD domain (Figure 4E-4G). We believe these results better demonstrate that MDA5 recognizes 5’ppp-RNA through its RD domain and address the reviewer's concerns.

      (7) Similarly, merely presenting Co-IP data demonstrating the interaction between Miiuy croaker MDA5 and STING in overexpressed EPC cells does not justify the claim that "in vertebrates lacking RIG-I, MDA5 can utilize STING to facilitate signal transduction in the antiviral response". This is because interactions observed through overexpression may not accurately reflect the events occurring during viral infection or their actual antiviral functions. To provide more robust evidence, it is essential to conduct functional experiments after STING knockout (or at least knockdown). Furthermore, it is important to note that Miiuy Croaker alone cannot adequately represent all "vertebrates lacking RIG-I".

      We found that co-expression of STING and MDA5 enhances the MDA5-mediated IFN-1 response during SCRV infection, whereas knocking down STING attenuates the MDA5-mediated IFN-1 response (Figure 2N and 2O). This indicates that STING plays an important signaling role in the MDA5 response to RNA viruses. In addition, loss of RIG-I is a common phenomenon in vertebrates, and STING from birds such as chickens (doi: 10.4049/jimmunol.1500638) and from the mammalian tree shrew (doi: 10.1073/pnas.1604939113) can also bind to MDA5, indicating that STING can play a crucial role in MDA5 signaling in species lacking RIG-I. We have added this to our discussion and described our observations in more cautious language.

      (8) In the manuscript, a series of experiments were conducted using an antibody (Beyotime Cat# AF7164) against endogenous MDA5. The corresponding immunogen for this MDA5 antibody is a recombinant fusion protein containing amino acids 1-205 of human IFIH1/MDA5 (NP_071451.2). However, the amino acid sequences of IFIH1/MDA5 differ substantially between humans and Miiuy croaker, which could introduce errors in the results. Therefore, it is essential to employ antibodies specifically designed for targeting Miiuy croaker's own MDA5 in the experiments.

      As shown in Figure 2B, the antibody against endogenous MDA5 also detects MDA5 overexpressed from plasmids, suggesting that this antibody specifically recognizes the M. miiuy MDA5 protein.

      (9) It is recommended to investigate the phosphorylation of IRF3 in order to confirm the downstream signaling pathway during viral infection when MDA5 is knocked down or overexpressed.

      Because phospho-specific antibodies against fish IRF3 are not available, we used IRF3 dimerization assays to detect downstream signaling (Figure 1C and 1D, Figure 2P, Figure 4C and 4J).

      (10) The use of poly I:C as a mimic for dsRNA to investigate MDA5's recognition of 5'ppp-RNA in hosts lacking RIG-I, as well as the examination of the regulatory role of MDA5 m6A methylation upon activation by 5'ppp-RNA, may be inappropriate. Poly I:C does not possess 5'ppp, and while it has been identified as a ligand for MDA5 in various studies, MDA5 cannot serve as a substitute for RIG-I in recognizing poly (I:C). Therefore, the authors should utilize 5'ppp-dsRNA as the mimic and include the corresponding 5'ppp-dsRNA control without a 5'triphosphate as the negative control (both available from InvivoGen). This approach will specifically elucidate the mechanisms involved when MDA5 functions similarly to RIG-I in the recognition of 5'ppp-RNA.

      In our study, we used poly(I:C)-HMW, a known dsRNA mimetic that can be preferentially recognized by MDA5 rather than RIG-I, as a positive control for activating MDA5. What we want to demonstrate is that, like poly(I:C)-HMW (positive control), SCRV can also promote MDA5-mediated IFN immunity, further indicating the important role of MDA5 in 5’ppp-RNA virus invasion. We have clearly labeled the type of poly(I:C) in the figures and legends to avoid misunderstandings for readers.

      (11) In Figure 2, Figure 3, and Figure 6, the appearance of virus plaques is not readily apparent, and it is necessary to replace these images with clearer photographs. It appears that MKC or MPC cells are not appropriate for conducting plaque assays. To accurately assess viral proliferation, the authors should measure key indicators throughout the process, such as the production of positive-strand RNAs (+RNAs), replication intermediates (RF), and transcription of subgenomic RNAs. This approach is preferable to solely measuring the M and G protein genes from the virus genome as positive results can still be observed in contaminated cells.

      As the reviewer pointed out, the virus plaque images in Figure 2K and Figure 3D were not clear enough, so we have replaced them with new, clearer images (Figure 2J and Figure 3D). We think the other images clearly display SCRV proliferation, so we did not replace them. In addition, the primers we currently use do measure +RNA, so SCRV replication can be evaluated accurately without being affected by virus contamination. Because the two primer pairs target the SCRV-M and SCRV-G protein genes, we label them SCRV-M and SCRV-G to distinguish them. To avoid misunderstanding, we have modified the y-axis labels in the figures (Figure 2I and 2K, Figure 3E, Figure 6E and 6O).

      (12) There is a substantial disparity in the molecular size of M. miiuy MDA5 between endogenous and exogenously expressed proteins, as shown in Figure 2A and 2C-D. Please provide clarification.

      Please refer to the response to Reviewer 2's question regarding Figure 4B above.

      (13) The manuscript incorporates the evolutionary perspective, but lacks specific evolutionary analysis. Thus, it is essential to include relevant analysis to comprehend the evolutionary dynamics and positive selection on MDA5 and LGP2 in the absence of RIG-I in Miiuy croaker. This can be achieved through theoretical calculations using appropriate algorithms, such as the branch models and branch-site models based on the maximum-likelihood method implemented in the phylogenetic analysis by maximum likelihood (PAML) package.

      In fact, we have analyzed the molecular evolution of MDA5 and LGP2. Unfortunately, even when we analyzed only fish MDA5/LGP2 CDS sequences, the topologies of the MDA5/LGP2 gene trees were largely consistent with the species tree. As a result, species with and without RIG-I do not form separate clusters in the gene trees, which makes it extremely difficult to analyze molecular evolution of MDA5/LGP2 caused by RIG-I deficiency. Consequently, we did not pursue this analysis further.

      (14) If the narrative regarding m6A methylation goes beyond the activation of MDA5 through recognition of 5'ppp-RNA and represents a regulatory mechanism for all MDA5 activation events, it is not relevant to the theme of "An arms race under RIG-I loss: 5'ppp-RNA and its alternative recognition receptor MDA5." Therefore, all investigations in this paper should focus solely on events when MDA5 recognizes 5'ppp-RNA. Any data associated with the broader regulatory mechanisms and m6A methylation of MDA5 should be excluded from this manuscript and instead be included in a separate study dedicated to exploring this specific topic.

      Our theme is intended to center on RNA viruses, rather than on the interaction between isolated 5'ppp-RNA and host virus receptors, and the previous title did not express this accurately. Therefore, we made two main changes: first, we limited the study species to M. miiuy, although some of our experiments on the functional substitution of MDA5 for RIG-I also involved birds; second, we changed “5’ppp-RNA” to “5’ppp-RNA virus”. We believe that the revised title better reflects our current research content.

      (15) The running title appears to be hastily done.

      We modified it to “MDA5 recognizes 5’ppp-RNA virus in species lacking RIG-I”.

      (16) There are many descriptions that are not strongly related to the main theme of the article in the introduction section, making it lengthy and fragmented. Please focus on the research background of RIG-I and MDA5, including their structures, functions, and regulatory mechanisms, as well as the research progress on the compensatory effect of MDA5 in the absence of RIG-I and its evolutionary adaptation mechanism in other species.

      Based on the reviewers' suggestions, we have removed some of the less relevant content from the introduction and added research progress on the compensatory role of MDA5 in the evolutionary adaptation of tree shrews, which lack RIG-I.

      (17) Lines 149-156 in the "Results" section include content that resembles an "Introduction" It is important to avoid duplicating information in the results section. Therefore, the authors are encouraged to revise this paragraph to ensure conciseness in the article.

      We have streamlined this section to enhance the article's conciseness and clarity.

      (18) In the "Results" section, at line 177, the authors assert, "As depicted in Figure 1F-1H," which should be corrected to Figure 2F-2H. Furthermore, the y-axis of the two figures on the right-hand side of Figure 2H represents the ISG15 genes. At line 182, "as demonstrated in Figure 1I-1L," should be revised as "as illustrated in Figure 2I-2L". The authors demonstrated a lack of attention to detail.

      Thank you to the reviewer for pointing out our errors, and we have made the necessary corrections.

      (19) In lines 197-198, the authors stated that "MDA5-ΔRD showed an inability to interact with SCRV." However, Figure 3D did not reveal any significant difference, thus it is advisable to repeat this experiment at least once.

      We have replaced this plaque image with a new one (Figure 3D).

      (20) In lines 200-201 of the "2.3 RD domain is required for MDA5 to recognize SCRV" section, the authors report that the expression of antiviral genes was induced by the overexpression of both MDA5 and MDA5-ΔRD, even in the absence of infection (Figure 3F). Why does the expression of antiviral genes increase in the absence of viral RNA stimulation? Please provide a reasonable explanation.

      In the absence of viral infection, overexpressed viral receptor proteins may still trigger spurious signaling that affects innate immunity. We speculate that, because both MDA5 and MDA5-ΔRD retain the CARD domain, they can still induce the expression of antiviral factors without a ligand, although this induction is much weaker than that caused by viral infection. However, to better demonstrate the function of the RD domain of M. miiuy MDA5, we have changed the experimental design, as shown in Figure 3F: we measured the induction of antiviral factors by overexpressed MDA5 and MDA5-ΔRD under poly(I:C)-HMW stimulation. This indicates that the RD domain has a conserved function in recognizing poly(I:C)-HMW in M. miiuy and can serve as a positive control for recognition of SCRV invasion by the RD domain of MDA5.

      (21) Please provide the GenBank accession number of M. miiuy MDA5.

      The GenBank accession number of M. miiuy MDA5 has been added to Section 4.5 (Plasmid construction).

      (22) The content of lines 228-233 in the "Results" section bears resemblance to that of the "Introduction." To ensure the avoidance of information duplication, it is recommended to remove this paragraph from the results section.

      This section has been streamlined.

      (23) The bands of mmiMDA5 in the 5'ppp-RNA and dsRNA lanes in Figure 4B are weak and almost unobservable. Please replace them with clear images.

      We have rerun this experiment and replaced the images (Figure 4B).

      (24) In Figure 5G and at line 253, there are only results presented for the SCRV infection group, while no results are shown for the control group. This raises the question of why the control group results are missing. It is necessary to provide a reasonable explanation or correction for this issue.

      The 0 h time point after SCRV infection serves as the control group, and we have replaced the panel with a more intuitive image (Figure 5G).

      (25) In Figure 7C, it would be necessary to include the western blot result of YTHDF protein expression in order to verify the efficiency of YTHDF siRNA.

      In fact, we have attempted to detect the endogenous expression of YTHDF protein using available commercial antibodies. Unfortunately, only the YTHDF2 antibody can specifically recognize the endogenous protein expression of YTHDF2 in M. miiuy. In addition, the knockdown effect of si-YTHDF2 has been validated by YTHDF2 antibody (doi: 10.4049/jimmunol.2200618).

      (26) In line 422 of the "4.3 Cell culture and treatment" section, the paragraph raises a question regarding the nature of Miiuy croaker kidney cells (MKCs) and spleen cells (MPCs) - whether they are cell lines or freshly isolated cells (or primary cultures) derived from kidney and spleen tissues. If these cells are indeed cell lines, it is requested to provide detailed information about the sources and properties of the cells (such as whether they are epithelial cells or other mixed cell types) and the generations of propagation. Alternatively, if the cells were freshly isolated or primary cultures obtained from fish, the method for cell isolation should be provided. The source and stability of cells are extremely important for ensuring the repeatability and reliability of experimental outcomes.

      M. miiuy kidney cells (MKCs) and spleen cells (MPCs) are cell lines derived from the kidney and spleen tissues of M. miiuy and were used at passages 20 to 40. These details have been added to Section 4.3.

      (27) There are many inaccurate descriptions in the text, which employ concepts that are too broad. These descriptions need to be narrowed down to specific species or objects. Here are a few examples, along with the necessary revisions. Other similar instances should also be revised accordingly. For instance, in line 119, "fish MDA5" should be changed to "Miiuy croaker MDA5." Similarly, in line 166, "fish MDA5-mediated signaling pathway" should be changed to "Miiuy croaker MDA5-mediated signaling pathway." In line 174, "fish MDA5" should be revised to "Miiuy croaker MDA5." Additionally, in line 185, "antiviral responses of teleost" should be changed to "antiviral responses of Miiuy croaker." In line 197, "interact with SCRV" should be revised to "interact with 5'ppp-RNA of SCRV." In line 337, "loss of RIG-I in the vertebrate" should be modified to "loss of RIG-I in Miiuy croaker and chicken." Similarly, in line 338, "MDA5 of fish" should be changed to "MDA5 of Miiuy croaker." Lastly, in line 348, "RIG-I deficient vertebrates" should be revised to "RIG-I deficient Miichthys miiuy and Gallus gallus."

      Thank you for the reviewer's suggestions. We have made revisions to these inaccurate descriptions and reviewed the entire manuscript to address similar statements with broad concepts.

      (28) Finally, it should be noted that a similar discovery has already been reported in tree shrews (Ling Xu, et al., Proc Natl Acad Sci., 2016, 113(39):10950-10955). This article shares similarities with that research report, therefore it is necessary to discuss in detail the relationship between the two in the discussion and compare and analyze the evolutionary patterns of MDA5 from it.

      Based on the reviewer's suggestions, we have compared the similarities and differences between these two reports during the discussion and analyzed the evolutionary dynamics of MDA5 in these vertebrates lacking RIG-I.

      Minor concerns:

      Thank you to the reviewer for their meticulous examination of our manuscript; we have made revisions according to the following suggestions.

      (1) At line 120, the sentence "SCRV(one 5'ppp-RNA virus)" should have a space between "SCRV" and "(one 5'ppp-RNA virus)". Please make this correction.

      Corrected.

      (2) At lines 147-148, the sentence "However, the downstream gene of TOPORSa is missing a RIG-I" is not accurate and needs modification.

      We have modified this sentence.

      (3) At line 184, "findings indicate" should be corrected to "findings indicated".

      Corrected.

      (4) At line 189, "a 5'ppp-RNA virus" should be deleted and the text seems redundant.

      Deleted.

      (5) At line 198, "replication. (Figure 3C-3E)", please remove the punctuation between "replication" and "(Figure 3C-3E)".

      Corrected.

      (6) At line 416 in "Materials and methods" section, "4.2 Sample and challenge" should be corrected to "4.2 Fish and challenge".

      Corrected.

      (7) At line 419, the authors state that "The experimental procedure for SCRV infection was performed as described", please briefly describe the SCRV infection method and the infectious dose.

      Based on the reviewer's suggestions, we have added relevant descriptions of SCRV infection in section 4.2.

      (8) There are several formatting issues in the "Materials and Methods" section. For instance, in line 424, there is no space between the number and letter in "100 μg/ml" and "26 ℃" should be corrected to "26℃". Additionally, in line 430, "Cells" should be corrected to "cells".

      Corrected.

      (9) At line 446, "50 ng/ul" and "100 mU/ul" should be corrected to "50 ng/μl" and "100 mU/μl".

      Corrected.

      (10) At line 459, "primers 1)" should be corrected to "primers".

      Corrected.

      (11) At lines 461-464, the description "For protein purification, MDA5 plasmids with 6× His tag was constructed based on pcDNA3" seems to be no direct logical connection between protein purification and the plasmid construction. Please make the necessary corrections.

      Corrected.

      (12) At line 548, "cytoplasmic" should be corrected to "Cytoplasmic".

      Corrected.

      (13) At line 549, "5× 10⁷" should be corrected to "5 × 10⁷".

      Corrected.

      (14) At line 557, "MgCl2" should be corrected to "MgCl₂".

      Corrected.

      (15) At line 558, "6 %" should be corrected to "6%".

      Corrected.

      (16) At line 565, "50μg" should be corrected to "50 μg".

      Corrected.

      (17) At line 571, "300±50 bp." should be corrected to "300 ± 50 bp."

      Corrected.

      (18) At lines 592-593, the sentence "After several incubations, the m6A level was quantified colorimetrically at a wavelength of 450 nm" does not read smoothly, please improve it.

      Revised.

      (19) At line 786, "MDA5 recognize" should be corrected to "MDA5 recognized".

      Corrected.

      (20) At lines 788 and 798, "Pulldown" should be corrected to "Pull-down".

      Corrected.

      (21) At lines 790 and 796, "bluestaining" should be corrected to "blue staining".

      Deleted.

      (22) At line 825, "SCRV and infection" should be corrected to "SCRV infection".

      Corrected.

      (23) At lines 826-827, "SCRV (H) and poly(I:C) (I) infection" should be corrected to "SCRV infection (H) and poly(I:C) stimulation (I)".

      Corrected.

    1. Author response:

      We thank the reviewers for their help and their suggestions to make this manuscript more rigorous. We would like to post provisional author responses when eLife publishes the reviewed preprint; more detailed responses will accompany the revised manuscript.

      • There are questions about choices made in the computational approach (architecture and type of generative model, training set).

      We will train a new generator model based on the current GAN architecture, but with ‘hybrid’ AMP/AVP training sets (Reviewers 1 and 3), so that we can directly compare the performance of the two generators. Based on our preliminary data, providing the GAN with more AVP sequences during training helped the designed peptides pass the AVP filter, at the cost of reducing the average AMPredictor scores. The new generator also increased the diversity of the designed sequences.

      We also varied the detailed architecture of our deep learning models, including fully-connected graph edge encodings and different versions of ESM (e.g. esm1b_t33_650M_UR50S, esm2_t48_15B_UR50D; Reviewer 2). In the revised manuscript, we will report the effects of these modifications; as shown in Author response tables 1 and 2, the overall GCN and GAN design appears suitable for a lightweight sequence-to-label model. For the generator, our results suggest that with our approach we may have reached a plateau in GAN sampling (Author response table 3).

      Author response table 1.

      Results of AMPredictor with different graph edge encodings

      Author response table 2.

      Results of AMPredictor with different ESM versions

      Author response table 3.

      Evaluation of generated sequences with different sampling numbers

      • There is an important concern about the small number of antimicrobial peptides tested, compared to other studies, and the origin of antiviral activities.

      We will address this concern by increasing the number of peptides tested in antimicrobial and antiviral experiments. As reported in the current version of our manuscript, the first generation of the GAN produced 128 unique designs, and the top 2% (3 designs) were tested experimentally. The second generation of the GAN will produce ~1024 designs (1-2 weeks), and the top 2% (~20 new sequences) will be tested. We are in the process of peptide synthesis (2-3 weeks) and MIC measurement (1 week). The total number of tested sequences will reach 20-30. We will focus on sequences with low similarity (< 30%) to any known AMP, thus expanding the universe of functional peptides. We estimate that these new data will be collected within 6 weeks.
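      The selection arithmetic above (the top 2% of 128 designs gives about 3, and of ~1024 designs gives about 20) can be sketched in Python. This is a minimal sketch that assumes each design is ranked by a single predictor score (for example, an AMPredictor score) where higher is better, and that the cutoff is rounded to the nearest whole design; neither rule is stated in the response. The function name and the placeholder scores are hypothetical and do not represent the authors' pipeline.

```python
# Minimal sketch (assumption): keep the top fraction of generated designs,
# ranked by one score per design (higher is better), with the cutoff rounded
# to the nearest whole design. Names and scores below are hypothetical.

def top_fraction(scores, fraction=0.02):
    """Return IDs of the round(fraction * n) highest-scoring designs (at least one)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Placeholder scores: design i gets score i, so higher IDs rank first.
first_gen = {f"design_{i}": float(i) for i in range(128)}
second_gen = {f"design_{i}": float(i) for i in range(1024)}
print(len(top_fraction(first_gen)))   # round(2.56) -> 3
print(len(top_fraction(second_gen)))  # round(20.48) -> 20
```

      Rounding to the nearest integer reproduces both counts quoted above; always rounding up would instead give 21 designs for a pool of 1024.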

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work shows, based on basic laboratory investigations of in vitro-grown bacteria as well as human bone samples, that conventional bacterial culture can substantially underrepresent the quantity of bacteria in infected tissues. This has often been mentioned in the literature; however, relatively limited data have been provided to date. This manuscript compares culture to a digital droplet PCR approach, which consistently showed greater levels of bacteria across the experiments (and for two different strains).

      Strengths:

      Consistency of findings across in vitro experiments and clinical biopsies. There are real-world clinical implications for the findings of this study.

      Weaknesses:

      No major weaknesses. Only three human samples were analyzed, although the results are compelling.

      We included only three clinical examples to showcase the application of this method, particularly to osteomyelitis. For further validation, larger cohort studies are required, which are currently underway.

      Reviewer #2 (Public Review):

      In this study, the authors address discrepancies in determining the local bacterial burden in osteomyelitis between that determined by culture and enumeration by DNA-directed assay. Discrepancies between culture and other means of bacterial enumeration are long established and highlighted by Staley and Konopka's classic, "The great plate count anomaly" (1985). Here, the authors first present data demonstrating the emergence of discrepancies between CFU counts and genome copy numbers detected by PCR in S. aureus strains infecting osteocyte-like cells. They go on to demonstrate PCR evidence that S. aureus can be detected in bone samples from sites meeting a widely accepted clinicopathological definition of osteomyelitis. They conclude their approach offers advantages in quantifying intracellular bacterial load in their in vitro "co-culture" system.

      The publication “The great plate count anomaly” (1985) has been added to the revised version as new reference #2.

      Weaknesses

      - My main concern here is the significance of these results outside the model osteocyte system used by this group. Although they carefully avoid over-interpreting their results, there is a strong undercurrent suggesting their approach could enhance aetiologic diagnosis in osteomyelitis and that enumeration of the infecting pathogen might have clinical value. In the first place, molecular diagnostics such as 16S rDNA-directed PCR are well established in identifying pathogens that don't grow. Secondly, it is hard to see how enumeration could have value beyond in vitro and animal model studies since serial samples will rarely be available from clinical cases.

      Indeed, we initiated this study to improve diagnostic outcomes for osteomyelitis, in particular that associated with prosthetic joint infection (PJI) but also all other forms, because the current gold-standard diagnostic approaches for this type of infection, namely bacterial culture or whole genome sequencing, are time-consuming and costly, and yet are not necessarily accurate. Among other benefits, our method achieves absolute quantification of bacterial load in clinical bone specimens from infected patients within a short time (on the order of hours). Many of the bacterial species we identified in patients could not be identified by standard bacterial culture. Moreover, a problematic feature of treating bone infection is that repeated surgeries are usually needed, particularly in PJI; hence, serial clinical bone specimens from the same patient are in fact often available. Our ability to quantify bacterial load therefore offers the advantage of monitoring infection status throughout the course of treatment. In this study, we chose the tuf gene rather than the well-established 16S target for PCR amplification because tuf provides much better sequence discrimination between bacterial species. The short 271 bp amplicon used in our study therefore gives a highly accurate taxonomic readout, which further shortens the time required for diagnosis. To address the Reviewer’s concerns, we have added text to the last paragraph of the Discussion in the revised manuscript, together with a figure demonstrating the strong sequence diversity in tuf (Supplementary Figure 2) and an additional reference.
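
      Absolute quantification by droplet digital PCR generally rests on Poisson statistics over the partitioned droplets: from the fraction of positive droplets one estimates the mean number of template copies per droplet, then divides by droplet volume. The sketch below illustrates that general principle only and is not the authors' pipeline; the function name is hypothetical, and the default droplet volume of 0.85 nL is an assumed, instrument-specific value.

```python
import math


def ddpcr_copies_per_ul(positive: int, total: int, droplet_volume_nl: float = 0.85) -> float:
    """Estimate target copies per microlitre of reaction from droplet counts.

    Hypothetical helper; droplet_volume_nl is an assumed, instrument-specific value.
    """
    if total <= 0 or not 0 <= positive <= total:
        raise ValueError("require total > 0 and 0 <= positive <= total")
    if positive == total:
        # Every droplet is positive, so the Poisson estimate diverges (saturation).
        raise ValueError("all droplets positive; dilute the sample and re-run")
    p = positive / total
    # Poisson correction: mean copies per droplet, which accounts for droplets
    # that received more than one template molecule.
    copies_per_droplet = -math.log1p(-p)
    # Convert droplet volume from nanolitres to microlitres.
    return copies_per_droplet / (droplet_volume_nl * 1e-3)


# Example: 4,000 positive droplets out of 15,000 accepted droplets.
print(round(ddpcr_copies_per_ul(4000, 15000), 1))
```

      Using log1p keeps precision when only a small fraction of droplets is positive, as might be expected for specimens with a low bacterial load.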

      - I have further concerns regarding the interpretation of the combined bacterial and host cell-directed PCRs against the CFU results. Significance is attached to the relatively sustained genome counts against CFU declines. On the one hand, it must be clearly recognised that the detection of bacterial genomes does not equate to viable bacterial cells with the potential for further replication or production of pathogenic factors. Of equal importance is the potential contribution of extracellular DNA from lysed bacteria and host cells to these results. The authors must clarify what steps, if any, they have taken to eliminate such contributions for both bacteria and host cells. Even the treatment with lysostaphin may have coated their osteocyte cultures with bacterial DNA, contributing downstream to the ddPCR results presented.

      We agree that concerns around the interpretation of any molecular readout need to be taken into account. We have yet to find a method that can definitively establish bacterial viability in a clinical setting in the absence of culture. However, PJI and osteomyelitis in general are characterised by a high percentage of culture-negative cases, which calls for such molecular approaches. Commercially available so-called “live/dead” bacterial PCR reagents act as PCR signal inhibitors: they penetrate the cell walls of compromised cells and prevent those cells from generating a PCR signal. In our experience, while these reagents can add a certain level of scrutiny in an experimental setting, they are not definitive, because the reaction is often incomplete even under idealised conditions, and the reagent may also suppress signal from viable bacteria growing under stress, such as during antimicrobial treatment or under host-derived stress in intracellular or intra-tissue environments. Indeed, such stresses are likely contributors to clinical non-culturability. Whole genome sequencing, by demonstrating genomic intactness, would provide more certainty of bacterial viability, but as we discuss herein, it is a lengthy and costly process, and one that may prove difficult for host tissue with a low pathogen load. It should be noted that any diagnostic readout, whether from culture, WGS or the method reported here, would need to be interpreted by the treating clinical team. We would argue that a rapid, practical molecular diagnostic method, used in the absence or even in the presence of culture, would give treating clinicians an improved rationale for tailoring antimicrobial treatment.

      Strengths

      - On the positive side, the authors provide clear evidence for the value of the direct buffer extraction system they used as well as confirming the utility of ddPCR for quantification. In addition, the successful application of MinION technology to sequence the EF-Tu amplicons from clinical samples is of interest.

      - Moreover, the phenomenology of the infection studies indicating greater DNA than CFU persistence and differences between the strains and the different MOI inoculations are interesting and well-described, although I have concerns regarding interpretation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Vuong and colleagues reports a study that pooled data from 3 separate longitudinal studies that collectively spanned an observation period of over 15 years. The authors examined the correlation of viraemia, measured on various days from illness onset, with thrombocytopaenia and severe dengue, according to the WHO 2009 classification scheme. The motivation for this study is both to support the use of viraemia measurement as a prognostic indicator of dengue and, once an antiviral drug is licensed for use, to guide the selection of patients for antiviral therapy. They found that the four DENVs show differences in peak and duration of viraemia and that viraemia levels before, but not after, day 5 from illness onset correlated with platelet count and plasma leakage from day 7 onwards. They concluded that the viraemia kinetics call for measuring viraemia levels in the early febrile phase of illness.

      Strengths:

      This is a unique study due to the large sample size and longitudinal viraemia measurements in the study subjects. The data address a gap in the literature: although it has been widely suggested that viraemia levels are useful when collected early in the course of illness, this is the first time anyone has systematically examined this notion.

      Weaknesses:

      The study only analysed data from dengue patients in Vietnam. Moreover, the majority of these patients had DENV-1 infection; few had DENV-4 infection. The data could thus be skewed by the imbalance in the prevalence of the different types of DENV during the period of observation. The use of patient-reported time of symptom onset as a reference point for viraemia measurement is pragmatic, although it introduces subjectivity and thus noise into the data.

      We acknowledge and appreciate your comments regarding the limitations of our study, including the pooled data from Vietnam and the use of symptom onset as a reference point for viremia kinetics. These points have been incorporated into the “Limitations” section.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript highlights very important findings in the field, especially in designing clinical trials for the evaluation of antivirals.

      Strengths:

      The study shows significant differences in viral load kinetics between serotypes, which is very interesting and should be taken into account when designing trials for antivirals.

      Weaknesses:

      The kinetics of the viral loads based on disease severity throughout the illness are not described, and it would be important if this could be analyzed.

      In response to your suggestion, we have expanded our analysis to investigate the relationship between the rate of viremia decline and clinical outcomes. Our findings demonstrate that a faster rate of viremia decline is associated with a reduced risk of severe clinical outcomes. We have incorporated this new analysis into the revised manuscript, providing further details in the “Statistical Analysis” section (page 7) and presenting the results on page 15 and in Figure 6.

      Reviewer #1 (Recommendations For The Authors):

      Several areas require additional attention. I have limited my comments on the findings as I am not a mathematician and cannot knowledgeably comment on the statistical modelling methods.

      Comment #1: Lines 83-84. Although viraemia level shows declining trends from illness onset and thus lessens its prognostic value, it remains unknown if a more rapid rate of decline in viraemia is associated with a reduced risk of severe dengue. This is the fundamental premise of antiviral drug development for the treatment of dengue. The authors are uniquely poised to show if this logic that underpins antiviral development is likely correct and perhaps even estimate the extent to which a decline in viraemia needs to occur for a measurable reduction in the risk of severe dengue. Could the authors consider such an analysis?

      We appreciate your valuable suggestion. In response, we have expanded our analysis to investigate the relationship between the rate of viremia decline and clinical outcomes. Using a model of viremia kinetics that assumes a linear decrease in log10 viremia over time, we calculated the rate of decline for each patient. Our findings demonstrate that a faster rate of viremia decline is associated with a significantly reduced risk of severe clinical outcomes. We have incorporated this new analysis into the revised manuscript, providing further details in the “Statistical Analysis” section (page 7) and presenting the results on page 15 and in Figure 6.
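
      Under the stated assumption of a linear decrease in log10 viremia, a patient's rate of decline is the negative slope of log10 viremia against illness day. The per-patient ordinary least-squares sketch below is a simplification, not the authors' model: the function name is hypothetical, it assumes every measurement is above the assay's detection limit, and it does not pool information across patients as a population kinetics model may do.

```python
import math
from typing import Sequence


def log10_decline_rate(days: Sequence[float], viremia: Sequence[float]) -> float:
    """Return the decline in log10 copies/mL per day (positive = falling viremia).

    Hypothetical helper: ordinary least squares on one patient's measurements,
    assuming every value is above the detection limit.
    """
    if len(days) != len(viremia) or len(days) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log10(v) for v in viremia]
    n = len(days)
    mean_day = sum(days) / n
    mean_log = sum(logs) / n
    sxx = sum((d - mean_day) ** 2 for d in days)
    if sxx == 0:
        raise ValueError("measurements must span at least two distinct days")
    sxy = sum((d - mean_day) * (y - mean_log) for d, y in zip(days, logs))
    # The OLS slope is sxy / sxx; negate it so a falling curve gives a positive rate.
    return -sxy / sxx


# Example: viremia falling tenfold per day from day 2 to day 4.
print(log10_decline_rate([2, 3, 4], [1e8, 1e7, 1e6]))
```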

      Comment #2: Lines 101-102. Studies A and B were conducted in parallel, and several patients enrolled in study A from primary healthcare clinics were eventually also enrolled in study B upon hospitalization. It would be helpful to know how many patients from study A were included in study B. It would also be useful for the authors to indicate if such inclusion would constitute double-counting at any point in their analyses.

      To address potential confusion regarding patient overlap between studies A and B, we have provided further clarification in the revised manuscript’s Legend of Figure 1. Among confirmed dengue patients, 31 individuals enrolled in study A were later included in study B upon hospitalization. Of these, 9 had viremia measurements available in both studies and were consequently analysed in study A only. The remaining 22 lacked viremia data in study A but had measurements in study B, leading to their inclusion in study B in the analysis. We have taken meticulous care to ensure no patient data is double-counted.

      Comment #3: Lines 126-127. The definition of probable primary and secondary dengue from IgG measurements needs more detail. How was the anti-DENV IgG ELISA data from paired sera interpreted?

      To ensure clarity, we have moved the definitions of probable primary and secondary infections from the supplementary file (Appendix 2) to the main text of the revised manuscript (Methods section – Plasma viremia measurement, dengue diagnostics, and clinical endpoints – page 6): “A probable primary infection was defined by two negative/equivocal IgG results on separate samples taken at least two days apart within the first ten days of symptom onset, with at least one sample during the convalescent phase (days 6-10). A probable secondary infection was defined by at least one positive IgG result during the first ten days. Cases without time-appropriate IgG results were classified as indeterminate.”
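
      The quoted definition is essentially a decision rule, so it can be encoded directly. The sketch below is one possible reading, not the authors' code: the function and type names are hypothetical, day 1 is assumed to be the day of symptom onset, and a positive IgG result is assumed to take precedence over negative or equivocal results.

```python
from typing import Iterable, Tuple

# Each sample is (illness_day, result); result is "positive", "negative" or "equivocal".
Sample = Tuple[int, str]


def classify_immune_status(samples: Iterable[Sample]) -> str:
    """Classify probable primary / secondary dengue from anti-DENV IgG results.

    Hypothetical encoding of the quoted definition; day 1 is assumed to be the
    day of symptom onset, and a positive result takes precedence.
    """
    in_window = [(day, result) for day, result in samples if 1 <= day <= 10]
    # Secondary: at least one positive IgG result within the first ten days.
    if any(result == "positive" for _, result in in_window):
        return "secondary"
    non_positive_days = sorted(
        day for day, result in in_window if result in ("negative", "equivocal")
    )
    # Primary: two non-positive results at least two days apart, at least one
    # of them in the convalescent phase (days are already <= 10 here).
    for i, first in enumerate(non_positive_days):
        for second in non_positive_days[i + 1:]:
            if second - first >= 2 and (first >= 6 or second >= 6):
                return "primary"
    return "indeterminate"


print(classify_immune_status([(3, "negative"), (7, "equivocal")]))
print(classify_immune_status([(6, "negative"), (7, "negative")]))
```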

      Comment #4: Lines 230-232 and Figure 4. The findings reported in Figure 4 are curious. Why is the platelet count highest (significantly?) for DENV-1 compared to other DENV-type infections at low viraemia levels on LM days 1-3? Does that also mean that DENV-3 and -4 infections have a greater impact on platelet counts at days 7-10 than DENV-1 and -2?

      In our analyses, we allowed the relation between viremia and platelet count to differ by serotype. Figure 4 shows the highest platelet counts for DENV-1 compared to other serotypes, especially at low viremia levels. This suggests that, while DENV-1 on average has higher viremia (Figure 3), the same viremia level in DENV-1 as in other serotypes is associated with a less severe disease course and a higher platelet count. This does not necessarily imply that platelet count overall, uncorrected for viremia level, differs by serotype. Indeed, our unpublished analysis (shown below) indicates a modest influence of serotype on platelet count.

      Author response image 1.

      Comment #5: Figure 5. In a recent paper (Vuong et al, Clin Infect Dis 2021), the authors show elegantly that the viraemia levels on admission correlated with severe dengue. However, these correlations were different for each of the four DENV types and whether the infection was primary or secondary. Why wasn't the analysis in Figure 5 further stratified by their probable primary or secondary dengue status?

      We appreciate your feedback and have stratified Figure 5 by serotype and immune status as suggested. Please note that due to the limited number of severe dengue cases in primary infections (only 1 case, in DENV-1) and of plasma leakage cases in primary DENV-4 (see Appendix 4-table 1), the estimated probability of having these outcomes is nearly zero across all viremia levels within these subgroups.

      Comment #6: Line 279. The description in this line is at odds with the data in Figure 3A, which shows that DENV-2 could be detected over a longer period than DENV-1, as the one-step RT-qPCR assay has a lower detection limit for DENV-2 than for DENV-1.

      In response to your feedback, we have revised the description to clarify that DENV-1 exhibits higher viremia levels compared to DENV-2 and DENV-3 in the revised manuscript (page 18).

      Reviewer #2 (Recommendations For The Authors):

      Introduction

      Comment #1: Line 56: the authors state that viraemia is associated with dengue disease severity and cite their previous results. They then summarize the results of this study and others. The highlights of this paper should be described in more detail. It is important that the authors state the conclusions of their own paper, including that the association was not very strong and that the viral loads were lowest with DENV2, but DENV2 was associated with more severe disease.

      Thank you for your comment. To improve the introduction’s flow, we have removed that sentence in line 56 of the manuscript and have added the weak association in the next paragraph (pages 3-4).

      Comment #2: It would be important to cite smaller studies that show a delay in clearance of the virus being associated with more severe disease outcomes.

      Thanks for your suggestion. We have added information to the introduction (page 4), highlighting a study which found a slower rate of viral clearance to be associated with more severe outcomes (Wang et al., 2008). However, other studies have shown no association (Vaughn et al., 2000; Fox et al., 2011). This lack of conclusive evidence underscores the need for further research.

      Methods

      Comment #3: The authors highlight the possible discrepancies in comparing viral kinetics of two RT-PCR methods. Although it is not ideal to combine such results, the authors have analyzed them separately, providing valuable data.

      We appreciate your comment.

      Comment #4: Which tests were used to define the immune status as primary and secondary? What were the definitions?

      We have moved the definitions of probable primary and secondary infections from the supplementary file (Appendix 2) to the main text of the revised manuscript (Methods section – Plasma viremia measurement, dengue diagnostics, and clinical endpoints – page 6): “A probable primary infection was defined by two negative/equivocal IgG results on separate samples taken at least two days apart within the first ten days of symptom onset, with at least one sample during the convalescent phase (days 6-10). A probable secondary infection was defined by at least one positive IgG result during the first ten days. Cases without time-appropriate IgG results were classified as indeterminate.”

      Results

      Comment #5: It is interesting that DENV2 showed the slowest decline, yet was associated with overall lower viral loads during early illness and more severe disease outcomes. Could delayed clearance of the virus be associated with disease severity?

      We have expanded our analysis to investigate the relationship between the rate of viremia decline and clinical outcomes. Using a model of viremia kinetics that assumes a linear decrease in log10 viremia over time, we calculated the rate of decline for each patient. Our findings demonstrate that a faster rate of viremia decline is associated with a significantly reduced risk of severe clinical outcomes. We have incorporated this new analysis into the revised manuscript, providing further details in the “Statistical Analysis” section (page 7) and presenting the results on page 15 and in Figure 6.

      Comment #6: Were there any differences in the kinetics of viral loads in children vs adults? I.e. children, young adults and older adults (>60 or 50?). Or were there insufficient numbers for this comparison?

      To address this point, we have modified Figure 3D to report results at ages 5, 10, 15, 25, and 50 years, representing children, adolescents, young adults, and older adults. Our analysis shows that viremia kinetics are largely similar across ages.

      Comment #7: Did any patients have comorbidities such as diabetes, obesity etc... if so, were there any differences in the viral loads?

      We appreciate your interest in the potential impact of comorbidities on viral loads. However, due to data limitations, we were unable to analyze this association. Only 6 patients had documented diabetes in the pooled dataset. In study C, 39 patients had obesity, whereas body mass index data is not available for studies A and B, although reports suggest a lower prevalence of obesity compared to study C.

      Comment #8: Were there any differences in the kinetics of the overall viral loads between DF/DHF/DSS or dengue with warning signs, without warning signs and severe dengue? Especially related to the time for viral clearance?

      Thank you for your suggestion. Such analysis reverses time and the causal direction, while we are more interested in looking forward. Therefore, instead of analyzing viremia kinetics based on disease severity, we have added an analysis to investigate the relationship between the rate of decline in viremia and clinical outcomes, as shown in the response to your comment #5. Results show that a more rapid rate of viremia decline is associated with a reduced risk of more severe clinical outcomes. In addition, in this study, we selected two clinical outcomes: severe dengue and plasma leakage. The definitions are based on the WHO 2009 guidelines and standard endpoint definitions for dengue trials (Tomashek et al., 2018).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful for the comments and suggestions from the Editor and Reviewers on our manuscript submitted to eLife. We have addressed all the comments, and we think these modifications bring clarity to our message and will be helpful to your readership. Here we include an outline of the corrections made, as well as a detailed response to each of the reviewers’ comments.

      Outline of corrections, following the Editor’s and Reviewers’ suggestions:

      ·        The title of the manuscript has been changed to reflect a more conservative conclusion.

      ·        Changes were made to the main manuscript text to enhance clarity, including in the use of genetic terminology and naming.

      ·        Specific responses to some comments from the reviewers are included in this document. We combined some comments that would be better addressed together.

      Accompanying this letter is an updated version of our manuscript with the track changes feature enabled. Again, we are thankful for the comments and suggestions we received, and we hope this revised version of our manuscript will be accompanied by an updated assessment and public reviews and a final eLife Version of Record.

      Response to the public review and minor recommendations.

      From Reviewer #1:

      The major inference of the work is that SIV infection of gorillas drove the observed diversity in gorilla CD4. This is supported by the majority of SNPs being localized to the CD4 D1, which directly interacts with the envelope, and the demonstrated functional consequences of that diversity for viral entry. However, SIVgor (to the best of my knowledge) only infects Western lowland gorillas (Gorilla gorilla gorilla), and one Gorilla gorilla diehli and three Gorilla beringei graueri individuals were included in the haplotype and allele frequency analyses. The presence of these haplotypes or the presence of similar allele frequencies in Eastern lowland and mountain gorillas would impact this conclusion. It would be helpful for the authors to clarify this point.

      From Reviewer #1 (minor comment):

      Which subspecies of gorilla are the nsSNPs coming from? Gorilla gorilla diehli [n = 1] and Gorilla beringei graueri [n = 3] are not extant reservoirs of SIV and to my knowledge are not thought to have been, so it is important to point out where the diversity is coming from if the authors are asserting that SIVgor drove this population-level diversity in gorilla CD4.

      We initially included genomic data from all the gorilla individuals available to maximize sensitivity to identify allelic variants. Although evidence points to eastern gorillas not being currently infected with SIV, our results show that all allelic variants identified have differential susceptibility to the HIV-1 and SIVcpz strains tested. The allelic variants we identified with this genomic data set match the variants identified by Russell et al (doi.org/10.1073/pnas.2025914118), including the ones found in eastern gorillas, and recapitulate that those variants have differential susceptibility to lentiviral entry, similar to the variants of western populations. Whether eastern gorillas have been exposed to lentiviruses in the past remains unknown.

      From Reviewer #1:

      The authors appear to use a somewhat atypical approach to assess intra-population selection to compensate for relatively small numbers of NHP sequences (Fig. 6). However, they do not cite precedence for the robustness of the approach or the practice of grouping sequences from multiple species for the endemic vs other comparison. They also state in the methods that some genes encoded in the locus were removed from the analysis "because they have previously been shown to directly interact with a viral protein." This seems to undercut the analysis and prevents alternative explanations for the observed diversity in CD4 (e.g., passenger mutations from selection at a neighboring locus).

      Given the nature of our samples, to detect any influence of natural selection acting on CD4, we chose to compare patterns of molecular evolution at CD4 with those at its neighboring loci. Comparisons of molecular evolution signatures across genomic regions are the basis of methods to detect positive selection (e.g., Sabeti DOI: 10.1038/nature01140). For our comparison, the neighboring loci represent our neutral standard for the genomic region in which CD4 resides. Our rationale is that demographic and neutral influences on the number and frequency of polymorphic sites would affect all loci in a genomic region equally. Because these neighboring loci are our neutral benchmark, we excluded, before analysis, the other genes in this genomic region that interact with viruses, since these loci may be evolving under positive selection and would decrease the power of our comparison. None of the excluded loci are direct neighbors of CD4. This, together with the average recombination rate of the CD4 genomic region in humans, reduces the possibility that what we observe at CD4 is due to selection acting at a neighboring locus. In addition, the classic population genetic method to detect positive selection, the McDonald-Kreitman test (McDonald DOI: 10.1038/351652a0), was originally presented combining polymorphism data across species. We assume that any effect on levels of diversity created by combining variability between species would affect all loci included in the study equally, not just CD4.
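
      For readers less familiar with the McDonald-Kreitman test cited above, it compares a 2×2 table of nonsynonymous and synonymous changes that are either polymorphic within species or fixed between species. A minimal sketch of its neutrality index, NI = (Pn/Ps)/(Dn/Ds), follows. It is not the CD4-versus-neighboring-loci comparison described here, the class name and example counts are invented for illustration, and a real analysis would also test the table for significance (for example with Fisher's exact test).

```python
from dataclasses import dataclass


@dataclass
class MKTable:
    """McDonald-Kreitman counts for one locus (example values are hypothetical)."""
    pn: int  # nonsynonymous polymorphisms (within species)
    ps: int  # synonymous polymorphisms (within species)
    dn: int  # nonsynonymous fixed differences (between species)
    ds: int  # synonymous fixed differences (between species)

    def neutrality_index(self) -> float:
        # NI = (Pn/Ps) / (Dn/Ds); NI = 1 is the neutral expectation.
        if 0 in (self.ps, self.dn, self.ds):
            raise ValueError("Ps, Dn and Ds must be non-zero")
        return (self.pn / self.ps) / (self.dn / self.ds)

    def alpha(self) -> float:
        # Estimated proportion of nonsynonymous substitutions fixed by positive selection.
        return 1.0 - self.neutrality_index()


table = MKTable(pn=5, ps=40, dn=10, ds=20)
print(table.neutrality_index(), table.alpha())
```

      NI below 1 indicates an excess of amino acid fixations between species relative to polymorphism, whereas NI above 1 indicates an excess of amino acid polymorphism, which is consistent with weak purifying selection on slightly deleterious variants or with balancing selection.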

      From Reviewer #1:

      Data in Figure 5 is graphed as % infected cells instead of virus titer (TDU/mL). It's unclear why this is the case, and prevents a comparison to data in Figure 2 and Figure 4.

      From Reviewer #1 (minor comment):

      Figure 5: the data presentation is now shown as % infected cells instead of viral titer. This makes it difficult to compare data from Figure 5 to other figures. Can the authors please either justify this change, display data consistently or provide matched data displays as a Supplemental Figure?

      For the experiments presented in Figures 2 and 4, we used different volumes of infecting pseudoviruses, which allowed us to identify the linear range of infection. Then, based on the number of cells plated per experimental replicate, we calculated a virus titer. In follow-up experiments (Fig. 5), we used fixed volumes of virus that would infect ~10-20% of control (wild-type; wt) CD4-expressing cells. Comparisons were then made between wt and mutated CD4s, and these data are best presented in their raw form as percent of cells infected. Although this change in method prevents direct comparison between the figures, we focused on the differences observed between experimental conditions within each panel.
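
      The titer calculation described above is simple arithmetic once the linear range is known. The sketch below shows one common way to perform it, under the assumption that each infected (GFP-positive) cell corresponds to one transducing unit, which holds only at low fractions of infected cells. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def titer_tdu_per_ml(percent_infected: float, cells_plated: int, virus_volume_ul: float) -> float:
    """Convert percent infected (GFP-positive) cells into transducing units (TDU) per mL.

    Hypothetical helper; assumes one transducing unit per infected cell, which only
    holds in the linear range of infection (low fraction of cells infected).
    """
    if not 0 <= percent_infected <= 100:
        raise ValueError("percent_infected must be between 0 and 100")
    if cells_plated <= 0 or virus_volume_ul <= 0:
        raise ValueError("cells_plated and virus_volume_ul must be positive")
    infected_cells = cells_plated * percent_infected / 100
    return infected_cells / (virus_volume_ul / 1000)  # convert µL to mL


# Example: 15% of 50,000 plated cells infected by 10 µL of pseudovirus.
print(titer_tdu_per_ml(15, 50_000, 10))
```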

      From Reviewer #1:

      The lack of pseudotyping with SIVgor envelope is a surprising omission from this study, that would help to contextualize the findings.

      From Reviewer #2 (minor comment):

      The inclusion of HIV-1 but not SIVgor strains in Figures 2D/E is somewhat conspicuous since chimpanzee alleles certainly differ in susceptibility to SIVcpz (and SIVgor) strains per Russell et al. 2021. The authors should either test some SIVgor infections, cite published data on at least extant human/chimpanzee/gorilla CD4 susceptibility to SIVgor, or address why they did not include it.

      We agree that host susceptibility to SIVgor strains would have been an interesting question to explore. However, we opted to focus on the transmission of SIVcpz strains into gorilla populations for this study. It is worth mentioning that we cloned SIVgor envelope genes from some strains into our expression system, but we were unable to recover infectious pseudoviruses using an HIV-1ΔEnv-GFP backbone. This suggests that HIV-1 may be incompatible with incorporating SIVgor Env into virus particles. Recently, Russell et al (DOI: 10.1073/pnas.2025914118) managed to generate SIVgor Env-pseudotyped virions using a different backbone (SIVcpzΔEnv-GFP) that was unavailable to us at the time of this study.

      From Reviewer #1:

      Similarly, building gorilla CD4 haplotype SNPs onto the hominin ancestor (as opposed to extant human CD4) may provide additional insights that are meaningful toward understanding the evolutionary trajectory of gorilla CD4.

      We decided to use extant human CD4 as a backbone to test the effects of the individual amino acid variants found in the gorilla population, since the human protein is highly susceptible to all the HIV-1 and SIV strains tested and the expected phenotype is a loss of function. Since the D1 domains of the human and ancestral CD4 sequences are almost identical (except for a change that is fixed in gorillas) and show similar levels of susceptibility to lentivirus entry, we expect that the phenotypes found would be the same if the gorilla SNPs were built into the ancestral CD4 backbone.

      From Reviewer #2:

      To bolster the argument that lentiviruses are indeed the causative driver of this diversification, which seems likely from a logical perspective but is difficult to prove, Warren et al. pursue two novel lines of evidence. First, the authors reconstruct ancestral CD4 genes that predate lentiviral infection of hominid populations. They then demonstrate that resistance to lentiviral infection is a derived trait in chimpanzees and gorillas, which have been co-evolving with endemic lentiviruses, but not in humans, which only recently acquired HIV. Nevertheless, the derived resistance could be stochastic or due to drift. This argument would be strengthened by demonstrating that bonobo and orangutan CD4, which also do not have endemic lentiviruses, resemble the ancestral and human susceptibility to great-ape-infecting lentiviruses.

      From Reviewer #2 (minor comment):

      The data presented in Figure 2, showing that chimp and gorilla (but not human) CD4 resistance to lentiviral infection is a derived trait, is very intriguing for suggesting that endemic lentiviruses are the causative driver of CD4 evolution. Nevertheless, this could be stochastic or due to genetic drift. Given the later emphasis on several other non-endemically infected species, the authors should at the very least include the sequences for bonobo and orangutan CD4 in the presented alignment (Fig 2B). Ideally, they would also test these orthologs to demonstrate that they are not resistant to lentiviruses infecting great apes (SIVcpz / HIV-1 / SIVgor). If they have also derived resistance, this would suggest a possible other evolutionary driver or genetic drift.

      Based on our analysis of polymorphic sites using available population data from apes, we strongly believe that the resistance-conferring polymorphisms in CD4 did not accumulate stochastically. The frequency and accumulation of these changes correlate strongly with the function of CD4 as a receptor for lentivirus entry. We agree that experimentally testing bonobo and orangutan CD4 would strengthen our conclusions; however, based on our genomic analyses, we chose to focus on the species expected to show the greatest variability in susceptibility to the lentiviruses tested, namely gorillas and chimpanzees.

      From Reviewer #2:

      Warren et al. provide a population genetic argument that only endemically infected primates exhibit diversifying selection, again arguing for endemic lentiviruses being the evolutionary driver. The authors compare SNP occurrence in CD4 to neighboring genes, demonstrating that non-synonymous SNP frequency is only elevated in endemically infected species. Moreover, these amino-acid-coding changes are significantly concentrated in the CD4 domain that binds the lentiviral envelope. This is a creative analysis to overcome the problem of very small sample sizes, with very few great ape individuals sequenced. The additional small number of species compared (2-3 in each group) also limits the power of the analysis; the authors could consider expanding their analysis to Old World Monkey species that do or do not have endemic lentiviruses, as well as great apes.

      The scope of this project was to evaluate the differential phenotypes of the polymorphisms accumulated in the ape branch of the primates. Although evaluating the accumulation of polymorphisms in a broader range of primates would generate interesting observations, this would likely require increasing the total number of primate species to sample along the speciation tree, and many of these species lack population-level data.

      From Reviewer #1 (minor comment):

      Ancestral reconstruction methods and associated data tables should be included to indicate statistical support for assigned codons. A comment on ambiguity at relevant positions is needed. Similarly, given the polymorphic nature of gorilla and chimpanzee CD4, how confident are the authors in their ancestral reconstructions based on a single representative genome per species? Does this change when you include the broader panel of gorilla sequences? Is the ancestral reconstruction robust to other methods besides PAML?

      We used the PAML software package to reconstruct the ancestral hominin and hominid sequences of CD4 because it is a standard, well-recognized method for this purpose. For this analysis, we used the set of primate sequences selected for the positive selection analyses (see Methods), namely the longest isoform of each available species that best aligned with human CD4. We felt that the best approach to ancestral state reconstruction was to use only these curated sequences rather than population-level sequences, which removes potential biases introduced by different numbers of variants per species.

      From Reviewer #1 (minor comment):

      Page 10: "It seems that allele 2, which doesn't have this glycan, would be at a fitness disadvantage. In support of this, allele 2 is one of the least frequent alleles in the gorilla population that we surveyed (Figure 3B)." - this inference depends on the gorilla species that encode allele 2 and allele frequencies. There are statistical tests to address this inference.

      Population genetic statistics that test for skews in sample allele frequencies are not appropriate here because of the nature of the samples in this study. However, the reviewer is correct that our inference about allele frequency depends on the gorilla species in which this allele is found. Allele 2 is found in the Gorilla beringei graueri subspecies included in this study. We only have data for three individuals (six alleles) from this subspecies, compared with 51 individuals (102 alleles) from Gorilla gorilla gorilla. Genetic subdivision between the gorilla subspecies could therefore also produce the low frequency of allele 2 observed in our sample.

      From Reviewer #1 (minor comment):

      Page 11: "These results imply that the resistance to SIVcpz found in gorilla individuals is not dependent on single amino acids, but rather the cumulative effect of multiple SNPs." Would it be more relevant (or relevant in other ways) to test this statement by putting those mutations into the hominid ancestor? Testing individual residues in the context of human CD4 may be subject to epistasis or several other factors.

      We agree that introducing several of the resistance-associated SNPs together into the susceptible human background would have strengthened our hypothesis, as each of these amino acid changes is associated with increased resistance to at least one of the lentiviruses tested. However, this would substantially increase the number of CD4 variants to test, and we felt that this approach was beyond the scope of this manuscript.

      From Reviewer #1 (minor comment):

      Figure 6: If you perform this analysis on chimpanzee CD4 alone do you get the same result? Just gorillas? If you remove eastern/mountain gorillas? The very small numbers of non-human non-SIV-reservoir great apes may preclude a strong conclusion.

      We agree that our study is limited by the small number of available sequences from individuals of the studied species. Removing a whole species or subspecies would greatly reduce statistical power. If all chimpanzees or all gorillas (or a single subspecies) were removed, the remaining species would still show an accumulation of SNPs in the D1 region of CD4, although with lower statistical significance.

      From Reviewer #2 (minor comment):

      Related to Figure 2: It would strengthen the argument that resistance is a derived trait if the authors mapped the causative mutations from gorilla CD4 onto the ancestral hominin CD4. However, this experiment is not particularly critical, merely a suggestion.

      We appreciate this suggestion. We decided to use the human CD4 backbone as it is widely susceptible to lentiviral entry. The hominid and hominin ancestral sequences are almost identical to the human sequence in domain 1, except for a fixed mutation shared with the gorilla CD4. We expect that the SNPs observed in the gorilla population would also reduce susceptibility to lentivirus entry in the ancestral CD4 reconstructions.

      From Reviewer #2 (minor comment):

      Related to Figure 3B: It is difficult to make much of the allele frequency for 8 alleles in 32 individuals. Can the authors collate this with allele frequency for the referenced 100 individuals from Russell et al. 2021, to give a better sense of population frequency? This may allow the authors to better correlate allele frequency with SIVcpz resistance patterns in Figure 4, strengthening their argument that more resistant alleles should be over-represented in the population.

      At the time of our analysis, the data from Russell et al. 2021 (DOI: 10.1073/pnas.2025914118) were not available to collate or compare. When those data became available, we compared them with our results and confirmed that the alleles we found were also detected in the samples used in that study.

      From Reviewer #2 (minor comment):

      Related to Figure 6: As written, several methodological details should be clarified. How were human genomes selected to limit the sample size to 50?

      We selected a total of 50 human individuals to match the size of the largest group in Fig 6B (chimpanzee, n=50). We randomly selected 10 individuals from each of the 5 superpopulations defined by the 1000 Genomes Project [Africans (AFR), Admixed Americans (AMR), East Asians (EAS), Europeans (EUR) and South Asians (SAS)].
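      The size-matched, stratified draw described above can be sketched in Python. This is a minimal sketch, not the script used for the analysis: the sample identifiers, the `stratified_sample` helper and the fixed seed are hypothetical, and only the five superpopulation codes come from the 1000 Genomes Project.

```python
import random

# Superpopulation codes used by the 1000 Genomes Project.
SUPERPOPULATIONS = ["AFR", "AMR", "EAS", "EUR", "SAS"]


def stratified_sample(ids_by_superpop, per_group=10, seed=None):
    """Draw `per_group` individuals without replacement from each superpopulation.

    `ids_by_superpop` maps a superpopulation code to a list of sample IDs
    (hypothetical input format). Returns 5 * per_group IDs.
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    selected = []
    for pop in SUPERPOPULATIONS:
        selected.extend(rng.sample(ids_by_superpop[pop], per_group))
    return selected


if __name__ == "__main__":
    # Hypothetical sample IDs standing in for the real 1000 Genomes panel.
    panel = {pop: [f"{pop}_{i:03d}" for i in range(100)] for pop in SUPERPOPULATIONS}
    humans = stratified_sample(panel, per_group=10, seed=1)
    print(len(humans))  # 50, matching the chimpanzee group size
```

      Sampling an equal number per superpopulation keeps any single continental group from dominating the human diversity estimate.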

      From Reviewer #2 (minor comment):

      Related to Figure 6: What comparison is being reported for the Mann-Whitney U test (CD4 vs. which gene)? Are the means shown in A an average of 2 (endemic) or 3 (non-endemic) species - if so, the authors should show the individual data points to give a clearer depiction of the data spread. In addition, it is not clear that a statistical test with sample sizes of 2 is meaningful, since Mann Whitney typically assumes n > 5. To strengthen this statistical argument, it may be necessary to include additional species that have (a) multiple genomes (or at least this locus) sequenced, and (b) have or lack lentiviral sequences. This may necessitate expanding the analysis to include Old World Monkeys (e.g. Rhesus Macaque Genome Project).

      In Figure 6, we use the Mann-Whitney U test to compare variation between CD4 and the neighboring loci. The averages and SEMs are for two endemic and four non-endemic species (the two orangutan datasets are from two distinct species, in contrast to the gorilla subspecies). It is true that our sample size is small for any statistical testing. For the Mann-Whitney U test, n > 5 in each group is generally preferred, so we do run into problems with the endemically infected comparisons, as we only have two data points (chimpanzee and gorilla) for the CD4 group. For the uninfected species, CD4 has four data points.
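      To make this small-sample limitation concrete, the exact null distribution of the Mann-Whitney U statistic can be enumerated. The Python sketch below is an illustration only, not the test run for Figure 6: it assumes no tied values and uses made-up numbers. With no ties, the smallest attainable exact two-sided p-value is 2 divided by the number of ways to choose the smaller group from the pooled values, so two data points compared against four values can never reach p < 0.05, whereas two against ten can.

```python
from itertools import combinations
from math import comb


def u_statistic(x, y):
    """Mann-Whitney U for x: count of (a, b) pairs, a from x and b from y, with a > b (no ties)."""
    return sum(1 for a in x for b in y if a > b)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value obtained by enumerating every split of the pooled data."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    centre = n1 * n2 / 2  # mean of U under the null hypothesis
    observed = abs(u_statistic(x, y) - centre)
    as_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        xs = [pooled[i] for i in chosen]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(xs, ys) - centre) >= observed:
            as_extreme += 1
    return as_extreme / total


def smallest_possible_p(n1, n2):
    """Lower bound on the exact two-sided p-value when there are no ties."""
    return min(1.0, 2 / comb(n1 + n2, n1))


if __name__ == "__main__":
    # Made-up values: the two "CD4" points exceed all four comparison points.
    print(exact_two_sided_p([10.0, 11.0], [1.0, 2.0, 3.0, 4.0]))  # 2/15, about 0.133
    print(smallest_possible_p(2, 10))  # 2/66, about 0.030
```

      Even the most extreme possible arrangement of 2 versus 4 values gives p = 2/15, which is why a larger number of comparison points (or additional species) is needed before this test can detect a difference at the 0.05 level.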

      From Reviewer #1 (minor comment):

      Page 6. "This suggests that the ancestral versions of CD4 in apes were susceptible to primate lentivirus entry" - The data show that tested virus pseudotyped with SIV/HIV envs can engage ancestral CD4 in the context of a canine cell line expressing human CCR5, but not necessarily that this interaction was sufficient for the process of entry per se, especially in the context of a gorilla (or hominid) cell. Some additional context would be useful for a broad readership.

      From Reviewer #1 (minor comment):

      Page 6: "but that selective pressures exerted by SIVs in the chimpanzee and gorilla lineages have led to the retention of mutations that confer resistance to primate lentivirus infection. This has not happened in humans where selective pressure by HIV-1 is too new" - this cannot be concluded from the data in Figure 1. It would be more appropriate as a Discussion point.

      From Reviewer #1 (minor comment):

      Page 14: "Natural tolerance is often required before a virus can establish itself long term in a host reservoir, and thus understanding it is key to understanding virus reservoirs in nature" - please provide a reference. This is one among several theories of long-term host-virus evolution dynamics/outcomes, and further discussion may benefit the broad readership of eLife.

      From Reviewer #1 (minor comment):

      Page 15: "There is a surprising outcome of virus-driven host evolution in that the divergence and diversity of these host genes ultimately comes at a detriment to the very viruses that drove this evolution." - it is not clear to this reviewer why this is surprising.

      From Reviewer #2 (minor comment):

      Related to Figure 5A: The authors suggest that the gorilla glycosylation site provides resistance to SIVcpz, based on TAN1.910, but in fact the glycosylated allele is no more resistant than the un-glycosylated allele to most SIVcpz strains (in Figure 4). The authors should acknowledge this more clearly in the text.

      From Reviewer #2 (minor comment):

      The title of this article (that infection "has driven selection") is somewhat overstated - though it seems very likely that lentiviruses are driving CD4 diversification, this is difficult to prove. The arguments presented here rely on very few data points: modern chimp and gorilla compared to ancestral CD4, and a population genetic analysis relying on 2 or 3 species with 10-50 individuals each. The authors should either bolster these arguments (see the above suggestions) and/or soften the claim in the title.

      Modifications to the main text of the manuscript have been made to enhance clarity on the subjects stated above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We provide below a point-by-point reply to the Reviewers, and hope that our new manuscript now addresses the Reviewers’ concerns and meets the requirements for publication in eLife.

      In summary, we have performed a new set of mouse humanization experiments using a new cohort of 4 additional HLA-DRB1*15-typed MS patients as donors, all presenting with highly active disease and under treatment with natalizumab. The new experiments aim to strengthen and extend the original finding that HLA restriction, rather than disease status, plays an important role in the development of CNS inflammation. Additionally, we induced EAE with a revised protocol that uses lower amounts of peptide antigens to reduce the possibility of immune tolerance. Our original observations were further enriched by the finding that immunization increases infiltration of the CNS by human CD4 T cells, consistent with EAE pathology, and that these human CD4 T cells co-localize with human CD8 T cells in brain lesions. Further, we provide more detailed information on the EBV infection status of the PBMC donors used for humanization, and we find first indications of relationships between B cell engraftment in humanized mice, the EBV status of the donors and the development of brain lesions, which may stimulate further investigation in future studies.

      Point-by-point reply to reviewers:

      Reviewer 1:

      We thank Reviewer 1 for their valuable comments and for their support of the overall approach as a model system. We have addressed the comments by providing the additional information requested and by performing EAE with a revised protocol, as suggested. We believe the new results significantly extend the information gained from this study.

      (1) Throughout their paper, the authors never quantify the difference in CD4 vs CD8 T cell infiltration into the CNS. While repeatedly claiming that there are fewer CD4 T cells present than CD8 T cells within the CNS, this data is not included. Further, spinal cord numbers of CD4 and CD8 are not provided in lieu of CD3 T cell characterization.

      Reply: We have now included quantitative data on the differences between CD4 and CD8 T cells in the brain and spinal cord of non-immunized and EAE-immunized mice. For the brain (Fig. 2E) and spinal cord (Fig. 3D) of non-immunized mice, and the brain (Fig. 4D, E, L) and spinal cord (Fig. 5D) of immunized mice, we show the numbers of hCD8 and hCD4 T cells and the ratios of CD4 to CD8 T cells at the borders and in the parenchyma. Notably, using a revised EAE protocol in the second set of experiments, we observed a marked increase in hCD4 T cell infiltration at the CNS borders and in the parenchyma, an observation consistent with successful EAE immunization.

      B cells don't make up any significant component of the cells transferred from HLA-DR15 donors. While the cells transferred from the HLA-DR13 donor are composed of a considerable number of B cells, the mice that received these cells didn't develop any signs of neurologic disease.

      In the second experiment using new DR15 MS donors, we also observed significant B cell engraftment in several groups of DR15 MS mice. With these additional groups of mice, we observed a relationship between B cell engraftment in DR13 and DR15 MS mice and indicators of recent or ongoing EBV reactivation. This is an interesting preliminary observation that could be tested in future, larger studies.

      (2) Incomplete exploration of potential experimental autoimmune encephalomyelitis (EAE) modeling. Comparison of the susceptibility of B2m-NOG mice to EAE dependent on various peptide doses would be highly informative. Given that the number of hCD45+ in the periphery of NOG mice decreases following this immunization it would be prudent for the authors to determine if such a high peptide dose is truly ideal for EAE development in this mouse model.

      Reply: We thank the reviewer for this critical comment. In the second group of experiments (DR15 MS2-5), we revised the EAE protocol to use lower amounts of peptides in a single immunization, thereby greatly reducing the exposure of human T cells to antigen and the risk of tolerance/anergy. This (i) avoided the reduction in the proportion of hCD45 cells in the peripheral blood following immunization (Fig. 1A), and (ii) increased the numbers of hCD4 T cells and the hCD4/hCD8 T cell ratios at the borders and in the parenchyma of the brain (Fig. 4D,E) and spinal cord (Fig. 5D).

      (3) The degree of myelin injury is not presented. The statement is repeatedly made that "demyelination was not observed in the brain or spinal cord" but no quantification of myelin staining is shown.  

      Reply: The reviewer refers to a pivotal feature (and limitation) of this particular humanized model. Despite significant T cell infiltration of white and grey matter regions of the brain and spinal cord, there is no detectable demyelination. This has also been reported by an independent study using a similar humanized system (Zayoud et al., 2013). We have supplemented the figures with photomicrographs showing unperturbed myelin at T cell lesions in the corpus callosum white matter (Fig. 4F, inset stained with Luxol fast blue), and a confocal micrograph of the same region double-immunostained for hCD45 immune cells and MBP (Fig. 4G).

      Minor points:

      Method of quantification (e.g. cells per brain slice in figures 2E; 4E) is not very quantitative and should be justified or more appropriately updated to be more rigorous in methodology.

      Reply: In the new figures, we have changed the quantification of parenchyma-infiltrating cells in the brain from cells per brain slice to cells per mm² of tissue (Fig. 2D, Fig. 4D).

      Fig. 4 data should be shown from un-immunized DR15 MS and DR15 HI mice.

      Reply: We now include the quantitative data from un-immunized mice compared to immunized mice in all groups (Fig. 4 C-E). 

      Reviewer 2:

      We thank Reviewer 2 for their very pertinent comments and for highlighting the importance of humanized mice as an approach to further understanding the pathobiology of MS. We also thank this reviewer for their positive comments on the study design, specifically the use of fresh PBMC isolated from HLA-DRB1-typed MS individuals and from a healthy control. The reviewer highlights 4 major weaknesses of the study, which we have tried to address in order to increase its value.

      (i) Lack of sufficient sample size (n=1 in each group) to make any conclusion.

      Reply: We have increased the sample size for the DR15 MS group from n=1 to n=5 by generating new humanized mice using PBMC freshly isolated from additional MS donors, all HLA-DRB1*15 with active RRMS and under treatment with natalizumab. Here we benefited from our excellent collaboration with neurologists at the neighboring University Hospital, which runs a large, organized MS outpatient clinic with HLA-DRB1-typed MS individuals who are closely monitored over the course of their disease and therapy. In this way, we were able to assess the engraftment success of human immune cells and the variability in CNS lesion development across mice generated from 5 different DR15 MS patients. We also monitored markers of EBV activation status in all the patients used for mouse humanization in this study.

      (ii) Lack of phenotype in mice.

      Reply: As already described in the results and addressed in the discussion, the B2m-NOG immunodeficient mouse strain used here is a state-of-the-art experimental tool for humanization studies but unfortunately fails to support engraftment of human monocytes. We, and previously Zayoud et al. (2013), show that CNS lesions in humanized mice contain high numbers of hCD4 and hCD8 T cells, accompanied by locally activated murine microglia and astrocytes, but lack human monocytes. The humanized mice contain large proportions of immature mouse CD11b+Ly6Chi monocytes in the periphery (Suppl. Table 4), but these cells are not recruited into the CNS in non-immunized or immunized humanized mice, potentially because of incompatible chemokine signals between mouse and human. The absence of human monocyte engraftment in this model is the most likely reason that the lesions do not demyelinate, and this limitation of the currently available host mouse strains needs to be addressed before full modelling of CNS demyelination by human immune cells can be achieved.

      (iii) No disease phenotype even in humanized mice immunized for disease using standard disease induction protocol employed in an animal model of MS.

      Reply: As described above, following the suggestion of Reviewer 1 (point 2), we revised the EAE protocol to use lower amounts of peptides given as a single immunization. This resulted in increased numbers of hCD4 T cells and increased hCD4/hCD8 T cell ratios at the borders and in the parenchyma of the brain (Fig. 1E, Fig. 2D) and spinal cord (Fig. 5D), all indicative of successful EAE immunization. Although immunized mice showed lesions with mixed populations of hCD4 and hCD8 T cells, demyelination, and therefore clinical symptoms, were again not observed. As outlined in (ii) above, successful human monocyte engraftment would be fundamental for the development of demyelination and clinical symptoms in PBMC-humanized mice, and new immunodeficient mouse strains should be developed to achieve this.

      (iv) Mechanistic data on why CD8 T cells are more enriched than CD4+ T cells.

      Reply: The question of why hCD8 T cells are more enriched in the CNS than hCD4 T cells is answered, at least in part, by the results of our new EAE experiments, which clearly show that immunization increases CNS infiltration by hCD4 T cells relative to hCD8 T cells. In general, EAE protocols are designed to activate antigen-specific CD4 T cells, and this is verified in the CNS of immunized humanized mice, where hCD4 T cells infiltrate to join hCD8 T cells in lesion areas. The predilection of hCD8 T cells for the CNS is obvious in non-immunized humanized mice, especially in the parenchyma (see Fig. 2E), as it is in MS patients, whereas hCD4 infiltration becomes important after EAE immunization. The humanized model system might therefore represent a unique tool for studying the mechanisms underlying preferential hCD8 T cell involvement in MS neuroinflammation, which is not accurately modelled in current EAE models. As this reviewer correctly points out, this is a very important point, as postmortem brains from MS patients contain more CD8 T cells than CD4 T cells.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Response to the reviewers

      Reviewer 1:

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Initial reply: Thank you for this point. Our models were selected a priori, following the modelling strategy of Jepma et al. (2018); we therefore considered the same set of core models so that the analysis extends clearly to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation-weighted with non-expectation-weighted models.

      Model-based and hierarchical RL are very broad terms that can refer to many different models, and it is not clear to us which specific models the reviewer has in mind. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence perform model-based inference about the stimulus dynamics. In our case, this inference was hierarchical and was combined with a simple RL rule.

      Revised reply: We clarified our modelling choices in the ”Modelling strategy” subsection of the results section.
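      For readers less familiar with the two model families, the contrast between a fixed-learning-rate RL update and a Kalman filter, whose learning rate (the Kalman gain) depends on volatility and stochasticity, can be sketched in Python. This is a generic, simplified illustration assuming Gaussian random-walk dynamics, not the eRL and eKF models fitted in the manuscript; the function names, parameter values and the `expectation_weighted_report` mixing rule are hypothetical.

```python
def delta_rule(observations, alpha, mu0=50.0):
    """Model-free RL: the expectation moves toward each observation by a fixed fraction alpha."""
    mu = mu0
    expectations = []
    for o in observations:
        expectations.append(mu)       # expectation held before observing o
        mu = mu + alpha * (o - mu)    # prediction-error update with a constant learning rate
    return expectations


def kalman_filter(observations, volatility, stochasticity, mu0=50.0, var0=100.0):
    """Kalman filter for a Gaussian random walk observed with Gaussian noise.

    volatility: process-noise variance (how fast the hidden intensity drifts).
    stochasticity: observation-noise variance (trial-to-trial randomness).
    """
    mu, var = mu0, var0
    expectations, gains = [], []
    for o in observations:
        var = var + volatility                 # predict: the hidden state may have drifted
        expectations.append(mu)
        k = var / (var + stochasticity)        # Kalman gain acts as an adaptive learning rate
        mu = mu + k * (o - mu)                 # update toward the observation
        var = (1 - k) * var                    # posterior uncertainty shrinks after updating
        gains.append(k)
    return expectations, gains


def expectation_weighted_report(observation, expectation, w):
    """Hypothetical expectation-weighted percept: a convex mix of input and expectation."""
    return (1 - w) * observation + w * expectation


if __name__ == "__main__":
    seq = [50.0] * 50
    _, g_volatile = kalman_filter(seq, volatility=10.0, stochasticity=5.0)
    _, g_stable = kalman_filter(seq, volatility=1.0, stochasticity=5.0)
    print(g_volatile[-1] > g_stable[-1])  # True: higher volatility, faster updating
```

      In this sketch the Kalman gain settles at a higher value when volatility is high and at a lower value when stochasticity is high, which is the sense in which a generative model adapts its updating to the statistics of the sequence, whereas the delta rule uses the same learning rate throughout.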

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      Initial reply: We are sorry that the reviewer finds ”the depth of analysis concerning the effects of these variables on model fit” shallow. We are not sure which analysis the reviewer has in mind when suggesting a ”more granular analysis of volatility and stochasticity effects” to ”reveal fine-grained model fit patterns”, and we therefore find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper but would be grateful for some specific pointers. We have already provided:

      •    Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      •    Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      •    Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      •    Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, ”How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their *subjective* feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, ”the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Initial reply: Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query ”participants about perceived stimulus intensity levels”, as described at lines 49-55, 296-303, and depicted in figure 1.

      The confidence rating refers to how confident participants are in their rating, i.e. how sure they are. It is collected in addition to the perceived stimulus intensity and has been used in a large body of previous studies across sensory modalities.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      Initial reply: The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially born as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed down by the fact that I (Flavia Mancini) am on maternity leave, but it remains top priority once back to work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.1, 2.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Initial reply: Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Revised reply: We restructured the introduction, the results and parts of the methods. We followed the reviewer’s suggestion to enhance clarity through graphical diagrams: we have visualised the experimental design in Figure 1D. Furthermore, following Jepma et al. (2018), we have visualised the two main computational models (eRL and eKF) in Figure 2. As a result, we have updated the notation in Section 4.4 to be clearer and consistent with the graphical representation (renaming the variable for the observed thermal input from O_t to N_t).

      Reviewer Comment 1.6 — In lines 99-100, the statement ”following the work by [23]” would be more helpful if it included a concise summary of the main concepts from the referenced work.

      - It would be helpful to have descriptions of the conditions that Figure 1C is elaborating on.

      - In line 364, the ”N_t” in the sentence ”The observation on trial t, N_t”, should be O_t.

      Initial reply: Thank you for spotting these and for providing the suggestions. We will include the correction in the revised version.

      Revised reply: We have added the following regarding the lines 99-100:

      “We build on the work by [23], who show that pain perception is strongly influenced by expectations as defined by a cue that predicts high or low pain. In contrast to the cue paradigm from [23], the primary aim of our experiment was to determine whether the expectations participants hold about the sequence itself inform their perceptual beliefs about the intensity of the stimuli.”

      See our reply to Comment 1.5 regarding the notation change from O_t to N_t.

      Reviewer 2:

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Initial reply: Thank you very much for these positive comments.

      Reviewer 3:

      Summary:

      I am pleased to have had the opportunity to review this manuscript, which investigated the role of statistical learning in the modulation of pain perception. In short, the study showed that statistical aspects of temperature sequences, with respect to specific manipulations of stochasticity (i.e., randomness of a sequence) and volatility (i.e., speed at which a sequence unfolded) influenced pain perception. Computational modelling of perceptual variables (i.e., multi-dimensional ratings of perceived or predicted stimuli) indicated that models of perception weighted by expectations were the best explanation for the data. My comments below are not intended to undermine or question the quality of this research. Rather, they are offered with the intention of enhancing what is already a significant contribution to the pain neuroscience field. Below, I highlight the strengths and weaknesses of the manuscript and offer suggestions for incorporating additional methodological details.

      Strengths:

      - The manuscript is articulate, coherent, and skilfully written, making it accessible and engaging.

      - The innovative stimulation paradigm enables the exploration of expectancy effects on perception without depending on external cues, lending a unique angle to the research.

      - By including participants’ ratings of both perceptual aspects and their confidence in what they perceived or predicted, the study provides an additional layer of information to the understanding of perceptual decision-making. This information was thoughtfully incorporated into the modelling, enabling the investigation of how confidence influences learning.

      - The computational modelling techniques utilised here are methodologically robust. I commend the authors for their attention to model and parameter recovery, a facet often neglected in previous computational neuroscience studies.

      - The well-chosen citations not only reflect a clear grasp of the current research landscape but also contribute thoughtfully to ongoing discussions within the field of pain neuroscience.

      Initial reply: We are really grateful for the reviewer’s insightful comments and for providing useful guidance regarding our methodology. We are also thankful for the highlighted strengths of our manuscript. Below we respond to each weakness mentioned in the review report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term “arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Initial reply: Indeed, as detailed in Methods section 4.3, the stimulation intensity was transformed from the 1-13 scale to a 0-100 scale to match the scales in the participant response screens.

      Following the method used to establish the pain threshold, we set stimulus intensity 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite having previously indicated, during the pain threshold procedure, the moment at which a stimulus became painful. This results in both the re-scaled intensity values and the perception ratings, on the same 0-100 scale of arbitrary units, never going above the pain threshold. Please see all participant ratings and inputs in the Figure below. We agree that it would be more illustrative to re-plot Figure 1 with a different exemplary participant whose ratings go above the pain threshold, perhaps with the input intensity on the 1-13 scale shown on an additional right-hand y-axis. We will add this in the revised version and highlight the point above.

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.
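      Assuming the transformation described in Methods section 4.3 is linear (this reply does not state the exact formula), the short Python sketch below shows why the threshold level 7 on the 1-13 scale lands at 50 on the 0-100 scale. The function name `to_rating_scale` is hypothetical.

```python
def to_rating_scale(level, lo=1, hi=13):
    """Linearly map a stimulation level on the lo..hi scale to 0..100.

    Assumption: the rescaling is linear; the exact formula used in the
    manuscript is not given in this reply.
    """
    if not lo <= level <= hi:
        raise ValueError(f"level {level} outside {lo}..{hi}")
    return (level - lo) / (hi - lo) * 100.0


# The pain-threshold level 7 sits at the midpoint of the 1-13 scale.
for level in (1, 7, 13):
    print(level, "->", to_rating_scale(level))
```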

      Revised reply: We re-plotted Figure 1E-F with a different exemplary participant, whose ratings go above the pain threshold. We also included all participants’ pain perception and prediction ratings, noxious input sequences and confidence ratings in the Supplement (Figures S1-S3).

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      Initial reply: We agree with the reviewer that there are multiple sources of uncertainty involved in the process of rating the intensity of thermal stimuli, including perceptual uncertainty. To account for inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and most clearly exemplified in the expectation-weighted Kalman filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, perception is corrupted by the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.

      Author response image 1.

      Stimulus intensity transformation

      Revised reply: We clarified our modelling choices in the “2.2 Modelling strategy” subsection.

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      Initial reply: While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we do not wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely to point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation weighting arguably capture the data better.

      Revised reply: We elaborated on the significance statements in the “Modelling Results” subsection:

      • We considered at least a 2-sigma effect as an indication of a significant difference. In each condition, the expectation-weighted models (eKF and eRL) provided a better fit than models without this element (KF and RL; approx. 2-4 sigma difference, as reported in Figure 5A-D). This suggests that regardless of the levels of volatility and stochasticity, participants still weigh their perception of the stimuli with their expectations.

      and in the first paragraph of the Discussion:

      • When varying the level of inherent uncertainty in the sequences of stimuli (stochasticity and volatility), the expectation- and confidence-weighted models fitted the data better than models weighted for confidence but not for expectations (Figure 5A-D). The expectation-weighted Bayesian (KF) model offered a better fit than the expectation-weighted, model-free RL model, although in conditions of high stochasticity this difference was short of significance. Overall, this suggests that participants’ expectations play a significant role in the perception of sequences of noxious stimuli.

      We are aware of the limitations and the lack of clear guidance regarding the use of sigma effects to establish significance (as per the reviewer’s suggestion: https://discourse.mc-stan.org/t/loo-comparison-in-referenceto-standard-error/4009). Here we decided to use the above-mentioned 2-sigma threshold as an indication of significance, but we note the potential limitations of these inferences, especially when distinguishing between the eRL and eKF models.
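      The 2-sigma criterion can be made concrete with a short Python sketch. It assumes pointwise leave-one-out ELPD values are available for both models on the same observations; the standard error of the difference is taken as sqrt(n) times the sample standard deviation of the pointwise differences, which is how the R `loo` package reports `se_diff`. The function names and input numbers are hypothetical illustrations, not values from the study.

```python
import math
import statistics


def elpd_difference(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff, z) for model A minus model B.

    pointwise_a / pointwise_b: per-observation ELPD contributions from
    leave-one-out cross-validation, in the same observation order.
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("models must be scored on the same observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    # SE of a sum of n pointwise differences: sqrt(n) * sample SD.
    se_diff = math.sqrt(n) * statistics.stdev(diffs)
    return elpd_diff, se_diff, elpd_diff / se_diff


def exceeds_sigma(z, threshold=2.0):
    """Apply the 2-sigma heuristic described in the reply (not a formal test)."""
    return abs(z) >= threshold


# Hypothetical pointwise values for two models on four observations.
a = [-1.0, -2.0, -1.5, -0.5]
b = [-1.5, -2.5, -1.0, -1.5]
diff, se, z = elpd_difference(a, b)
print(f"elpd_diff={diff:.2f}, se={se:.2f}, z={z:.2f}, 2-sigma: {exceeds_sigma(z)}")
```

      With only a handful of observations the difference here falls well short of 2 sigma; in practice the sum runs over every trial of every participant, and the ratio is what the threshold is applied to.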

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      Initial reply: We agree that the distinction between eKF and eRL in the model recovery is not that clean-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.
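      One way to see why the two learners can be hard to separate is to compare textbook forms of each: a random-walk (local-level) Kalman filter and a fixed-learning-rate delta rule. These are not the manuscript’s eKF/eRL equations, which additionally include expectation and confidence weighting; `process_var`, `obs_var` and the observation sequence below are hypothetical. In this sketch the Kalman gain settles quickly to a constant, after which the filter’s mean update is identical to a delta rule whose learning rate equals that constant.

```python
import math


def steady_state_gain(process_var, obs_var):
    """Closed-form steady-state Kalman gain for a random-walk (local-level) model."""
    q, r = process_var, obs_var
    return (-q + math.sqrt(q * q + 4.0 * q * r)) / (2.0 * r)


def kalman_filter(observations, process_var, obs_var, mu0=0.0, var0=1.0):
    """Return posterior means and the Kalman gain used on each trial."""
    mu, var = mu0, var0
    means, gains = [], []
    for y in observations:
        var_pred = var + process_var            # predict: the latent level drifts
        gain = var_pred / (var_pred + obs_var)  # trial-by-trial learning rate
        mu = mu + gain * (y - mu)               # prediction-error update
        var = (1.0 - gain) * var_pred
        means.append(mu)
        gains.append(gain)
    return means, gains


def delta_rule(observations, alpha, v0=0.0):
    """Model-free (Rescorla-Wagner-style) update with a fixed learning rate."""
    v, values = v0, []
    for y in observations:
        v = v + alpha * (y - v)
        values.append(v)
    return values


obs = [50, 55, 48, 60, 62, 58, 40, 45, 50, 52] * 5
means, gains = kalman_filter(obs, process_var=1.0, obs_var=1.0)
k_inf = steady_state_gain(1.0, 1.0)
print(f"first gain {gains[0]:.3f}, last gain {gains[-1]:.3f}, steady state {k_inf:.3f}")
```

      Under these assumptions the two learners differ mainly in the first few trials, which is one plausible reason why recovery struggles to tell them apart.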

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      Initial reply: This is a possibility, but it would require further investigation and a comparison of fits across different bins/ranges of parameters to see whether one model has a consistent advantage over the other in each bin. We will consider adding this analysis, and will provide an additional 50 simulations to paint a more convincing picture.

      Revised reply: We increased the number of simulations per model pair to ≈ 100 (after rejecting fits based on diagnostic criteria: E-BFMI and divergent transitions) and updated the confusion matrix (Table S4). Although the confusion between eRL and eKF remains, the model recovery shows a good distinction between expectation-weighted and non-expectation-weighted (and Random) models, which supports the main conclusion of the paper.

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that “the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider “several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Initial reply: Indeed, we considered a 2-sigma effect as the threshold; however, we recognise that there is no agreed-upon threshold, and in the revision we will make this, and our interpretation of the trend in the data, clearer.

      Revised reply: We clarify this further, as per our revised response to Comment 3.3 above. We have also added the following statement in Section 4.5.1 (Methods, Model comparison): “There’s no agreed-upon threshold of SEs that determines significance, but the higher the sigma difference, the more robust the effect.”

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information.

      Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Initial reply: Thanks for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group- and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.

      Revised reply: We included parameter recovery scatter plots for each model and parameter in the Supplement Figures S7-S11.
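      The sampling step described in the initial reply (drawing individual parameters from the estimated group means and variances) and one common recovery summary can be sketched in Python as follows. The normal group distribution, the inverse-logit transform used to keep a learning rate in (0, 1), and the function names are assumptions for illustration; the model fitting that produces the recovered values is not shown.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_subject_params(n_subjects, group_mean, group_sd, bounded=False, seed=0):
    """Draw individual parameters from a normal group-level distribution.

    If bounded=True the draw is passed through an inverse logit so that it
    lies in (0, 1), as a learning rate must (assumed parameterisation).
    """
    rng = random.Random(seed)
    draws = [rng.gauss(group_mean, group_sd) for _ in range(n_subjects)]
    return [inv_logit(d) for d in draws] if bounded else draws


def pearson_r(x, y):
    """Correlation between simulated and recovered values (one summary of recovery)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


true_alpha = sample_subject_params(30, group_mean=0.0, group_sd=1.0, bounded=True)
# Reporting min/max answers the "range of simulated parameters" question directly.
print(f"simulated learning-rate range: {min(true_alpha):.2f} to {max(true_alpha):.2f}")
# Recovered values would come from refitting the model to data simulated
# with true_alpha (not shown); pearson_r(true_alpha, recovered) then
# summarises each scatter plot in a single number.
```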

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Initial reply: Thank you very much for this suggestion, we will aim to include these measures in the revised version.

      Revised reply: We have considered the suggested diagnostics and include bulk and tail ESS values for each condition, model and parameter in Supplementary Tables S6-S9. We also report the number of chains with low E-BFMI (0), the number of divergent transitions (0) and the E-BFMI values per chain in Table S10.
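      For readers less familiar with E-BFMI, it is commonly estimated per chain from the sampler’s energy trace (Stan records this as `energy__`): the sum of squared successive energy differences divided by the total squared deviation of the energies from their mean. The Python sketch below assumes that estimator and the commonly used 0.3 warning threshold; the chain values are made up for illustration.

```python
def e_bfmi(energies):
    """Estimated Bayesian fraction of missing information for one chain.

    energies: the Hamiltonian energy recorded at each post-warmup iteration.
    """
    if len(energies) < 2:
        raise ValueError("need at least two iterations")
    mean_e = sum(energies) / len(energies)
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(energies, energies[1:]))
    den = sum((e - mean_e) ** 2 for e in energies)
    return num / den


LOW_E_BFMI = 0.3  # commonly used rule-of-thumb threshold for flagging a chain

chain_energies = {
    "chain_1": [1.0, 2.0, 3.0, 4.0],                # energy varies quickly between draws
    "chain_2": [float(i) for i in range(1, 11)],    # slow drift: poor energy exploration
}
for chain, energies in chain_energies.items():
    value = e_bfmi(energies)
    print(chain, round(value, 3), "low" if value < LOW_E_BFMI else "ok")
```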

      Reviewer Comment 3.9 — The authors write: “Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying “endogenous pain regulation.”

      Initial reply: This is a valid point; indeed, we do not compare paradigms in our study, and we will remove this statement in the revised version.

      Revised reply: We have removed this statement from the revised version.

      Reviewer Comment 3.10 — In relation to the comment on model comparison in my public review, I believe the following link may provide further insight and clarify the basis for my observation. It discusses the use of standard error in model comparison and may be useful for the authors in addressing this particular point: https://discourse.mc-stan.org/t/loo-comparison-in-referenceto-standard-error/4009

      Initial reply: Thank you for this suggestion, we will consider the forum discussion in our manuscript.

    1. Author response:

      eLife assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control. 

      We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we plan to perform additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity. 

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to obtain robust neural sequences for the trial-by-trial analysis and the analysis of correct and incorrect trials. 

      We thank the reviewer for the comment. We would like to mention that the 18 cells plotted in Supplementary figure 1 were only from the duration cell category. To improve the clarity of our results, we are going to provide information regarding the number of cells from each rat in our revision. In general, we imaged more than 50 cells from each rat. We would also like to point to the data from individual trials in Supplementary figure 1B showing robust sequentiality.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We are going to conduct the analysis as the reviewer recommended. We agree with the reviewer that better presentation of the neural activity will be helpful for the readers.

      In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test the type III errors. 

      We would like to mention that the prediction errors plotted in this graph were calculated from two types of trials. The correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggested a possible use of this neural mechanism to time the action of the rats.

      In addition, we are going to perform the analysis suggested by the reviewer in our revision. We agree that different ways of analyzing the data would provide better characterization of the scaling effect.
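      The center-of-mass analysis suggested by the reviewer can be sketched in Python as below, assuming a non-negative activity trace (for example deconvolved activity, or ΔF/F clipped at zero) sampled at known times within each nose poke. The helper names and the synthetic “bump” cell are hypothetical. Under perfect temporal scaling, the center of mass expressed as a fraction of the poke duration stays the same for short and long trials, while the raw center of mass grows with duration.

```python
import math


def center_of_mass(times, activity):
    """Activity-weighted mean time of a (non-negative) calcium trace."""
    total = sum(activity)
    if total <= 0:
        raise ValueError("activity must have positive total")
    return sum(t * a for t, a in zip(times, activity)) / total


def normalized_com(times, activity, duration):
    """Center of mass as a fraction of the trial's nose-poke duration."""
    return center_of_mass(times, activity) / duration


def bump(duration, n_bins=50, peak_frac=0.4, width_frac=0.1):
    """Hypothetical duration cell that peaks 40% of the way into every poke."""
    times = [duration * i / (n_bins - 1) for i in range(n_bins)]
    acts = [math.exp(-((t / duration - peak_frac) ** 2) / (2 * width_frac ** 2)) for t in times]
    return times, acts


for d in (1.0, 2.0):
    t, a = bump(d)
    print(f"duration {d:.1f} s: raw COM {center_of_mass(t, a):.3f} s, "
          f"normalized COM {normalized_com(t, a, d):.3f}")
```

      Comparing normalized values between correct and incorrect trials is one way to ask whether shorter, unrewarded pokes reflect a compressed version of the same sequence.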

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer and we have mentioned this caveat in our original manuscript. We are going to rephrase the sentence as the reviewer suggested during our revision.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions. 

      This study aligns with a wide range of studies in systems neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not, however, intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading a neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do). 
      While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues. 

      Main Concerns 

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task mainly probes the animals’ ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require precise time estimation because there appear to be no penalties when the nosepokes are too short or too long. So it is unclear whether the variation in nosepoke duration reflects motivational changes rather than changes in time estimation. That this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available). 

      We would like to respond to the reviewer’s comments 1, 2 and 4 together since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). A lot of the discussion goes beyond the scope of this study and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborated task design, we think it is still questionable to clearly dissociate the neural substrate of “timing” from “motor”. In the end, it may as well be that as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: (1) duration cells may represent fidgeting or orofacial movements, and (2) duration cells may represent the motivation or motion plan of the rats. To test the first alternative interpretation, we will perform a more comprehensive analysis of the movements of all the limbs and visible body parts of the rat during nose poking and analyze their periodicity across trials, although the orofacial movements may not be visible to us.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells only represented motivation, then the errors should be distributed evenly across different trial times, or be linearly modulated by trial duration. The unimodal distribution we observed (Figure 4G; see Author response image 1 below for a re-plot without signs) suggests that the scaling factor of the sequential activity represents information related to time. The fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor in time estimation. To further test the relationship to motivation, we will measure the time interval between exiting the nose poke and the start of licking the water reward as an independent measure of motivation on each trial. In the revision, we will analyze and report whether this measure correlates with the nose-poking durations in our data.

      Author response image 1.

      Furthermore, whether the scaling sequential activity we report represents behavioral timing or true time estimation, the reviewer would agree that these activities correlate with the animal’s nose-poking durations, and a previous study has shown that PFC silencing disrupted the timing behavior of mice (PMID: 24367075). The main surprising finding of the paper is that these duration cells differ from the start and end cells in their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues as to whether they receive inputs from thirst- or reward-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the recordings reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for each rat, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, yet we are not quite sure what specifically was asked by the reviewer. There is undoubtedly variance among individual animals. One of the core reasons for statistical comparison is to compare the group difference with the variance due to sampling. It appears that the reviewer would like us to conduct our analysis for each rat individually. We will conduct and report analyses for individual rats in Figure 1C, Figure 2C, G and K, and Figure 4F in our revised manuscript.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should state upfront in the main text (Results section) the temporal resolution afforded by their Ca2+ probe and discuss whether it is fast enough with respect to the behavioral dynamics occurring in the task.

      We thank the reviewer for the suggestion. We originally mentioned the caveats of calcium imaging when interpreting our results, and we will add more text on this point during revision. For the behavioral dynamics of interest here (the start and end of the nose poke), we think calcium imaging provides sufficient temporal resolution. However, finer dynamics, such as the reproducibility of the sequential activity or the precise representation of individual cells along the scaled duration, may benefit from improved temporal resolution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We will modify the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis of the correlation between neural trajectory dynamics in MPC and timing behavior is reported. The same paper reports the stability of neural sequences across task parameters.

      We will cite and discuss this study in our revised paper.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We will provide this information as requested. The number of animals is also shown in Figure 1D for each analysis.

      (4) Why do the middle and right panels of Figure 2E show duration cells?

      Figure 2E was intended to show examples of duration cells’ activity. We included examples of cells that peak at different points in the scaled duration, which we believe give readers a straightforward impression of these cells’ activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further?

      We will label the analyzed sessions in Figure 1B during our revision.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells. 

      We thank the reviewer for the suggestion and will modify the figure accordingly during revision.

      (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC. 

      We thank the reviewer for the question. In our experience, mice with a lens implanted in the mPFC did not differ observably from mice without surgery in task acquisition or in the distribution of nose-poke durations. Although we cannot rule out effects on other cognitive processes, the mice appeared to be intact within the scope of our task. We will provide these behavioral data during our revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Lines 40-42: The sentence "The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies as well as individual differences in cognitive function, and is regulated by genes" is a misstatement. Regional variations of structure-function coupling do not really reflect differences in cognitive function among individuals, but inter-subject variations do.

      Thank you for your comment. We have revised the sentence to correct the misstatement; please see lines 40-43: “The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies[1, 6-9] and is regulated by genes[6, 8], as well as its individual differences relates to cognitive function[8, 9].”

      (2) In Figure 1, the graph showing the relation between intensity and cortical depth needs explanation.

      Thank you for your comment. We have added the necessary explanation; please see lines 133-134: “The MPC was used to map similarity networks of intracortical microstructure (voxel intensity sampled in different cortical depth) for each cortical node.”

      (3) Line 167: Change "increased" to "increase".

      We have corrected it; please see lines 173-174: “…networks significantly increased with age and exhibited greater increase.”

      (4) Line 195: Remove "were".

      We have corrected it; please see line 204: “…default mode networks significantly contributed to the prediction…”

      (5) Lines 233-240, Reproducibility analyses: Comparisons of parcellation templates were not made with respect to gene weights. Is there any particular reason?

      Thank you for your comment. We have quantified the gene weights based on HCPMMP using the same procedures and found a correlation (r = 0.25, p < 0.001) between the gene weights in HCPMMP and BNA. Given that this correlation is relatively weak, we would like to clarify the following points.

      Based on HCPMMP, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions[1]. Four of the 180 left-hemisphere HCPMMP regions were excluded because too few samples were assigned to them, which may partly explain the relatively weak correlation of gene weights between templates. Moreover, the effect of template resolution on human connectome-transcriptome associations is still unclear.

      In brain connectome analysis, the choice of parcellation template can indeed influence subsequent findings to some extent. A methodological study[2] reported correlations of about 0.4 to 0.6 for white matter connectivity and 0.2 to 0.4 for white matter nodal properties between two templates (see Figures 4 and 5 in [2]). Our gene weights are a further downstream result: age-related coupling changes were first computed from multimodal connectomes and then correlated with gene expression profiles, so they may also be influenced by the choice of template.

      We have also added the gene weight results obtained from HCPMMP to make the dependence on the parcellation template explicit.

      Please see lines 251-252: “The gene weights of HCPMMP was consistent with that of BNA (r = 0.25, p < 0.001).”

      Author response image 1.

      The consistency of gene weights between HCPMMP and BNA.

      Please see lines 601-604: “Finally, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions based on HCPMMP and obtained the gene weights by PLS analysis. We performed Pearson's correlation analyses to assess the consistency of gene weights between HCPMMP and BNA.”

      Reviewer #2 (Recommendations For The Authors):

      Your paper is interesting to read and I found your efforts to evaluate the robustness of the results of different parcellation strategies and tractography methods very valuable. The work is globally easy to navigate and well written with informative good-quality figures, although I think some additional clarifications will be useful to improve readability. My suggestions and questions are detailed below (I aimed to group them by topic which did not always succeed so apologies if the comments are difficult to navigate, but I hope they will be useful for reflection and to incorporate in your work).

      * L34: 'developmental disorder'

      ** As far as I understand, the subjects in HCP-D are mostly healthy (L87). Thus, while your study provides interesting insights into typical brain development, I wonder if references to 'disorder' might be premature. In the future, it would be interesting to extend your approach to the atypical populations. In any case, it would be extremely helpful and appreciated if you included a figure visualising the distribution of behavioural scores within your population and in relationship to age at scan for your subjects (and to include a more detailed description of the assessment in the methods section) given that large part of your paper focuses on their prediction using coupling inputs (especially given a large drop of predictive performance after age correction). Such figures would allow the reader to better understand the cognitive variability within your data, but also potential age relationships, and generally give a better overview of your cohort.

      We agree that references to 'disorder' are premature and have revised the abstract and conclusion accordingly.

      Please see lines 33-34: “This study offers insight into the maturational principles of SC-FC coupling in typical development.”

      Please see lines 395-396: “Further investigations are needed to fully explore the clinical implications of SC-FC coupling for a range of developmental disorders.”

      In addition, we have included a more detailed description of the cognitive scores in the Methods section and provided a figure showing the distributions of the cognitive scores and their relationship with age. Please see lines 407-413: “Cognitive scores. We included 11 cognitive scores which were assessed with the National Institutes of Health (NIH) Toolbox Cognition Battery (https://www.healthmeasures.net/explore-measurement-systems/nih-toolbox), including episodic memory, executive function/cognitive flexibility, executive function/inhibition, language/reading decoding, processing speed, language/vocabulary comprehension, working memory, fluid intelligence composite score, crystal intelligence composite score, early child intelligence composite score and total intelligence composite score. Distributions of these cognitive scores and their relationship with age are illustrated in Figure S12.”

      Author response image 2.

      Cognitive scores and age distributions of scans.

      * SC-FC coupling

      ** L162: 'Regarding functional subnetworks, SC-FC coupling increased disproportionately with age (Figure 3C)'.

      *** As far as I understand, in Figure 3C, the points are the correlation with age for a given ROI within the subnetwork. Is this correct? If yes, I am not sure how this shows a disproportionate increase in coupling. It seems that there is great variability of SC-FC correlation with age across regions within subnetworks, more so than the differences between networks. This would suggest that the coupling with age is regionally dependent rather than network-dependent? Maybe you could clarify?

      Yes, each point in Figure 3C is the age correlation for a given ROI within the subnetwork. We have revised the description; please see lines 168-174: “Age correlation coefficients distributed within functional subnetworks were shown in Figure 3C. Regarding mean SC-FC coupling within functional subnetworks, the somatomotor (β_age=2.39E-03, F=4.73, p=3.10E-06, r=0.25, p=1.67E-07, Figure 3E), dorsal attention (β_age=1.40E-03, F=4.63, p=4.86E-06, r=0.24, p=2.91E-07, Figure 3F), frontoparietal (β_age=2.11E-03, F=6.46, p=2.80E-10, r=0.33, p=1.64E-12, Figure 3I) and default mode (β_age=9.71E-04, F=2.90, p=3.94E-03, r=0.15, p=1.19E-03, Figure 3J) networks significantly increased with age and exhibited greater increase.” In addition, we agree that the age-related change in coupling is more likely region-dependent than network-dependent, and we have added this point; please see lines 329-332: “We also found the SC-FC coupling with age across regions within subnetworks has more variability than the differences between networks, suggesting that the coupling with age is more likely region-dependent than network-dependent.” This is why our subsequent analyses focused on regional coupling.

      *** Additionally, we see from Figure 3C that regions within networks have very different changes with age. Given this variability (especially in the subnetworks where you show both positive and negative correlations with age for specific ROIs (i.e. all of them)), does it make sense then to show mean coupling over regions within the subnetworks which erases the differences in coupling with age relationships across regions (Figures 3D-J)?

      Given the interest in interpreting SC-FC coupling at the network level, we believe it is still useful to show the age correlation of mean coupling at the subnetwork scale, even though averaging removes variability at the regional scale. The results at both scales confirm that, in this age range, coupling mainly increases with age.

      *** Also, I think it would be interesting to show correlation coefficients across all regions, not only the significant ones (3B). Is there a spatially related tendency of increases/decreases (rather than a 'network' relationship)? Would it be interesting to show a similar figure to Figure S7 instead of only the significant regions?

      As suggested, we have added a panel showing the correlation coefficients across all regions to Figure 3B, and likewise to the other figures (Figures S3-S6).

      Author response image 3.

      Age-related changes in SC-FC coupling. (A) Increases in whole-brain coupling with age. (B) Correlation of age with SC-FC coupling across all regions and significant regions (p<0.05, FDR corrected). (C) Comparisons of age-related changes in SC-FC coupling among functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict 1.5× IQR from the first or third quartile. (D-J) Correlation of age with SC-FC coupling across the VIS, SM, DA, VA, LIM, FP and DM. VIS, visual network; SM, somatomotor network; DA, dorsal attention network; VA, ventral attention network; LIM, limbic network; FP, frontoparietal network; DM, default mode network.

      *** For the quantification of MPC.

      **** L421: you reconstructed 14 cortical surfaces from the wm to pial surface. If we take the max thickness of the cortex to be 4.5mm (Fischl & Dale, 2000), the sampling is above the resolution of your anatomical images (0.8mm). Could you expand on what the interest is in sampling such a higher number of surfaces given that the resolution is not enough to provide additional information?

      The surface reconstruction was based on state-of-the-art equivolumetric surface construction techniques[3], which provide a simplified recapitulation of cellular changes across the putative laminar structure of the cortex. Using a 100-μm resolution Merker-stained 3D histological reconstruction of an entire post mortem human brain (BigBrain: https://bigbrain.loris.ca/main.php) as a reference, a methodological study[4] systematically evaluated MPC stability with four to 30 intracortical surfaces at an anatomical image resolution of 0.7 mm and selected 14 surfaces as the most stable solution. Importantly, that study showed that the in vivo approach can serve as a lower-resolution yet biologically meaningful extension of the histological work[4].

      **** L424: did you aggregate intensities over regions using mean/median or other statistics? It might be useful to specify.

      Thank you for your careful comment. We have revised the description in lines 446-447: “We averaged the intensity profiles of vertices over 210 cortical regions according to the BNA”.

      **** L426: personal curiosity, why did you decide to remove the negative correlation of the intensity profiles from the MPC? Although this is a common practice in functional analyses (where the interpretation of negatives is debated), within the context of cortical correlations, the negative values might be interesting and informative on the level of microstructural relationships across regions (if you want to remove negative signs it might be worth taking their absolute values instead).

      We agree that the interpretation of negative correlations in MPC is debated. Because MPC is a nascent approach to network modeling, we adopted the more conservative strategy of removing negative correlations, following the study that proposed the approach [4]. As you note, the negative correlations might be informative, and we will continue to explore what they reveal about microstructural relationships.
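      To make the two options discussed here concrete, the following minimal sketch builds an MPC matrix and handles negative values either by zeroing them (the conservative strategy described above) or by taking absolute values (the reviewer's suggestion). It assumes, as an illustration rather than a description of the study's exact pipeline, that MPC is the partial correlation between regional intensity profiles after controlling for the cortex-wide mean profile. The function name `mpc_matrix`, its `negatives` option, and the random profiles (210 regions by 14 surfaces) are hypothetical placeholders.

```python
import numpy as np

def mpc_matrix(profiles, negatives="zero"):
    """Microstructure profile covariance between regions (illustrative sketch).

    profiles: (n_regions, n_surfaces) mean intensity per region and depth.
    Each profile is residualised on the cortex-wide mean profile (with an
    intercept), so correlating the residuals gives a partial correlation.
    negatives: "zero" drops negative values, "abs" keeps their magnitude.
    """
    mean_profile = profiles.mean(axis=0)
    design = np.column_stack([np.ones_like(mean_profile), mean_profile])
    # Least-squares fit of every regional profile on the mean profile at once.
    beta, *_ = np.linalg.lstsq(design, profiles.T, rcond=None)
    residuals = profiles - (design @ beta).T
    mpc = np.corrcoef(residuals)
    np.fill_diagonal(mpc, 0.0)  # ignore self-similarity
    if negatives == "zero":
        return np.where(mpc > 0, mpc, 0.0)
    if negatives == "abs":
        return np.abs(mpc)
    raise ValueError("negatives must be 'zero' or 'abs'")

# Placeholder profiles: 210 regions x 14 intracortical surfaces.
rng = np.random.default_rng(42)
profiles = rng.normal(size=(210, 14)).cumsum(axis=1)
mpc_zero = mpc_matrix(profiles, "zero")
mpc_abs = mpc_matrix(profiles, "abs")
```

      Zeroing discards any information carried by anti-correlated profiles, whereas taking absolute values treats anti-correlation as similarity; comparing `np.count_nonzero` of the two matrices shows how many edges each option retains.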

      **** L465: could you please expand on the notion of self-connections, it is not completely evident what this refers to.

      We have revised the description in lines 493-494: “N_c is the number of connection (N_c = 245 for BNA)”.

      **** Paragraph starting on L467: did you evaluate the multicollinearities between communication models? It is possibly rather high (especially for the same models with similar parameters (listed on L440-444)). Such dependence between variables might affect the estimates of feature importance (given the predictive models only care to minimize error, highly correlated features can be selected as a strong predictor while the impact of other features with similarly strong relationships with the target is minimized thus impacting the identification of reliable 'predictors').

      We agree with your comment. The covariance structure (multicollinearity) among the communication models is likely to lead to unreliable predictor weights. In our study, we applied Haufe's inversion transform[5], which addresses this issue by computing the covariance between the predicted FC and each communication model in the training set; for more details, please see [5]. We have clarified this in the manuscript; please see lines 497-499: “And covariance structure among the predictors may lead to unreliable predictor weights. Thus, we applied Haufe's inversion transform[38] to address these issues and identify reliable communication mechanisms.”
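      The idea behind Haufe's transform for a single linear model can be sketched in a few lines: after fitting, each predictor's raw weight is replaced by the covariance between that predictor and the model's prediction. The sketch assumes ordinary least squares with an intercept and uses three deliberately collinear random predictors standing in for communication models; `fit_linear` and `haufe_activation` are hypothetical helpers, not the study's code.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (intercept, weights)."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def haufe_activation(X, y_hat):
    """Covariance between each predictor column and the model prediction."""
    Xc = X - X.mean(axis=0)
    yc = y_hat - y_hat.mean()
    return Xc.T @ yc / (len(y_hat) - 1)

# Placeholder data: one node's FC profile (245 connections) and three
# strongly collinear predictors standing in for communication models.
rng = np.random.default_rng(0)
n = 245
base = rng.normal(size=n)
X = np.column_stack([base + 0.1 * rng.normal(size=n) for _ in range(3)])
fc_profile = base + 0.5 * rng.normal(size=n)

b0, w = fit_linear(X, fc_profile)
y_hat = b0 + X @ w
a = haufe_activation(X, y_hat)  # activation pattern used in place of raw weights
```

      Because the prediction is a linear combination of the predictors, the activation equals the predictor covariance matrix times the weight vector, and it depends only on the prediction itself: two weight vectors that produce the same prediction yield the same activation pattern, however they split credit among collinear predictors. Haufe et al. additionally divide by the variance of the prediction, which for a single output rescales all activations by the same factor.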

      **** L474: I am not completely familiar with spin tests but to my understanding, this is a spatial permutation test. I am not sure how this applies to the evaluation of the robustness of feature weight estimates per region (if this was performed per region), it would be useful to provide a bit more detail to make it clearer.

      As suggested, we have added more detail; please see lines 503-507: “Next, we generated 1,000 FC permutations through a spin test[86] for each nodal prediction in each subject and obtained random distributions of model weights. These weights were averaged over the group and were investigated the enrichment of the highest weights per region to assess whether the number of highest weights across communication models was significantly larger than that in a random discovery.”
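      A spin test is a spatial permutation: region centroids on a sphere are randomly rotated, and each region takes the value of the nearest rotated region, so the null maps keep the spatial autocorrelation that a plain shuffle would destroy. The following is a simplified single-hemisphere sketch under stated assumptions: the placeholder centroids, the helper names, and the nearest-neighbour reassignment (which can repeat values; some implementations enforce a one-to-one assignment instead) are illustrative. A two-hemisphere version usually applies a reflected rotation to the other hemisphere. In an analysis like the one described above, the spun quantity would be, for example, a node's FC profile before refitting the model.

```python
import numpy as np

def random_rotation(rng):
    """Uniformly distributed 3-D rotation matrix (QR of a Gaussian matrix, forced to det = +1)."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix column signs so q is uniformly distributed
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def spin_permute(values, coords, rng):
    """Rotate the sphere and give each region the value of the nearest rotated region.

    values: (n,) regional map; coords: (n, 3) centroids on the unit sphere.
    """
    rotated = coords @ random_rotation(rng).T
    # distance from every original centroid (rows) to every rotated centroid (columns)
    dist = np.linalg.norm(coords[:, None, :] - rotated[None, :, :], axis=-1)
    return values[np.argmin(dist, axis=1)]

def spin_null(values, coords, n_spins=1000, seed=0):
    """Stack of spun maps forming a spatial null distribution."""
    rng = np.random.default_rng(seed)
    return np.stack([spin_permute(values, coords, rng) for _ in range(n_spins)])

# Placeholder data: 105 centroids (one hemisphere) and a smooth map along the z axis.
rng = np.random.default_rng(1)
coords = rng.normal(size=(105, 3))
coords /= np.linalg.norm(coords, axis=1, keepdims=True)
values = coords[:, 2] + 0.1 * rng.normal(size=105)
null_maps = spin_null(values, coords, n_spins=100)
```

      A statistic computed on the real map is then compared with the same statistic computed on each row of `null_maps`.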

      **** L477: 'significant communication models were used to represent WMC...', but in L103 you mention you select 3 models: communicability, mean first passage, and flow graphs. Do you want to say that only 3 models were 'significant' and these were exactly the same across all regions (and data splits/ parcellation strategies/ tractography methods)? In the methods, you describe a lot of analysis and testing but it is not completely clear how you come to the selection of the final 3, it would be beneficial to clarify. Also, the final 3 were selected on the whole dataset first and then the pipeline of SC-FC coupling/age assessment/behaviour predictions was run for every (WD, S1, S2) for both parcellations schemes and tractography methods or did you end up with different sets each time? It would be good to make the pipeline and design choices, including the validation bit clearer (a figure detailing all the steps which extend Figure 1 would be very useful to understand the design/choices and how they relate to different runs of the validation).

      Thank you for your comment. In all reproducibility analyses, we used the same three models, which were selected in the main pipeline (probabilistic tractography and BNA parcellation). As suggested, we have produced a figure that extends Figure 1 and details the model-selection pipeline. Please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).”

      Author response image 4.

      Pipeline of model selection and reproducibility analyses.

      **** Might the imbalance of features between structural connectivity and MPC affect the revealed SC-FC relationships (3 vs 1)? Why did you decide on this ratio rather than for example best WM structural descriptor + MPC?

      We understand your concern. The WMC communication models represent diverse geometric, topological, and dynamic factors. To describe the properties of WMC as fully as possible, we selected, from the 27 candidate models, the three communication models that significantly predicted FC after controlling for their covariance structure. Compared with the single MPC feature, this does introduce a potential feature imbalance. However, the results still support the conclusion that coupling models incorporating microarchitectural properties yield more accurate predictions of FC from SC[6, 7]. The relevant experiments are shown in Figure S2 below. Using only the single best WM structural descriptor may lose some of the communication properties of WMC.

      **** L515: were intracranial volume and in-scanner head motion related to behavioural measures? These variables likely impact the inputs, do you expect them to influence the outcome assessments? Or is there a mistake on L518 and you actually corrected the input features rather than the behaviour measures?

      In-scanner head motion and intracranial volume are related to some of the age-adjusted behavioural measures, as shown in the following table. Our procedure for regressing covariates out of the cognitive measures follows two cognitive prediction studies [8, 9]. Please see lines 549-554: “Prior to applying the nested fivefold cross-validation framework to each behaviour measure, we regressed out covariates including sex, intracranial volume, and in-scanner head motion from the behaviour measure[59, 69]. Specifically, we estimated the regression coefficients of the covariates using the training set and applied them to the testing set. This regression procedure was repeated for each fold.”

      Author response table 1.
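      The fold-wise covariate regression quoted above (coefficients estimated on the training set and applied to the testing set) can be sketched as follows. Ordinary least squares with an intercept is assumed, and the covariate values, the score model, and the five-fold split are simulated placeholders; `regress_out_covariates` is a hypothetical helper. In a nested framework this step would run inside each outer fold before any model fitting, so no information from the test fold leaks into the covariate estimates.

```python
import numpy as np

def regress_out_covariates(y_train, cov_train, y_test, cov_test):
    """Estimate covariate effects on the training fold only, then remove them from both folds."""
    design_train = np.column_stack([np.ones(len(y_train)), cov_train])
    design_test = np.column_stack([np.ones(len(y_test)), cov_test])
    beta, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    return y_train - design_train @ beta, y_test - design_test @ beta

# Simulated placeholders: sex (0/1), intracranial volume, head motion, and a score.
rng = np.random.default_rng(7)
n = 200
covariates = np.column_stack([
    rng.integers(0, 2, n),
    rng.normal(1500, 150, n),
    rng.gamma(2.0, 0.05, n),
])
score = 100 + 5 * covariates[:, 0] + 0.01 * covariates[:, 1] + rng.normal(0, 10, n)

folds = np.array_split(rng.permutation(n), 5)
cleaned = []
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    y_tr, y_te = regress_out_covariates(score[train_idx], covariates[train_idx],
                                        score[test_idx], covariates[test_idx])
    cleaned.append((train_idx, test_idx, y_tr, y_te))
```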

      ** Additionally, in the paper, you propose that the incorporation of cortical microstructural (myelin-related) descriptors with white-matter connectivity to explain FC provides for 'a more comprehensive perspective for characterizing the development of SC-FC coupling' (L60). This combination of cortical and white-matter structure is indeed interesting; however, the benefits of incorporating different descriptors could be studied further. For example, one could compare the results of using only white matter connectivity (assessed through selected communication models) ~ FC vs (white matter + MPC) ~ FC vs MPC ~ FC. Which descriptors better explain FC? Are the 'coupling trends' similar (or the same)? If yes, what is the additional benefit of using the more complex combination? This would also add strength to your statement at L317: 'These discrepancies likely arise from differences in coupling methods, highlighting the complementarity of our methods with existing findings'. Yes, discrepancies might be explained by the use of different SC inputs. However, it is difficult to see how discrepancies highlight complementarity - does MPC (and its combination with wm) provide additional information beyond using wm structure alone?

      As suggested, we have added analyses that predict FC using only the myelin-related predictor (MPC) or only WM connectivity (SCs), and we compared the results across models; please see lines 519-521: “In addition, we have constructed the models using only MPC or SCs to predict FC, respectively. Spearman’s correlation was used to assess the consistency between spatial patterns based on different models.”

      Please see lines 128-130: “In addition, the coupling pattern based on other models (using only MPC or only SCs to predict FC) and the comparison between the models were shown in Figure S2A-C.” Please see lines 178-179: “The age-related patterns of SC-FC coupling based other coupling models were shown in Figure S2D-F.”

      Although the coupling patterns were spatially consistent across models, incorporating MPC together with SC connectivity predicted FC better than models based on MPC or SC alone. For age-related changes in coupling, the differences between models were further amplified. We agree that the complementarity cannot be explicitly quantified, and we have revised the description accordingly; please see line 329: “These discrepancies likely arise from differences in coupling methods.”

      Author response image 5.

      Comparison results between different models. Spatial pattern of mean SC-FC coupling based on MPC ~ FC (A), SCs ~ FC (B), and MPC + SCs ~ FC (C). Correlation of age with SC-FC coupling across cortex based on MPC ~ FC (D), SCs ~ FC (E), and MPC + SCs ~ FC (F).
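      A comparison of this kind can be sketched as follows. The sketch assumes, as one common definition in the SC-FC coupling literature rather than a statement of the study's exact method, that a node's coupling is the adjusted R² of a multiple linear regression predicting that node's FC profile from its structural profiles (with the self-connection excluded), and it compares spatial patterns with Spearman's correlation. All matrices are random placeholders, and `adjusted_r2` and `nodal_coupling` are hypothetical helpers.

```python
import numpy as np
from scipy.stats import spearmanr

def adjusted_r2(y, predictors):
    """Adjusted R^2 of an ordinary least-squares fit of y on the predictors (with intercept)."""
    n, k = predictors.shape
    design = np.column_stack([np.ones(n), predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

def nodal_coupling(fc, structural):
    """fc: (N, N); structural: list of (N, N) predictor matrices. Returns one value per node."""
    n_nodes = fc.shape[0]
    out = np.empty(n_nodes)
    for i in range(n_nodes):
        others = np.arange(n_nodes) != i  # drop the self-connection
        X = np.column_stack([m[i, others] for m in structural])
        out[i] = adjusted_r2(fc[i, others], X)
    return out

# Placeholder predictors and an FC matrix driven mostly by MPC.
rng = np.random.default_rng(3)
N = 60
mpc, sc1, sc2, sc3 = (rng.normal(size=(N, N)) for _ in range(4))
fc = 0.5 * mpc + 0.3 * sc1 + rng.normal(size=(N, N))

coupling = {
    "MPC ~ FC": nodal_coupling(fc, [mpc]),
    "SCs ~ FC": nodal_coupling(fc, [sc1, sc2, sc3]),
    "MPC + SCs ~ FC": nodal_coupling(fc, [mpc, sc1, sc2, sc3]),
}
rho, _ = spearmanr(coupling["MPC ~ FC"], coupling["MPC + SCs ~ FC"])
```

      Adjusted rather than raw R² is used here so that models with more predictors (three SCs versus one MPC) are not favoured simply for having more terms.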

      ** For the interpretation of results: L31 'SC-FC coupling is positively associated with genes in oligodendrocyte-related pathways and negatively associated with astrocyte-related gene'; L124: positive myelin content with SC-FC coupling...and similarly on L81, L219, L299, L342, and L490:

      *** You use a T1/T2 ratio, which is (in large part) a measure of myelin, to estimate the coupling between SC and FC. The association between SC-FC coupling and myelin described in Figure 2E is possibly biased by the choice of this feature. Similarly, it is possible that the reported positive associations between oligodendrocyte-related pathways and SC-FC coupling in your work could in part result from a bias introduced by the 'myelin descriptor' (conversely, picking up the oligodendrocyte-related genes is a nice corroboration of the T1/T2 ratio being a myelin descriptor, so that's nice). However, it is possible that if you used a different descriptor of the cortical microstructure, you might find different expression patterns associated with SC-FC coupling (for example, using the neurite density index might pick up neuronal-related genes?). As mentioned in my previous suggestions, I think it would be of interest to first use only the white matter structural connectivity feature to assess coupling to FC and assess the gene expression in the cortical regions to see if the same genes are related, and subsequently incorporate MPC to dissociate the potential bias of using a myelin measure from the genetic findings.

      Thank you for your insightful comments. In this paper, however, the core method of measuring coupling is to predict functional connectivity from multimodal structural connectivity, which may yield more information than a single modality. We agree that examining the genes associated with SCs and with MPC separately could lead to interesting discoveries, and we will continue to explore this in the future.

      ** Generally, I find it difficult to understand the interpretation of SC-FC coupling measures and would be interested to hear your thinking about this. As you mention on L290-294, how well SC predicts FC depends on which input features are used for the coupling assessment (more complex communication models, incorporating additional microstructural information etc 'yield more accurate predictions of FC' L291) - thus, calculated coupling can be interpreted as a measure of how well a particular set of input features explain FC (different sets will explain FC more or less well) ~ coupling is related to a measure of 'missing' information on the SC-FC relationship which is not contained within the particular set of structural descriptors - with this approach, the goal might be to determine the set that best, i.e. completely, explains FC to understand the link between structure and function. When you use the coupling measures for comparisons with age, cognition prediction etc, the 'status' of the SC-FC changes, it is no longer the amount of FC explained by the given SC descriptor set, but it's considered a descriptor in itself (rather than an effect of feature selection / SC-FC information overlap) - how do you interpret/argue for this shift of use?

      Thank you for your comment. In this paper, we obtain a reasonable estimate of SC-FC coupling by determining the optimal set of structural features for explaining function. The coupling therefore measures the direct correspondence between structure and function, and studying how coupling relates to age and cognition amounts to studying how this direct structure-function correspondence varies with age and with cognitive performance.

      ** In a similar vein to the above comment, I am interested to hear what you think: on L305 you mention that 'perfect SC-FC coupling may be unlikely'. Would this reasoning suggest that functional activity takes place through other means than (and is therefore somehow independent of) biological (structural) substrates? For now, I think one can only say that we have imperfect descriptors of the structure so there is always information missing to explain function, this however does not mean the SC and FC are not perfectly coupled (only that we look at insufficient structural descriptors - limitations of what imaging can assess, what we measure etc). This is in line with L305 where you mention that 'Moreover, our results suggested that regional preferential contributions across different SCs lead to variations in the underlying communication process'. This suggests that locally different areas might use different communication models which are not reflected in the measures of SC-FC coupling that was employed, not that the 'coupling' is lower or higher (or coupling is not perfect). This is also a change in approach to L293: 'This configuration effectively releases the association cortex from strong structural constraints' - the 'release' might only be in light of the particular structural descriptors you use - is it conceivable that a different communication model would be more appropriate (and show high coupling) in these areas.

      Thank you for your insightful comments. We have changed the description; please see lines 315-317: “SC-FC coupling is dynamic and changes throughout the lifespan[7], particularly during adolescence[6,9], suggesting that perfect SC-FC coupling may require sufficient structural descriptors.”

      *Cognitive predictions:

      ** From a practical stand-point, do you think SC-FC coupling is a better (more accurate) indicator of cognitive outcomes (for example for future prediction studies) than each modality alone (which is practically easier to obtain and process)? It would be useful to check the behavioural outcome predictions for each modality separately (as suggested above for coupling estimates). In case SC-FC coupling does not outperform each modality separately, what is the benefit of using their coupling? Similarly, it would be useful to compare to using only cortical myelin for the prediction (which you showed to increase in importance for the coupling). In the case of myelin->coupling-> intelligence, if you are able to predict outcomes with the same performance from myelin without the need for coupling measures, what is the benefit of coupling?

      From the standpoint of predictive performance, we do not claim that SC-FC coupling is a better indicator than any single modality (voxel-wise, network, or other measures). Our aim was to assess whether SC-FC coupling is related to individual differences in cognitive performance, not to prove that it has greater predictive power than other measures. As you suggest, separating the modalities and comparing their predictive power for cognition is a very interesting direction, which we will explore in future work.

      ** The statement on L187 'suggesting that increased SC-FC coupling during development is associated with higher intelligence' might not be completely appropriate before age corrections (especially given the large drop in performance that suggests confounding effects of age).

      As suggested, we have removed this statement.

      ** L188: it might be useful to report the range of R across the outer cross-validation folds as from Figure 4A it is not completely clear that the predictive performance is above the random (0) threshold. (For the sake of clarity, on L180 it might be useful for the reader if you directly report that other outcomes were not above the random threshold).

      As suggested, we have added the range of R and revised the description, please see lines 195-198: “Furthermore, even after controlling for age, SC-FC coupling remained a significant predictor of general intelligence, performing better than chance (Pearson’s r = 0.11 ± 0.04, p = 0.01, FDR corrected, Figure 4A). For fluid and crystallized intelligence, the predictive performance of SC-FC coupling was not better than chance (Figure 4A).”
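      A permutation test is one common way to decide whether a cross-validated Pearson r is "better than chance". The procedure is to shuffle the outcome, rerun the identical cross-validation, and count how often the shuffled r reaches the observed one. The numpy sketch below is minimal. Its closed-form ridge model, 5 folds and 200 permutations are illustrative assumptions rather than the authors' settings. The p-values for the three intelligence scores would then still need FDR correction, as in the quoted text.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept (illustrative model)."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ beta + y_mean


def cv_r(X, y, n_folds=5, seed=0):
    """Mean out-of-fold Pearson r over a fixed shuffled k-fold partition."""
    idx = np.random.default_rng(seed).permutation(len(y))
    rs = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        y_hat = ridge_fit_predict(X[train], y[train], X[test])
        rs.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(rs))


def permutation_p(X, y, n_perm=200, seed=0):
    """One-sided p-value: share of shuffled-outcome r values >= the observed r."""
    rng = np.random.default_rng(seed)
    observed = cv_r(X, y)
    null = np.array([cv_r(X, rng.permutation(y)) for _ in range(n_perm)])
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p


# Synthetic example: the outcome depends on the first of 20 features.
rng = np.random.default_rng(1)
X = rng.standard_normal((150, 20))
y = X[:, 0] + rng.standard_normal(150)
r_obs, p = permutation_p(X, y)
print(f"r = {r_obs:.2f}, permutation p = {p:.3f}")
```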

      In a similar vein, in the text, you report Pearson's R for the predictive results but Figure 4A shows predictive accuracy - accuracy is a different (categorical) metric. It would be good to homogenise to clarify predictive results.

      We have made the corresponding changes in Figure 4.

      Author response image 6.

      Encoding individual differences in intelligence using regional SC-FC coupling. (A) Predictive accuracy of fluid, crystallized, and general intelligence composite scores. (B) Regional distribution of predictive weight. (C) Predictive contribution of functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict the 1.5× IQR from the first or third quartile.

      *Methods and QC:

      - Parcellations

      ** It would be useful to mention briefly how the BNA was applied to the data and if any quality checks were performed for the resulting parcellations, especially for the youngest subjects which might be most dissimilar to the population used to derive the atlas (healthy adults HCP subjects) ~ question of parcellation quality.

      We have added the description, please see lines 434-436: “The BNA[31] was projected into native space using the official scripts (http://www.brainnetome.org/resource/), and the native-space BNA was checked by visual inspection.”

      ** Additionally, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate. It might be useful to mention the above as limitations (which apply to most studies with similar focus).

      We have added your comment to the methodological issues, please see lines 378-379: “Third, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate.”

      - Tractography

      ** L432: it might be useful to name the method you used (probtrackx).

      We have added this name to the description, please see lines 455-456: “probabilistic tractography (probtrackx)[78, 79] was implemented in the FDT toolbox …”

      ** L434: 'dividing the total fibres number in source region' - dividing by what?

      We have revised the description, please see line 458: “dividing by the total number of fibres in the source region.”
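      As background on this normalization, probabilistic tractography commonly defines the connection strength from region i to region j as the number of streamlines from i that reach j divided by the total number of streamlines in source region i. The numpy sketch below is minimal and assumes a region-by-region streamline-count matrix. Three of its steps are illustrative assumptions, not necessarily the authors' exact procedure: averaging the i-to-j and j-to-i estimates, zeroing the diagonal, and dropping subcortical nodes with a boolean mask.

```python
import numpy as np


def connection_probability(counts, total_from_source=None, keep=None):
    """Normalize a streamline-count matrix into connection probabilities.

    counts[i, j]: streamlines from source region i that reach region j.
    total_from_source[i]: total streamlines in source region i; if omitted,
        the row sums of `counts` are used as a stand-in (assumption).
    keep: optional boolean mask of regions to retain (e.g. cortical regions only).
    """
    counts = np.asarray(counts, dtype=float)
    if total_from_source is None:
        total_from_source = counts.sum(axis=1)
    prob = counts / np.asarray(total_from_source, dtype=float)[:, None]
    prob = (prob + prob.T) / 2.0        # average the i->j and j->i estimates (assumption)
    np.fill_diagonal(prob, 0.0)         # ignore self-connections
    if keep is not None:
        prob = prob[np.ix_(keep, keep)]
    return prob


counts = np.array([[0, 30, 10],
                   [20, 0, 20],
                   [5, 15, 0]])
is_cortical = np.array([True, True, False])   # hypothetical: third region is subcortical
W = connection_probability(counts, total_from_source=[100, 100, 50], keep=is_cortical)
# W == [[0, 0.25], [0.25, 0]]
```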

      ** L436: 'connections in subcortical areas were removed' - why did you trace connections to subcortical areas in the first place if you then removed them (to match with cortical MPC areas I suspect)? Or do you mean there were spurious streamlines through subcortical regions that you filtered?

      On the one hand, we needed to match the cortical regions used for MPC; on the other hand, as stated in the methodological issues, it is challenging to accurately resolve the connections of small structures within subcortical regions using whole-brain diffusion imaging and tractography techniques[10, 11].

      ** Following on the above, did you use any exclusion masks during the tracing? In general, more information about quality checks for the tractography would be useful. For example, L437: did you do any quality evaluations based on the removed spurious streamlines? For example, were there any trends between spurious streamlines and the age of the subject? Distance between regions/size of the regions?

      We did not use any exclusion masks. We checked tractography quality by visual inspection, but did not assess the relationship of spurious streamlines with age, inter-regional distance or region size.

      ** L439: 'weighted probabilistic network' - this was weighted by the filtered connectivity densities or something else?

      The probabilistic network is weighted by the filtered connectivity densities.

      ** I appreciate the short description of the communication models in Text S1, it is very useful.

      Thank you for your comment.

      ** In addition to limitations mentioned in L368 - during reconstruction, have you noticed problems resolving short inter-hemispheric connections?

      We had not considered this issue and have now added it to the limitations, please see lines 383-384: “In addition, the reconstruction of short connections between hemispheres is a notable challenge.”

      - Functional analysis:

      ** There is a difference in acquisition times between participants below and above 8 years (21 vs 26 min), does the different length of acquisition affect the quality of the processed data?

      We applied relatively strict quality control to ensure the quality of the processed data.

      ** L446 'regressed out nuisance variables' - it would be informative to describe in more detail what you used to perform this.

      We have provided more detail about the regression of nuisance variables, please see lines 476-477: “The nuisance variables were removed from the time series using a general linear model.”
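      Removing nuisance variables with a general linear model usually means regressing each regional time series on a design matrix of nuisance regressors and keeping the residuals. The numpy sketch below is minimal. The six random regressors are placeholders (for example, motion parameters), since the quoted text does not list the regressors used.

```python
import numpy as np


def regress_out(ts, nuisance):
    """Return residuals of ts (time x regions) after an OLS fit on nuisance (time x k).

    An intercept column is included, so the residuals are also mean-centred.
    """
    design = np.column_stack([np.ones(len(nuisance)), nuisance])
    beta, *_ = np.linalg.lstsq(design, ts, rcond=None)
    return ts - design @ beta


rng = np.random.default_rng(0)
T, R = 200, 4
nuisance = rng.standard_normal((T, 6))            # placeholder regressors (e.g. motion)
signal = rng.standard_normal((T, R))
ts = signal + nuisance @ rng.standard_normal((6, R))
clean = regress_out(ts, nuisance)
# Residuals are orthogonal to every nuisance regressor (up to floating-point error).
print(np.abs(nuisance.T @ clean).max())
```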

      ** L450-452: it would be useful to add the number of excluded participants to get an intuition for the overall quality of the functional data. Have you checked if the quality is associated with the age of the participant (which might be related to motion etc). Adding a distribution of remaining frames across participants (vs age) would be useful to see in the supplementary methods to better understand the data you are using.

      We have added information on participant exclusions throughout data processing, together with the distributions of motion and remaining frames and their correlations with age. Please see lines 481-485: “Quality control. The exclusion of participants in the whole multimodal data processing pipeline was depicted in Figure S13. In the context of fMRI data, we computed Pearson’s correlation between motion and age, as well as between the number of remaining frames and age, for the included participants aged 5 to 22 years and 8 to 22 years, respectively. These correlations were presented in Figure S14.”
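      The quality-control analysis described above comes down to one Pearson correlation per measure across participants. The sketch below is minimal and makes two assumptions. First, framewise displacement (FD) is taken as the censoring criterion. Second, the "0.15 mm" threshold quoted elsewhere in this response is read as an FD cutoff. The `remaining_frames` helper is hypothetical, and the FD traces are synthetic.

```python
import numpy as np


def remaining_frames(fd, threshold_mm=0.15):
    """Frames kept when frames with FD above the threshold are censored (assumed rule)."""
    return int(np.sum(np.asarray(fd) <= threshold_mm))


def pearson_r(x, y):
    return float(np.corrcoef(x, y)[0, 1])


rng = np.random.default_rng(0)
ages = rng.uniform(5, 22, size=100)
# Synthetic FD traces in which younger participants move more (placeholder data).
fd_traces = [rng.exponential(scale=0.25 - 0.008 * a, size=400) for a in ages]
mean_motion = np.array([fd.mean() for fd in fd_traces])
kept = np.array([remaining_frames(fd) for fd in fd_traces])

print(f"motion vs age: r = {pearson_r(mean_motion, ages):.2f}")
print(f"remaining frames vs age: r = {pearson_r(kept, ages):.2f}")
```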

      Author response image 7.

      Exclusion of participants in the whole multimodal data processing pipeline.  

      Author response image 8.

      Figure S14. Correlations between motion and age and number of remaining frames and age.

      ** L454: 'Pearson's correlation's... ' In contrast to MPC you did not remove negative correlations in the functional matrices. Why this choice?

      Whether negative functional connections should be removed has long been a controversial issue. In previous studies of SC-FC coupling[12-14], retaining negative correlations has been common practice, and we followed this strategy to retain more information. Because MPC is a nascent approach to network modeling, we adopted the more conservative strategy of removing negative correlations, following the study that proposed the approach[4].
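      The two conventions can be contrasted directly. The functional matrix keeps signed Pearson correlations, whereas negative MPC entries are set to zero. The numpy sketch below is minimal. Its MPC computation follows the general approach of Paquola et al.[4] as we understand it: the partial correlation between regional intensity profiles, controlling for the mean cortex-wide profile. The 14 sampling depths, zeroed diagonals and synthetic inputs are illustrative assumptions.

```python
import numpy as np


def functional_connectivity(ts):
    """Signed Pearson FC between region time series (ts: time x regions); negatives kept."""
    fc = np.corrcoef(ts, rowvar=False)
    np.fill_diagonal(fc, 0.0)
    return fc


def mpc(profiles):
    """Microstructure profile covariance (profiles: regions x depths).

    Partial correlation between regional intensity profiles, controlling for the
    mean cortex-wide profile; negative values are then set to zero.
    """
    mean_profile = profiles.mean(axis=0)
    design = np.column_stack([np.ones_like(mean_profile), mean_profile])
    beta, *_ = np.linalg.lstsq(design, profiles.T, rcond=None)
    residuals = profiles.T - design @ beta             # depths x regions
    m = np.corrcoef(residuals, rowvar=False)
    np.fill_diagonal(m, 0.0)
    return np.clip(m, 0.0, None)                       # drop negative correlations


rng = np.random.default_rng(0)
ts = rng.standard_normal((300, 10))                    # placeholder BOLD time series
depth = np.linspace(0, 1, 14)                          # 14 intracortical depths (illustrative)
profiles = np.sin(3 * depth)[None, :] + 0.3 * rng.standard_normal((10, 14))
fc, mpc_matrix = functional_connectivity(ts), mpc(profiles)
print((fc < 0).sum(), (mpc_matrix < 0).sum())          # FC keeps negative edges; MPC has none
```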

      - Gene expression:

      ** L635, you focus on the left cortex, is this common? Do you expect the gene expression to be fully symmetric (given reported functional hemispheric asymmetries)? It might be good to expand on the reasoning.

      An important consideration regarding sample assignment is that only two of the six brains were sampled from both hemispheres, while the other four were sampled only from the left hemisphere. This sparse sampling should be carefully considered when combining data across donors[1]. We have supplemented the description, please see lines 569-571: “Restricting analyses to the left hemisphere will minimize variability across regions (and hemispheres) in terms of the number of samples available[40].”
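      Restricting to the left hemisphere and pooling donors can be sketched in two steps. First, keep only left-hemisphere samples. Then average expression within each region across all remaining samples. The numpy sketch below is minimal, and its sample layout (integer region label, hemisphere flag, expression vector) is a hypothetical stand-in for processed Allen Human Brain Atlas data. In practice, per-donor normalization precedes this step (see Arnatkeviciute et al.[1]), and the sketch does not implement that normalization.

```python
import numpy as np


def regional_expression(regions, hemispheres, expression, n_regions):
    """Average left-hemisphere sample expression per region.

    regions: (n_samples,) integer region index per sample (0..n_regions-1)
    hemispheres: (n_samples,) 'L' or 'R'
    expression: (n_samples, n_genes) values, assumed already normalized per donor
    Regions without any left-hemisphere sample are returned as NaN.
    """
    regions = np.asarray(regions)
    expression = np.asarray(expression, dtype=float)
    left = np.asarray(hemispheres) == "L"
    out = np.full((n_regions, expression.shape[1]), np.nan)
    for r in range(n_regions):
        mask = left & (regions == r)
        if mask.any():
            out[r] = expression[mask].mean(axis=0)
    return out


out = regional_expression(
    regions=[0, 0, 1, 1, 2],
    hemispheres=["L", "R", "L", "L", "R"],
    expression=[[1, 2], [9, 9], [3, 4], [5, 6], [7, 7]],
    n_regions=3,
)
# Region 0 -> [1, 2]; region 1 -> [4, 5]; region 2 has only a right-hemisphere sample -> NaN.
```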

      ** Paragraph of L537: you use evolution of coupling with age (correlation) and compare to gene expression with adults (cohort of Allen Human Brain Atlas - no temporal evolution to the gene expressions) and on L369 you mention that 'relative spatial patterns of gene expressions remain stable after birth'. Of course this is not a place to question previous studies, but would you really expect the gene expression associated with the temporary processes to remain stable throughout the development? For example, myelination would follow different spatiotemporal gradient across brain regions, is it reasonable to expect that the expression patterns remain the same? How do you then interpret a changing measure of coupling (correlation with age) with a gene expression assessed statically?

      We agree that spatial expression patterns are expected to vary across developmental periods. We have revised the previous description, please see lines 383-386: “Fifth, it is important to acknowledge that changes in gene expression levels during development may introduce bias in the results.”

      - Reproducibility analyses:

      ** Paragraph L576: are we to understand that you performed the entire pipeline 3 times (WD, S1, S2) for both parcellations schemes and tractography methods (~12 times) including the selection of communication models and you always got the same best three communication models and gene expression etc? Or did you make some design choices (i.e. selection of communication models) only on a specific set-up and transfer to other settings?

      The communication models were selected at the beginning of the analysis, as we have clarified in the article, please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).” For the reproducibility analyses (parcellation, tractography, and split-half validation), we fixed all other settings and assessed the impact of one factor at a time.

      ** Paragraph of L241: I really appreciate you evaluated the robustness of your results to different tractography strategies. It is reassuring to see the similarity in results for the two approaches. Did you notice any age-related effects on tractography quality for the two methods given the wide age range (did you check?)

      In our study, tractography quality was checked by visual inspection. Quantitative assessment of tractography quality in future studies could answer this question objectively.

      ** Additionally, I wonder how much of that overlap is driven by the changes in MPC which is the same between the two methods... especially given its high weight in the SC-FC coupling you reported earlier in the paper. It might be informative to directly compare the connectivity matrices derived from the two tracto methods directly. Generally, as mentioned in the previous comments, I think it would be interesting to assess coupling using different input settings (with WM structural and MPC separate and then combined).

      Following your previous comment, we have examined the coupling patterns, coupling differences, coupling-age correlations, and spatial correlations between the patterns based on different models, as shown in Figure S2. Please see our response to the previous comment for details.

      ** L251 - I also wonder if the random splitting is best adapted to validation in your case given you study relationships with age. Would it make more sense to make stratified splits to ensure a 'similar age coverage' across splits?

      In our study, we used random splitting, repeated 1,000 times, to minimize bias due to data partitioning. The stratification you mention is a reasonable method, and balancing the age distribution across splits would likely yield higher split-half similarity than our approach. However, the similarity obtained with our method is already sufficient to support the generalizability of our findings.
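      The sketch below illustrates the difference between the two splitting strategies by measuring how far apart the mean ages of the two halves end up. It is a minimal numpy sketch with two illustrative assumptions: age is binned into 5 quantile bins, and each bin is split in half independently. The ages are synthetic.

```python
import numpy as np


def random_split(n, rng):
    """Random half-split of n participants."""
    idx = rng.permutation(n)
    return idx[: n // 2], idx[n // 2:]


def stratified_split(ages, rng, n_bins=5):
    """Split each age-quantile bin in half so both halves cover the same age range."""
    ages = np.asarray(ages)
    edges = np.quantile(ages, np.linspace(0, 1, n_bins + 1))
    bins = np.digitize(ages, edges[1:-1])            # bin labels 0..n_bins-1
    first, second = [], []
    for b in range(n_bins):
        members = rng.permutation(np.flatnonzero(bins == b))
        first.extend(members[: len(members) // 2])
        second.extend(members[len(members) // 2:])
    return np.array(first), np.array(second)


rng = np.random.default_rng(0)
ages = rng.uniform(5, 22, size=400)
gap_random, gap_strat = [], []
for _ in range(1000):                                 # 1,000 repetitions, as in the response
    a, b = random_split(len(ages), rng)
    gap_random.append(abs(ages[a].mean() - ages[b].mean()))
    a, b = stratified_split(ages, rng)
    gap_strat.append(abs(ages[a].mean() - ages[b].mean()))
print(f"mean age gap: random {np.mean(gap_random):.3f}, stratified {np.mean(gap_strat):.3f}")
```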

      Minor comments

      L42: 'is regulated by genes'

      ** Coupling (if having a functional role and being regulated at all) is possibly resulting from a complex interplay of different factors in addition to genes, for example, learning/environment, it might be more cautious to use 'regulated in part by genes' or similar.

      We have corrected it, please see line 42.

      L43 (and also L377): 'development of SC-FC coupling'

      ** I know this is very nitpicky and depends on your opinion about the nature of SC-FC coupling, but 'development of SC-FC coupling' gives an impression of something maturing that has a role 'in itself' (for example development of eye from neuroepithelium to mature organ etc.). For now, I am not sure it is fully certain that SC-FC coupling is more than a byproduct of the comparison between SC and FC, using 'changes in SC-FC coupling with development' might be more apt.

      We have corrected it, please see lines 43-44.

      L261 'SC-FC coupling was stronger ... [] ... and followed fundamental properties of cortical organization.' vs L168 'No significant correlations were found between developmental changes in SC-FC coupling and the fundamental properties of cortical organization'.

      **Which one is it? I think in the first you refer to mean coupling over all infants and in the second about correlation with age. How do you interpret the difference?

      Between the ages of 5 and 22 years, we found that the mean SC-FC coupling pattern was already similar to that of adults and consistent with the fundamental properties of cortical organization. However, developmental changes in SC-FC coupling are heterogeneous and sequential, so the spatial pattern of age-related change does not simply mirror the mean coupling pattern.

      L277: 'temporal and spatial complexity'

      ** Additionally, communication models have different assumptions about the flow within the structural network and will have different biological plausibility (they will be more or less 'realistic').

      Here, “temporal and spatial complexity” refers to computational (time and memory) complexity.

      L283: 'We excluded a centralized model (shortest paths), which was not biologically plausible'

      ** But in Text S1 and Table S1 you specify the shortest paths models. Does this mean you computed them but did not incorporate them in the final coupling computations even if they were predictive?

      ** Generally, I find the selection of the final 3 communication models confusing. It would be very useful if you could clarify this further, for example in the methods section.

      We used all twenty-seven communication models (including shortest paths) to predict FC at the node level for each participant, and then identified the three communication models that significantly predicted FC. The shortest-path model was excluded because it did not meet the significance criterion. We have added further methodological details to this section, please see lines 503-507.

      L332 'As we observed increasing coupling in these [frontoparietal network and default mode network] networks, this may have contributed to the improvements in general intelligence, highlighting the flexible and integrated role of these networks' vs L293 'SC-FC coupling in association areas, which have lower structural connectivity, was lower than that in sensory areas. This configuration effectively releases the association cortex from strong structural constraints imposed by early activity cascades, promoting higher cognitive functions that transcend simple sensori-motor exchanges'

      ** I am not sure I follow the reasoning. Could you expand on why it would be the decoupling promoting the cognitive function in one case (association areas generally), but on the reverse the increased coupling in frontoparietal promoting the cognition in the other (specifically frontoparietal)?

      We have tried to clarify this point. For general intelligence, increased coupling in the frontoparietal network could allow more effective information integration, enabling efficient collaboration between different cognitive processes.

      * Formatting errors etc.

      L52: maybe rephrase?

      We have rephrased, please see lines 51-53: “The T1- to T2-weighted (T1w/T2w) ratio of MRI has been proposed as a means of quantifying microstructure profile covariance (MPC), which reflects a simplified recapitulation in cellular changes across intracortical laminar structure[6, 12-15].”

      L68: specialization1,[20].

      We have corrected it.

      L167: 'networks significantly increased with age and exhibited greater increased' - needs rephrasing.

      We have corrected it.

      L194: 'networks were significantly predicted the general intelligence' - needs rephrasing.

      We have corrected it, please see lines 204-205: “we found that the weights of frontoparietal and default mode networks significantly contributed to the prediction of the general intelligence.”

      L447: 'and temporal bandpass filtering' - there is a verb missing.

      We have corrected it, please see line 471: “executed temporal bandpass filtering.”

      L448: 'greater than 0.15' - unit missing.

      We have corrected it, please see line 472: “greater than 0.15 mm”.

      L452: 'After censoring, regression of nuisance variables, and temporal bandpass filtering,' - no need to repeat the steps as you mentioned them 3 sentences earlier.

      We have removed it.

      L458-459: sorry I find this description slightly confusing. What do you mean by 'modal'? Connectional -> connectivity profile. The whole thing could be simplified, if I understand correctly your vector of independent variables is a set of wm and microstructural 'connectivity' of the given node... if this is not the case, please make it clearer.

      We have corrected it, please see line 488: “where s_i is the ith SC profile and n is the number of SC profiles”.
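      For reference, the regional coupling model can be written as a multiple linear regression. Node k's FC profile is predicted from its n SC profiles s_1 ... s_n (for example, communication-model matrices and MPC), and coupling is taken as the adjusted R². This follows the common formulation of regional SC-FC coupling in the literature. Two details are assumptions here: that the manuscript uses adjusted R² specifically, and that the node is excluded from its own profile. `regional_coupling` is a hypothetical helper name.

```python
import numpy as np


def regional_coupling(fc, sc_list, node):
    """Adjusted R^2 of the FC profile of `node` regressed on its n SC profiles.

    fc: (N, N) functional connectivity; sc_list: list of (N, N) structural
    predictor matrices. The node itself is excluded from its own profile.
    """
    others = np.arange(fc.shape[0]) != node
    y = fc[node, others]
    X = np.column_stack([np.ones(others.sum())] + [sc[node, others] for sc in sc_list])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    n_obs, n_pred = len(y), len(sc_list)
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_pred - 1)


def symmetric(A):
    return (A + A.T) / 2.0


rng = np.random.default_rng(0)
N = 100
sc_informative = symmetric(rng.standard_normal((N, N)))   # synthetic SC profile
sc_random = symmetric(rng.standard_normal((N, N)))        # unrelated SC profile
fc = sc_informative + 0.5 * symmetric(rng.standard_normal((N, N)))
coupling_informative = regional_coupling(fc, [sc_informative], node=0)
coupling_random = regional_coupling(fc, [sc_random], node=0)
print(f"{coupling_informative:.2f} vs {coupling_random:.2f}")
```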

      L479: 'values and system-specific of coupling'.

      We have corrected it.

      L500: 'regular' - regularisation.

      We have changed it to “regularization”.

      L567: Do you mean that in contrast to probabilistic with FSL you use deterministic methods within Camino? For L570, you introduce communication models through 'such as': did you fit all models like before? If not, it might be clearer to just list the ones you estimated rather than introduce through 'such as'.

      We have changed the description to avoid ambiguity, please see lines 608-609: “We then calculated the communication properties of the WMC including communicability, mean first passage times of random walkers, and flow graphs (timescales=1).”
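      Two of the communication measures named here have standard closed forms. The first is weighted communicability, G = exp(S^(-1/2) W S^(-1/2)), where S is the diagonal matrix of node strengths. The second is the mean first passage time of a random walker with transition matrix P = S^(-1) W, obtained from the chain's fundamental matrix. The numpy sketch below covers both and assumes a connected, symmetric, non-negative weight matrix. Flow graphs are omitted, and this is not the implementation the authors used.

```python
import numpy as np


def communicability(W):
    """Weighted communicability exp(S^-1/2 W S^-1/2), with S = diag(node strengths)."""
    s = W.sum(axis=1)
    norm = W / np.sqrt(np.outer(s, s))
    vals, vecs = np.linalg.eigh(norm)          # symmetric matrix: exponentiate eigenvalues
    return (vecs * np.exp(vals)) @ vecs.T


def mean_first_passage_time(W):
    """Expected number of steps for a walker starting at i to first reach j (diagonal = 0)."""
    s = W.sum(axis=1)
    P = W / s[:, None]                         # row-stochastic transition matrix
    pi = s / s.sum()                           # stationary distribution of an undirected walk
    n = len(W)
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))   # fundamental matrix
    M = (np.diag(Z)[None, :] - Z) / pi[None, :]
    np.fill_diagonal(M, 0.0)
    return M


path = np.array([[0.0, 1.0, 0.0],
                 [1.0, 0.0, 1.0],
                 [0.0, 1.0, 0.0]])             # unweighted path 0-1-2
M = mean_first_passage_time(path)
G = communicability(np.array([[0.0, 1.0], [1.0, 0.0]]))
```

On the path 0-1-2, a walker starting at node 0 needs 4 steps on average to reach node 2. For the two-node graph, G reduces to [[cosh 1, sinh 1], [sinh 1, cosh 1]].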

      Citation [12], it is unusual to include competing interests in the citation, moreover, Dr. Bullmore mentioned is not in the authors' list - this is most likely an error with citation import, it would be good to double-check.

      We have corrected it.

      L590: 'Python scripts used to perform PLS regression can be found at https://scikit-learn.org/.' The link leads to general documentation for sklearn.

      We have corrected it, please see lines 627-630: “Python scripts used to perform PLS regression can be found at https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.”

      P26 and 27 - there are two related sections: Data and code availability and Code availability - it might be worth merging into one section if possible.

      We have corrected it, please see lines 623-633.

      References

      (1) Arnatkeviciute A, Fulcher BD, Fornito A. A practical guide to linking brain-wide gene expression and neuroimaging data. Neuroimage. 2019;189:353-67. Epub 2019/01/17. doi: 10.1016/j.neuroimage.2019.01.011. PubMed PMID: 30648605.

      (2) Zhong S, He Y, Gong G. Convergence and divergence across construction methods for human brain white matter networks: an assessment based on individual differences. Hum Brain Mapp. 2015;36(5):1995-2013. Epub 2015/02/03. doi: 10.1002/hbm.22751. PubMed PMID: 25641208; PubMed Central PMCID: PMCPMC6869604.

      (3) Waehnert MD, Dinse J, Weiss M, Streicher MN, Waehnert P, Geyer S, et al. Anatomically motivated modeling of cortical laminae. Neuroimage. 2014;93 Pt 2:210-20. Epub 2013/04/23. doi: 10.1016/j.neuroimage.2013.03.078. PubMed PMID: 23603284.

      (4) Paquola C, Vos De Wael R, Wagstyl K, Bethlehem RAI, Hong SJ, Seidlitz J, et al. Microstructural and functional gradients are increasingly dissociated in transmodal cortices. PLoS Biol. 2019;17(5):e3000284. Epub 2019/05/21. doi: 10.1371/journal.pbio.3000284. PubMed PMID: 31107870.

      (5) Haufe S, Meinecke F, Gorgen K, Dahne S, Haynes JD, Blankertz B, et al. On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage. 2014;87:96-110. Epub 2013/11/19. doi: 10.1016/j.neuroimage.2013.10.067. PubMed PMID: 24239590.

      (6) Demirtas M, Burt JB, Helmer M, Ji JL, Adkinson BD, Glasser MF, et al. Hierarchical Heterogeneity across Human Cortex Shapes Large-Scale Neural Dynamics. Neuron. 2019;101(6):1181-94 e13. Epub 2019/02/13. doi: 10.1016/j.neuron.2019.01.017. PubMed PMID: 30744986; PubMed Central PMCID: PMCPMC6447428.

      (7) Deco G, Kringelbach ML, Arnatkeviciute A, Oldham S, Sabaroedin K, Rogasch NC, et al. Dynamical consequences of regional heterogeneity in the brain's transcriptional landscape. Sci Adv. 2021;7(29). Epub 2021/07/16. doi: 10.1126/sciadv.abf4752. PubMed PMID: 34261652; PubMed Central PMCID: PMCPMC8279501.

      (8) Chen J, Tam A, Kebets V, Orban C, Ooi LQR, Asplund CL, et al. Shared and unique brain network features predict cognitive, personality, and mental health scores in the ABCD study. Nat Commun. 2022;13(1):2217. Epub 2022/04/27. doi: 10.1038/s41467-022-29766-8. PubMed PMID: 35468875; PubMed Central PMCID: PMCPMC9038754.

      (9) Li J, Bzdok D, Chen J, Tam A, Ooi LQR, Holmes AJ, et al. Cross-ethnicity/race generalization failure of behavioral prediction from resting-state functional connectivity. Sci Adv. 2022;8(11):eabj1812. Epub 2022/03/17. doi: 10.1126/sciadv.abj1812. PubMed PMID: 35294251; PubMed Central PMCID: PMCPMC8926333.

      (10) Thomas C, Ye FQ, Irfanoglu MO, Modi P, Saleem KS, Leopold DA, et al. Anatomical accuracy of brain connections derived from diffusion MRI tractography is inherently limited. Proc Natl Acad Sci U S A. 2014;111(46):16574-9. Epub 2014/11/05. doi: 10.1073/pnas.1405672111. PubMed PMID: 25368179; PubMed Central PMCID: PMCPMC4246325.

      (11) Reveley C, Seth AK, Pierpaoli C, Silva AC, Yu D, Saunders RC, et al. Superficial white matter fiber systems impede detection of long-range cortical connections in diffusion MR tractography. Proc Natl Acad Sci U S A. 2015;112(21):E2820-8. Epub 2015/05/13. doi: 10.1073/pnas.1418198112. PubMed PMID: 25964365; PubMed Central PMCID: PMCPMC4450402.

      (12) Gu Z, Jamison KW, Sabuncu MR, Kuceyeski A. Heritability and interindividual variability of regional structure-function coupling. Nat Commun. 2021;12(1):4894. Epub 2021/08/14. doi: 10.1038/s41467-021-25184-4. PubMed PMID: 34385454; PubMed Central PMCID: PMCPMC8361191.

      (13) Liu ZQ, Vazquez-Rodriguez B, Spreng RN, Bernhardt BC, Betzel RF, Misic B. Time-resolved structure-function coupling in brain networks. Commun Biol. 2022;5(1):532. Epub 2022/06/03. doi: 10.1038/s42003-022-03466-x. PubMed PMID: 35654886; PubMed Central PMCID: PMCPMC9163085.

      (14) Zamani Esfahlani F, Faskowitz J, Slack J, Misic B, Betzel RF. Local structure-function relationships in human brain networks across the lifespan. Nat Commun. 2022;13(1):2053. Epub 2022/04/21. doi: 10.1038/s41467-022-29770-y. PubMed PMID: 35440659; PubMed Central PMCID: PMCPMC9018911.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors provide a new computational platform called Vermouth to automate topology generation, a crucial step with which any biomolecular simulation starts. Given the wide range of chemical structures that need to be simulated, the varying quality of structural models obtained as inputs from various sources, and the diverse force fields and molecular dynamics engines employed for simulations, automating this fundamental step is challenging, especially for complex systems and when high-throughput simulations are needed in computer-aided drug design (CADD). To overcome this challenge, the authors develop a programming library composed of components that carry out the fundamental operations commonly encountered in topology generation. These components are intended to be general for any type of molecule and not to depend on any specific force field or MD engine. To demonstrate the applicability of this library, the authors employ these components to re-assemble a pipeline called Martinize2, used to generate topologies for simulations with the widely used coarse-grained (CG) model MARTINI. This pipeline fully recapitulates the functionality of its original version, Martinize, but exhibits greatly enhanced generality, as confirmed by its ability to faithfully generate topologies for two high-complexity benchmark sets of proteins.

      Strengths:

      The main strength of this work is the use of concepts and algorithms associated with induced subgraphs in graph theory to automate several key but non-trivial steps of topology generation, such as the identification of monomer residue units (MRUs), the repair of input structures with missing atoms, the mapping of topologies between different resolutions, and the generation of parameters describing interactions between MRUs.
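      The induced-subgraph idea can be illustrated on a toy example. A reference fragment graph (atoms as nodes, bonds as edges, element as a node label) matches a molecule only if, for every pair of matched atoms, they are bonded in the reference exactly when they are bonded in the molecule. The pure-Python backtracking matcher below is a minimal sketch for illustration only. It is not Vermouth's implementation, and the toy molecules are hypothetical.

```python
def adjacency(bonds):
    """Map each atom to the set of atoms it is bonded to."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def induced_matches(mol_elements, mol_bonds, ref_elements, ref_bonds):
    """Yield {reference atom: molecule atom} mappings that form induced subgraph matches."""
    mol_adj, ref_adj = adjacency(mol_bonds), adjacency(ref_bonds)
    order = list(ref_elements)

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        ref_atom = order[len(mapping)]
        for mol_atom, element in mol_elements.items():
            if element != ref_elements[ref_atom] or mol_atom in mapping.values():
                continue
            # Induced condition: bonded in the reference <=> bonded in the molecule.
            if all((r in ref_adj.get(ref_atom, set())) == (m in mol_adj.get(mol_atom, set()))
                   for r, m in mapping.items()):
                mapping[ref_atom] = mol_atom
                yield from extend(mapping)
                del mapping[ref_atom]

    yield from extend({})


chain = ({"a": "C", "b": "C", "c": "C"}, [("a", "b"), ("b", "c")])   # C-C-C fragment
propane = ({1: "C", 2: "C", 3: "C"}, [(1, 2), (2, 3)])
cyclopropane = ({1: "C", 2: "C", 3: "C"}, [(1, 2), (2, 3), (1, 3)])
print(len(list(induced_matches(*propane, *chain))))        # both orientations match
print(len(list(induced_matches(*cyclopropane, *chain))))   # extra ring bond: no induced match
```

The cyclopropane case shows why the induced condition matters. A plain (non-induced) subgraph search would accept the C-C-C fragment there, even though the ring bond makes it a different chemical unit.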

      Weaknesses:

      Although the Vermouth library appears promising as a general tool for topology generation, there is insufficient information in the current manuscript and a lack of documentation that may allow users to easily apply this library. More detailed explanation of various classes such as Processor, Molecule, Mapping, ForceField etc. that are mentioned is still needed, including inputs, output and associated operations of these classes. Some simple demonstration of application of these classes would be of great help to users. The formats of internal databases used to describe reference structures and force fields may also need to be clarified. This is particularly important when the Vermouth needs to be adapted for other AA/CG force fields and other MD engines.

      We thank the reviewer for pointing out the strengths of the presented work and agree that one of the current limitations is the lack of documentation about the library. In the revision, we point more clearly to the documentation page of the Vermouth library, which contains more detailed information on the various processors. The format of the internal databases has also been added to the documentation page. Providing a simple demonstration of applications of these classes is a great suggestion; however, we believe that it is more convenient to provide those in the form of code examples in the documentation or, for instance, Jupyter notebooks rather than in the paper itself.

      The successful automation of the Vermouth relies on the reference structures that need to be pre-determined. In case of the study of 43 small ligands, the reference structures and corresponding mapping to MARTINI-compatible representations for all these ligands have been already defined in the M3 force field and added into the Vermouth library. However, the authors need to comment on the scenario where significantly more ligands need to be considered and other force fields need to be used as CG representations with a lack of reference structures and mapping schemes.

      We acknowledge that vermouth/martinize2 is not capable of automatically generating Martini mappings or parameters on the fly for unknown structures that are not part of the database. However, this capability is not the purpose of the program, which is rather to distribute and manage existing parameters. Unlike atomistic force fields, which frequently have automated topology builders, Martini parameters are usually obtained for a set of specific molecules at a time and benchmarked accordingly. As more parameters are obtained by researchers, they can be added to the vermouth library via the GitHub interface in a controlled manner. This process allows the database to grow, and in our opinion it will quickly grow beyond the currently implemented parameters. Furthermore, the API of Vermouth is set up so that it can easily interface with automated topology builders, which are currently being developed. Hence, in our view, this limitation does not diminish the applicability of vermouth to high-throughput applications with many ligands. The framework exists and works; only more parameters need to be added.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Kroon, Grunewald, Marrink and coworkers presents the development of the Vermouth library for coarse-grained assignment and parameterization, and an updated version of a Python script, the Martinize2 program, to build Martini coarse-grained (CG) models, primarily for protein systems.

      Strengths:

      In contrast to many mature and widely used tools to build all-atom (AA) models, there are few well-accepted programs for CG model constructions and parameterization. The research reported in this manuscript is among the ongoing efforts to build such tools for Martini CG modeling, with a clear goal of high-throughput simulations of complex biomolecular systems and, ultimately, whole-cell simulations. Thus, this manuscript targets a practical problem in computational biophysics. The authors see such an effort to unify operations like CG mapping, parameterization, etc. as a vital step from the software engineering perspective.

      Weaknesses:

      However, in its current form, the manuscript's scientific novelty is unclear, and the work appears incremental upon existing methods and tools. The only "validation" (more like an example application) is to create Martini models for two protein structure sets (I-TASSER and AlphaFold). The success rate in building the models was only 73%, and the main cause of failure was incomplete AA coordinates. This suggests a dependence on the input AA models, which makes the results less attractive for high-throughput applications (for example, preparation/creation of the AA models can become the bottleneck). There seems to be an improvement in considering protonation states and chemical modifications, but convincing validation is still needed. Besides, limitations of the existing Martini models remain (such as the restricted dynamics due to the elastic network, the electrostatic interactions or polarizability).

      We thank the reviewer for pointing out the strengths of the presented work, but respectfully disagree with the criticism that the presented work is only incremental upon existing methods and tools. All MD simulations of structured proteins regardless of the force field or resolution rely on a decent initial structure to produce valid results. Therefore, failure upon detection of malformed protein input structures is an essential feature for any high-throughput pipeline working with proteins, especially considering the computational cost of MD simulations. We note that programs such as the first version of Martinize generate reasonable-looking input parameters that lead to unphysical simulations and wasted CPU hours.

      The AlphaFold database, for which we surveyed 200,000 structures, contained only 7 problematic structures, corresponding to a success rate above 99.99% for this database. This example simply shows that users may need to add a step that fixes the atomistic protein input structures if they wish to run a high-throughput pipeline. However, they can be assured that martinize2 will check that no such issues persist.

      Furthermore, we note that the manuscript does not aim to validate or improve the existing Martini (protein) models. All example cases presented in the paper are subject to the limitations of the protein models, because martinize2 is only the program that generates those parameters. Future improvements to the protein model, which are currently underway, will immediately be available to the broader community through the program.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Kroon et al. describes two algorithms that, when combined, achieve high-throughput automation of "martinizing" protein structures with selected protonation states and post-translational modifications.

      Strengths:

      A large-scale protein simulation was attempted, providing strong evidence that the authors' algorithms work smoothly.

      The authors described the algorithms in detail and shared the open-source code under the Apache 2.0 license on GitHub. This allows both reproducibility and extended usefulness within the field. These algorithms are potentially impactful if the authors can address some of the issues listed below.

      We thank the reviewer for pointing out the strengths.  

      Weaknesses:

      One major caveat of the manuscript is that the authors claim their algorithms aim to "process any type of molecule or polymer, be it linear, cyclic, branched, or dendrimeric, and mixtures thereof" and "enable researchers to prepare simulation input files for arbitrary (bio)polymers". However, the examples provided by the manuscript only support one type of biopolymer, i.e. proteins. Despite the authors' recommendation of using polyply along with martinize2/vermouth, no concrete evidence has been provided to support the authors' claim. Therefore, the manuscript must be modified to either remove these claims or include new evidence.

      We acknowledge that the current manuscript is largely protein-centric. To some extent, this results from the legacy of martinize version 1, which was also used only for proteins. However, to show that martinize2 also works for cyclic as well as branched molecules, we implemented two additional test cases and updated the former Figure 6 (now Figure 7). Crown ether is used as an example of a cyclic molecule, whereas a small branched polyethylene molecule serves as a test case for branching. Needless to say, neither molecule is a protein or a biomolecule.

      Method descriptions of Martinize2 and the graph algorithms in the SI should be core content of the manuscript. I argue that Figures S1 and S2 are more important than Figure 3 (protonation state). I recommend that the authors make a workflow chart combining Figures S1 and S2 to explain Martinize2 and the graph algorithms in the main text.

      The reviewer's critique is fair. Given the already rather long manuscript, we tried to strike a balance between describing benchmark test cases, some practical usage information (e.g., the Histidine modification), and the algorithmic library side of the program. In particular, we chose to add the figure on protonation states because how to deal with them, in particular for Histidines, was amongst the top three issues raised by users on our GitHub page. Given this large community interest, we consider the figure equally important. However, we moved Figure S1 from the Supporting Information into the manuscript and annotated the relevant text with the corresponding panels to illustrate the underlying procedure more clearly.

      In Figure 3 (protonation state), the figure itself and the captions are ambiguous about whether at the end the residue is simply renamed from HIS to HIP, or if hydrogen is removed from HIP to recover HIS.

      Either of the two routes yields the same final parameters, namely those for the protonated Histidine. In the second route, the extra hydrogen on Histidine is detected as an additional atom, which triggers a different logic flow. Atoms are never removed; instead, the residue is represented as a base block plus modification atoms. We adjusted the figure caption to point this out more clearly.

      In "Incorporating a Ligand small-molecule Database", the authors call for a community effort to build a small-molecule database. Some guidance on when the current database/algorithm combination does or does not work would help the community contribute.

      Any small molecule not part of the database will not work. However, martinize2 will quickly identify if there are missing components of the system and alert the users. At that point, the users can decide to make their files, guided by the new documentation pages. 

      A speed comparison is needed to compare Martinize2 and Martinize.

      We respectfully disagree that a speed comparison is needed. We already note in the Discussion of the manuscript that martinize2 is slower, because it performs more checks, is more general, and implements more than a single protein model.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This study investigated the role of CD47 and TSP1 in extramedullary erythropoiesis using global CD47-/- and TSP1-/- mice.

      Strengths:  

      Flow cytometry combined with bulk and single-cell transcriptomics of spleen was employed. The authors found that stress-induced erythropoiesis markers were increased in CD47-/- spleen cells, particularly genes that are required for terminal erythroid differentiation. Moreover, a CD47-dependent erythroid precursor population was identified by spleen scRNA sequencing. In contrast, the same cells were not detected in TSP1-/- spleen. These findings provide strong evidence to support the conclusion that CD47 and TSP1 play differential roles in extramedullary erythropoiesis in mouse spleen.

      Weaknesses: 

      Methods and data analysis are appropriate. However, some clarifications are required. The discussion section needs to be expanded.  

      (1) The sex of mice that were used in the study is unknown.  

      (2) In the Methods description of single-cell RNA sequencing (page 10), it is stated that single-cell suspensions from mouse spleens were depleted of all mature hematopoietic cell lineages by passing them through CD8a microbeads and a CD8a+ T cell isolation kit. As described, it is unclear which cell types were obtained for scRNA-seq. More information is required for clarity.

      (3) The constitutive CD47 knockout mouse model is used in this study. The observed accumulation of erythroid precursors in the spleens of CD47-/- mice suggests a chronic effect of CD47 on spleen function. Can the current findings be extrapolated to acute scenarios involving CD47 knockdown or loss, which may be more directly relevant to the potential side effects of anti-CD47 cancer therapy? Please expand on this topic in the Discussion.

      (1) The missing information on mouse sex is incorporated into the revised manuscript. For flow cytometry, two male and two female mice of each genotype were used. For single-cell RNA sequencing, two female mice and one male mouse of each genotype were used. For bulk RNA sequencing, four male cd47−/− mice and four male wildtype mice were used.

      (2) We apologize for the confusing presentation, which has been corrected. The bulk RNA sequencing analysis identified elevated expression of erythropoietic genes in CD8+ spleen cells from cd47−/− versus wildtype mice that were obtained using magnetic bead depletion of all other lineages. Therefore, we used the same Miltenyi negative selection kit as the first step to prepare the cells for single cell RNA sequencing. These untouched cells were then depleted of most mature CD8 T cells using a Miltenyi CD8a(Ly2) antibody positive selection kit. An important consideration underlying this approach was recognizing that the commercial magnetic bead depletion kits used for preparing specific immune cell types are optimized to give relatively pure populations of the intended immune cells using wildtype mice. Our previous experience studying NK cell development in the cd47−/− mice taught us that NK precursors, which are rare in wildtype mouse spleens, accumulate in cd47−/− spleens and were not removed by the antibody cocktail optimized for wildtype spleen cells (Nath et al Front Immunol 2018). The present data indicate that erythroid precursors behave similarly.

      (3) The Discussion was edited as recommended. Anemia is a prevalent side effect of several CD47 therapeutic antibodies being developed for cancer therapy. This anemia would be expected to induce erythropoiesis in bone marrow and possibly at extramedullary sites. Human spleen cells are not accessible to directly evaluate extramedullary erythropoiesis in cancer patients, but analysis of circulating erythroid precursors or liquid biopsy methods could be useful to detect induction of extramedullary erythropoiesis by these therapeutics. We are currently investigating the ability of CD47 antibodies to directly induce erythropoiesis using a human in vitro model.

      Reviewer #2 (Public Review):

      Summary: 

      The authors used existing mouse models to compare the effects of ablating the CD47 receptor and its signaling ligand Thrombospondin. The CD47-KO model used in this study was generated by Kim et al, 2018, who reported hemolytic anemia and splenomegaly. This study analyzes the cell composition of the spleens from CD47-KO and Thsp-KO mice, focusing on early hematopoietic and erythroid populations. The data broadly show that splenomegaly in the CD47-KO is largely due to an increase in committed erythroid progenitors, as seen by flow cytometry and single-cell sequencing, whereas the Thsp-KO shows a slight depletion of committed erythroid progenitors but is otherwise similar to WT in splenic cell composition.

      Strengths:

      The techniques used are appropriate for the study and the data support the main conclusions of the study. This study provides novel insights into a putative role of Thsp-CD47 signaling in triggering definitive erythropoiesis in the mouse spleen in response to anemic stress and constitutes a good resource for researchers seeking to understand extramedullary erythropoiesis.  

      Weaknesses:

      The flow cytometry data alone support the authors' main conclusion; single-cell sequencing confirms it but does not add information beyond what was already observed in the flow data. The single-cell sequencing analysis and presentation could be improved by using alternate clustering methods, as well as by separating the data by genotype and displaying them so that readers can fully grasp the nuanced differences in marker expression between the genotypes. Further, it is not clear from the authors' description of their results whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model. The enrichment of cKit+ Ter119+ Sca1- cells in CD47-KO indicates that these are likely stress erythroid progenitors. Another CD47-KO mouse model (Lindberg et al 1996) has no reported erythroid defects and was not examined in this study.

      (1) The reviewer asked “whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model.” Our data support both a direct role for CD47 and an indirect role resulting from the response to anemic stress. We cited our previous publications describing increased Sox2+ stem cells in spleens of Cd47 and Thbs1 knockout mice, but we neglected to emphasize another study in which we found that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony-forming unit-erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors than bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). Taken together, our published data demonstrate that loss of CD47 results in intrinsic protection of hematopoietic stem cells from genotoxic stress. This function of CD47 is thrombospondin-1-dependent and is consistent with the up-regulation of early erythroid precursors in the spleens of both knockout mice, but it cannot explain why the Thbs1−/− mice have fewer committed erythroid precursors than wildtype. We cited studies that documented increased red cell turnover in cd47−/− mice but less red cell turnover in Thbs1−/− mice compared to wildtype mice. Increased red cell clearance in cd47−/− mice is mediated by loss of the “don’t eat me” function of CD47 on red cells. In wildtype mice, clearance is augmented by thrombospondin-1 binding to the clustered CD47 on aging red cells (Wang, Aging Cell 2020). Thus, anemic stress in the mouse strains studied here decreases in the order cd47−/− > WT > Thbs1−/−. This is consistent with the increased committed erythroid progenitors reported here in cd47−/− spleens and decreased committed progenitors in Thbs1−/− spleens.

      (2) Based on the reviewer’s question regarding alternative mechanisms and the publication of Yang et al 2022 identifying a role for CD47 in stress erythropoiesis through transfer of mitochondria to erythroblasts, we asked whether cd47-/- erythroid precursors would show decreased mRNA expression of mitochondrial chromosome genes (new Figure 4−figure supplement 3C). Some of these mRNAs were more abundant in cd47-/- and thbs1-/- erythroid cells, which is the opposite of what we expected based on Yang 2022 but consistent with our previous publications identifying thrombospondin-1 and CD47 as negative regulators of mitochondrial homeostasis in muscle cells and T cells.

      (3) The cd47−/− mice used for the current study are the same strain as those reported by Lindberg et al in 1996, with additional backcrossing onto a C57BL/6 background.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.  

      Significant effort went into analyzing the type of erythroid progenitors by marker expression, but typical flow cytometry strategies using Ter119 and CD44 combined with forward scatter can be used to stage the committed erythroid progenitors precisely.

      We appreciate this suggestion to extend the flow data. However, the upcoming retirement of the PI required closing our breeding colony, and the mice are no longer available.  

      How can the difference between the erythroid phenotypes of the Lindberg et al 1996 CD47-KO (exon2 Neo knock-in) and Kim et al 2018 CD47-ko (exon1 26bp indel) be explained?  

      We are not convinced that the erythroid phenotypes of the Lindberg and Kim CD47-KO mice differ at the age used in our studies. Kim et al. focused on progressive hemolytic anemia and changes in splenic T cells that emerge at 26 weeks of age, whereas the mice used here were younger. The Lindberg and Kim mice have similar spleen enlargement at the age we used.

      Another manuscript under review from our lab suggests that cis-regulation of an adjacent colinear gene could contribute to some phenotypes observed when perturbing the Cd47 gene. The Lindberg mouse exhibits minimal perturbation of that adjacent gene, but we have no data regarding the Kim et al mouse. The reviewer’s question brought to our attention that we neglected to state in the Methods that the mice used here are the Lindberg mice, not the Kim mice. This omission is now corrected.

      The authors used the Lindberg mouse for their 2018 study on NK cells and observed splenomegaly. Did they check for extramedullary erythropoiesis there?

      Retrospective examination of the RNAseq data for the spleen cells enriched in NK precursors used in our 2018 publication (Nath, 2018) reveals significantly elevated expression for a majority of the extramedullary erythroid markers listed in Table 1, but they were generally less abundant than observed for the lineage-depleted spleen cells used in the present manuscript.   

      Author response table 1.

      To clarify the stress erythropoiesis issue, it might be helpful to examine the sc-seq data for the expression of specific stress erythropoiesis markers in CD47-KO. Targets of BMP4 and Hedgehog signaling can also be examined. Further colony assays can help determine whether stress BFU-Es are prevalent in the CD47-KO spleens and depleted in Thsp-KO.

      As noted in Table 1, twelve of the genes we studied are established markers of stress-induced extramedullary erythropoiesis, and most of these were included in the scRNA seq data presented. Our previous publication demonstrated that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). We have not performed colony formation assays using spleen.

      To address the reviewer’s question regarding BMP4 and hedgehog signaling, we performed gene set enrichment analysis for known BMP4 and hedgehog signaling signatures. Using GSE26351_UNSTIM_VS_BMP_PATHWAY_STIM_HEMATOPOIETIC_PROGENITORS, cd47-/- cells in cluster 12 or their CD34+ or CD34- subsets did not show significant enrichment for BMP4 targets compared to WT. Thbs1-/- cells in clusters 12 and 14 showed marginally significant depletion of the BMP4 signature (p=0.04 and p=0.023, respectively). Using the KEGG_HEDGEHOG_SIGNALING_PATHWAY, we did not find any significant enrichment. However, only a few genes in this pathway were detectable in the scRNAseq data. These data suggest that BMP4 signaling may be regulated by thrombospondin-1, but properly testing this hypothesis would require greater sequencing depth combined with a cell isolation method that better enriches the early hematopoietic progenitors that are known to utilize the BMP4 pathway.

      In the reclustering of erythroid progenitors in Figure 5, inclusion of Gata1 as a selection marker may help capture more of the early erythroid progenitors from the dataset and provide a more complete picture of the erythroid populations. 

      We thank the reviewer for suggesting inclusion of Gata1. We repeated the reclustering including Gata1 and found the selected cell count increased from 876 cells to 1007 cells. However, most of the increase was not in the erythroid cluster, which increased from 413 cells to 419 cells. Most of the increase represented Gata1+ T cells (548 cells including Gata1 versus 463 cells without). The revised manuscript presents genotype-dependent differential gene expression based on including Gata1 selection, but none of the specific conclusions were changed from the initial submission. The new Table 4 and Figure 7−figure supplement 1 enabled us to compare differential expression of erythropoietic genes obtained using supervised and unsupervised clustering and show that both methods yield comparable results.

      Just out of curiosity, was there an attempt to make a CD47 Thsp double KO? Is it viable?

      Cd47 KO mice are somewhat difficult breeders, and several previous attempts to cross with other transgenics have produced viable homozygous offspring that could not be propagated.

      Recommendations for improving the writing and presentation.

      Perhaps readers would find it more intriguing if the paper led with the single-cell sequencing showing enrichment of erythroid populations in CD47-KO, and later confirmed with Flow Cytometry (even if this was not necessarily the order in which the experiments were done). 

      We considered this suggestion but believe that some of the flow cytometry data are needed to understand why we focused on CD34+ and CD34- subsets and proliferation markers when analyzing the scRNAseq data.

      The single-cell sequencing data in Figure 3 might benefit from UMAP clustering as well. In addition, it would greatly help readers if the data points were separated by genotype and displayed after clustering. A similar analysis has been done in this paper: doi:10.1038/s41556-022-00898-9 by clustering different conditions together but displaying them separately by condition. 

      We initially explored tSNE and UMAP clustering and obtained similar results. We have added violin plots separated by genotype in Figure 4-figure supplement 2. We also included improved clusters separated by genotype in the revised Figure 3 panels C and D and for the reclustering in Figure 6D. UMAP plots provided better presentation for the reclustering (revised Figure 7). All data have been updated to the latest pipeline as noted in the Methods.

      Minor corrections to the text and figures.  

      Figure 4: Labels and plot legends are illegible in general, please relabel manually and if possible, redo plots with bigger font size and legends (relatively easy using ggplot2) 

      All figure panels were relabeled using larger fonts.

      Figure 5D: Individual plots are stacked randomly atop each other and in many cases, gene names are not visible. Please restack the layers and ensure that the gene names are visible 

      Panel D was made a separate figure with enlarged labels (now Figure 7).

      Supp Fig 2: Layout can be organized a little better. Consider splitting into two figures for better organization  

      The figure was split as recommended and is now Figure 1-figure supplement 2 and Figure 2-figure supplement 1.

      Abstract Line 10: "...mRNA expression of Kit, Ermap, and Tfrc, Induction of committed erythroid precursors is...". Replace comma after "Tfrc" with period   

      Done.

      Discussion Page 9 Line 8: "...WT spleens, s. mRNAs for some markers of committed erythroid cells including Nr3c1 mRNA...". Remove ", s" after spleens.   

      Done.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the mechanism to promote distant metastasis in breast cancer. The evidence supporting the claims of the authors is convincing. The work will be of interest to medical biologists working on breast cancer.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths

      The paper shows that the expression of RGS10 is related to the molecular subtype, distant metastasis, and survival status of breast cancer. The study uses bioinformatic analyses, human tissue samples, and in vitro and in vivo experiments, which strengthens the data. RGS10 was validated to inhibit EMT through a novel mechanism dependent on LCN2 and miR-539-5p, thereby reducing cancer cell proliferation, colony formation, invasion, and migration. The study elaborates on the function of RGS10 in influencing prognosis and biological behavior, suggesting that it could be considered a potential drug target in breast cancer.

      Weakness

      The mechanism by which the miR-539-5p/RGS10/LCN2 axis may be related to the prognosis of cancer patients still needs to be elucidated. In addition, the sample size used is relatively limited. In particular, further exploration of the pathways and mechanisms related to LCN2 using organoid models, as well as evaluation of RGS10 as a biomarker for clinical translation and verification of its effect as a therapeutic target, would make the data more convincing.

      Answer: Thank you for your comments and suggestions. In future research, we will utilize large clinical cohorts and organoid models to further explore relevant research mechanisms.

      Reviewer #2 (Public Review):

      Liu et al., focusing on regulator of G protein signaling 10 (RGS10), reported that RGS10 expression was significantly lower in breast cancer tissue than in normal adjacent tissue. Genetic inhibition of RGS10 caused epithelial-mesenchymal transition and enhanced cell proliferation, migration, and invasion. These results suggest an inhibitory role of RGS10 in tumor metastasis. Furthermore, bioinformatic analyses identified signaling cascades for RGS10-mediated distant metastasis of breast cancer. More importantly, both in vitro and in vivo studies showed that altering RGS10 expression by modulating its upstream regulator miR-539-5p affects breast cancer metastasis. Altogether, these findings provide insight into the pathogenesis of breast tumors and identify potential therapeutic targets in breast cancer.

      The conclusions of this study are mostly well supported by data. However, there is a weakness in the study that needs to be clarified.

      In Figure 2A, although some references indicate that SKBR3 and MCF-7 cells are poorly aggressive and less invasive, examining only RGS10 expression in these cells is not sufficient to conclude that 'RGS10 acts as a tumor suppressor in breast cancer'. It would be better to include a horizontal comparison of the invasive ability of these three cell lines using an invasion assay.

      Answer: Thank you for your comments and suggestions. MDA-MB-231, SKBR3, and MCF-7 cells originate from triple-negative breast cancer (high invasiveness), Her-2-overexpressing breast cancer (relatively weak invasiveness), and luminal-type breast cancer (relatively weak invasiveness), respectively. Previous studies have demonstrated the invasive ability of these three cell lines (PMID: 34390568).

      Reviewer #3 (Public Review):

      Distant metastasis is the major cause of death in patients with breast cancer. In this manuscript, Liu et al. show that RGS10 deficiency elicits distant metastasis via epithelial-mesenchymal transition in breast cancer. As a prognostic indicator of breast cancer, RGS10 regulates the progress of breast cancer and affects tumor phenotypes such as epithelial-mesenchymal transformation, invasion, and migration. The conclusions of this paper are mostly well supported by data, but some analyses need to be clarified.

      (1) Because diverse biomarkers have been identified for EMT, it is recommended to state the advantages of using RGS10 as an EMT marker.

      Answer: Thank you for your comments. The dysregulation of RGS protein expression has been observed to be associated with various types of cancer (PMID: 26293348). Previous studies have shown that RGS10 knockdown can make ovarian cancer cells resistant to paclitaxel, cisplatin, and vincristine. In colorectal tumors, the transcription of RGS10 is regulated by DNA methylation and histone deacetylation. As a key regulatory factor in the G protein signaling pathway, RGS10 is involved in processes relevant to tumor development, including survival, polarization, adhesion, chemotaxis, and differentiation. These findings suggest that RGS10 might be a marker for EMT in breast cancer.

      (2) The authors utilized databases to study the upstream regulatory mechanisms of RGS10. It is recommended to clarify why the authors focused on miRNAs rather than other epigenetic modifications.

      Answer: Thank you for your comments. miRNAs are short non-coding RNA molecules that bind to the 3' untranslated region (3' UTR) of target mRNAs to cause mRNA degradation or translational inhibition, thus regulating gene expression in cells. These small molecules play a crucial role in regulating the expression of cancer-related genes and can act as tumor promoters or tumor suppressors. To further elucidate the molecular mechanism underlying the malignant biological behavior of breast cancer cells associated with RGS10, we verified through bioinformatics prediction and in vitro experiments that miR-539-5p might be an upstream regulator of RGS10.

      (3) The role of miR-539-5p in breast cancer has been described in previous studies. Hence, it is recommended to provide a detailed explanation of how miR-539-5p regulates the expression of RGS10.

      Answer: Thank you for your comments. To verify that miR-539-5p regulates the expression of RGS10, we transfected miR-539-5p mimic or mimic NC into SKBR3 cells and miR-539-5p inhibitor or inhibitor NC into MDA-MB-231 cells, and measured RGS10 expression by RT-qPCR and Western blot. The results showed that in SKBR3 cells transfected with miR-539-5p mimic, RGS10 mRNA and protein levels were significantly reduced compared with cells transfected with mimic NC or wild-type SKBR3 cells. Conversely, after MDA-MB-231 cells were transfected with miR-539-5p inhibitor to inhibit the expression of miR-539-5p, RGS10 mRNA and protein levels in MDA-MB-231 cells were significantly increased (Fig. 3.4A-C, Fig. 3.5A-C). This indicates that miR-539-5p can target and regulate RGS10.

      (4) To enhance the clarity and interpretability of the Western blot results, it would be advisable to mark the specific kilodalton (kDa) values of the proteins.

      Answer: Thank you for your comments and suggestions. We have marked the specific kilodalton (kDa) values of the proteins in the Western blots.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The function of RGS10 in breast cancer was identified in the paper. However, some major issues in this paper need to be specified:

      (1) From reading the introduction section and its references, RGS proteins participate in multiple essential cellular processes and may be tumor initiators or suppressors (Li et al., 2023). This article focuses on the significance of RGS10 in breast cancer, it is recommended to show how the function of RGS10 exhibits therapeutic significance in other types of cancer.

      Answer: Thanks for your comments and suggestions on our findings. The dysregulation of RGS protein expression has been observed to be associated with various types of cancer, especially ovarian cancer (PMID: 26293348). RGS10 expression in ovarian cancer cells has been found to be lower than in normal ovarian cells (PMID: 21044322). In addition, knocking down RGS10 has been found to enhance the viability of ovarian cancer cells and promote chemoresistance by activating the Rheb-GTP/mTOR signaling pathway (PMID: 26319900). One study suggests that RGS10 regulates inflammatory signaling in SKOV-3 ovarian cancer cells, which show high expression of TNF and COX-2 after RGS10 knockdown. In colorectal tumors, RGS10 transcription is regulated by DNA methylation and histone deacetylation (PMID: 35810565). RGS10 expression is also associated with poor prognosis in laryngeal cancer, hepatocellular carcinoma, and pediatric acute myeloid leukemia (PMID: 32776811, PMID: 26516143, PMID: 30538250).

      (2) The authors characterize RGS10 protein expression in the breast cancer cell lines MDA-MB-231, MCF7, and SKBR3 in vitro Figure 2A. However, more information would strengthen the data - e.g. information on the expression of RGS10 protein and the survival in public databases, as well as the correlation between RGS10 and Her-2 expression.

      Answer: Thanks for your comments. We have checked the correlation between RGS10 expression and the survival rate of Her-2-positive breast cancer patients in a public database. Although the difference is not statistically significant, patients with high RGS10 expression show a trend toward a more favorable prognosis than patients with low RGS10 expression after the 100th month.

      Author response image 1.

      (3) Regarding the current status of clinical trials involving the RGS family, please discuss the potential to develop RGS10, a driving factor for EMT, for clinical translation.

      Answer: Thank you for your comments. The RGS (regulator of G protein signaling) gene family provides an important "braking" function for the G protein-coupled receptor (GPCR) family of cell receptors. GPCRs control hundreds of important functions throughout the body and are the largest class of drug targets: over one-third of FDA-approved drugs treat disease by binding to GPCRs and altering their activity. When GPCRs are activated by hormones or neurotransmitters, they initiate signaling cascades within host cells through signal-carrying proteins called G proteins. RGS proteins inactivate G proteins, thereby shutting down this signaling cascade; this limits G protein signal transduction and allows cells to reset and respond to new incoming signals. Without RGS proteins, GPCR-triggered signals would remain inappropriately active and signal transduction would become dysfunctional (PMID: 33007266). The role of RGS10 as a driving factor of EMT therefore makes it a meaningful candidate for clinical translation.

      (4) In Figure 3A, differential gene expression analysis revealed that 70 genes were significantly upregulated in RGS10-depleted SKBR3 cells. However, the authors did not show any data on the expression of other EMT-related proteins in the pathway analysis.

      Answer: Thank you for your comments. The enrichment analysis of RNA sequencing data from RGS10-depleted SKBR3 cells identified highly correlated factors associated with EMT, such as TAGLN, TNFSF10, NDUFA4L2, CCN5, PHGDH, ST3GAL5, ANG, and LCN2.

      (5) In Figure 3B, the paper focuses on LCN2 in the pathway analysis; however, the authors did not elaborate on the significance of LCN2-related pathways in EMT.

      Answer: Thank you for your comments. Several studies have demonstrated the significance of LCN2-related pathways in EMT. LCN2 upregulation triggered by PTEN insufficiency was shown to induce EMT and promote migration and invasion in MCF7 cells (PMID: 27466505). STAT3 activation increases LCN2 expression, which activates ERK pathway-dependent EMT and thus promotes lung metastasis of MDA-MB-231 breast cancer cells (PMID: 33473115). Silencing LCN2 reduced the migration and invasion of SUM149 cells and the proportion of tumor stem cells, suggesting that LCN2 may mediate cancer cell invasion and metastasis by regulating breast cancer cell stemness. The biological effects of the LCN2 small-molecule inhibitors ZINC00640089 and ZINC00784494 on IBC cells have also been confirmed, and siRNA-mediated silencing of LCN2 in IBC cells significantly reduces cell proliferation, viability, migration, and invasion (PMID: 34445288).

      (6) Minor: the authors did not conduct a semi-quantitative analysis of the RGS10 immunohistochemistry results.

      Answer: Thank you for your suggestion. We present a qualitative analysis of RGS10 immunohistochemistry; in our view, a semi-quantitative analysis is not required for this paper.

      Reviewer #2 (Recommendations For The Authors):

      The role of RGS10 was well characterized in this study. However, some minor points need to be modified.

      (1) On page 15, line 296, the description of cell proliferation is missing; please revise.

      Answer: Thank you for your comments. We have corrected the description of cell proliferation on page 15 (highlighted in red).

      (2) In Figure 2C, the title of the Y-axis was missing.

      Answer: Thank you for your comments. We have added the Y-axis title to Figure 2C.

      (3) Please describe the transfection reagent used in this study and include it in the Methods section.

      Answer: Thank you for your comments. We have added the description of the transfection reagent to the methods section.

      (4) The manuscript needs proofreading.

      Answer: Thank you for your comments. We have proofread the manuscript.

    1. Author response:

      We would like to thank the reviewers for their constructive feedback. We have thoroughly considered their concerns and comments and we aim to include some additional results in an updated version of this manuscript. In addition, we would like to address some of the comments, with which we respectfully disagree. Below is our point-by-point reply.

      Reviewer 1:

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind. 

      We think it is unlikely that the outcome of RasV12, scrib (or lgl) competition depends on discrete vs. continuous clones or on creation of a privileged environment. As shown in the same reference mentioned by the reviewer, the outcome of RasV12, scrib (or lgl) tumors greatly depends on the clone being able to grow to a certain size. The authors show instances of discrete clones where larger RasV12, lgl clones outcompete the surrounding tissue and eliminate WT cells by apoptosis, whereas smaller clones behave more like losers. It is not clear what aspect of the environment determines the ability of some clones to grow larger than others, but in neither case are the clones prevented from competition. Other studies show that in mammalian cells, RasV12, scrib clones are capable of outcompeting the surrounding tissue, such as in Kohashi et al (2021), where cells carrying both mutations actively eliminate their neighbors.

      The authors show that clonal loss of Fmi, by an allele or by RNAi, in RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and the wing disc (discrete clones). The authors attributed this result to reduced killing of WT neighbors by Myc-overexpressing clones lacking Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results.

      See point (1) for a discussion on this.

      Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hs-FLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified, and thus they are not compelling. They state that a similar result is observed for Myc over-expression clones that lack Fmi, but the image was not compelling, the results are not quantified and the controls are missing (Myc over-expressing clones alone and Fmi clones alone).

      We assayed apoptosis in UAS-Myc clones in eye discs but neglected to include the results in Figure 4. We will include them in the updated manuscript. Regarding Fmi clones alone, we direct the reviewer’s attention to Fig. 2 Supplement 1 where we showed that fminull clones cause no competition. Dcp-1 staining showed low levels of apoptosis unrelated to the fminull clones or twin-spots, and we will comment on this in the revised manuscript.

      Regarding the quantification of apoptosis, we did not provide a quantification, in part because we observe a very clear visual difference between groups (Fig. 4A-K), and in part because it is challenging to come up with a rigorous quantification method. For example, how far from a winner clone can an apoptotic cell be and still be considered responsive to the clone? For UAS-Myc winner clones, we observe a modest amount of cell death both inside and outside the clones, consistent with prior observations. For fminull UAS-Myc clones, we observe vastly more cell death within the fminull UAS-Myc clones and modest death in nearby wildtype cells, and consequently a much higher ratio of cell death inside vs outside the clone. Because of the somewhat arbitrary nature of quantification, and the dramatic difference, we initially chose not to provide a quantification. However, given the request, we chose an arbitrary distance from the clone boundary in which to consider dying cells and counted the numbers for each condition. We view this as a very soft quantification, but will report it in a way that captures the phenomenon in the revised manuscript.

      They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc overexpressing clones with and without Fmi. The pHH3 results support their conclusion that Myc overexpressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N).

      As the reviewer’s reservations are not specified, we have no specific response.

      They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths: 

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc over-expressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      Indeed, Myc clones have been shown to divide faster than WT neighbors, but that is not the only reason clones are bigger. As shown by de la Cova et al. (2004), Myc-overexpressing cells induce apoptosis in WT neighbors, and blocking this apoptosis results in larger wings due to increased presence of WT cells. Also, Moreno and Basler (2004) showed that Myc-overexpressing clones cause a reduction in WT clone size, as WT twin spots adjacent to 4xMyc clones are significantly smaller than WT twin spots adjacent to WT clones. In the same work, they show complete elimination of WT clones generated in a tub-Myc background. Since then, multiple papers have shown these same results. It is therefore well established that increased cell proliferation transforms Myc clones into supercompetitors, and that in the absence of cell competition, Myc-overexpressing discs instead produce larger-than-normal wings.

      In (de la Cova et al, 2004) the authors already showed that blocking apoptosis with H99 hinders competition and causes wings with Myc clones to be larger than those where apoptosis wasn’t blocked. As these results are well established from prior literature, there is no need to repeat them here.

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      In later stages, scrib RNAi clones in the eye are eliminated by WT cells. While scrib RNAi clones are not substantially smaller in third instar when competing against fmi cells (Fig 3M), by adulthood we see that WT clones lacking Fmi have failed to remove scrib clones, unlike WT clones that have completely eliminated the scrib RNAi clones by this time. We therefore disagree that the only effect of Fmi could be related to rate of cell division.

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, in some of the samples (e.g., fmiE59 >> Myc, only 5 discs and fmiE59 vs >Myc only 4 discs are quantified but other samples have more than 10 discs). I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      Log(ratio) values are easier to interpret than a linear scale. On a linear scale, 1 means equal amounts of A and B, while a twofold excess of A gives 2 and a twofold excess of B gives 0.5. The larger the difference between A and B, the starker this asymmetry becomes, making a linear scale deceiving to the eye, especially when decreased ratios are shown. With log(ratios), a value of 0 means equal amounts, and increased and decreased ratios deviate equally from 0.
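
      As a quick numerical check of the symmetry argument above, the following minimal Python sketch compares linear and log-scale ratios. It assumes a base-2 logarithm (the response does not state which base the figures use), and the clone-area values are hypothetical, not data from the manuscript.

```python
import math


def linear_ratio(a: float, b: float) -> float:
    """Plain ratio A/B: 1.0 means equal amounts, but excesses are asymmetric."""
    return a / b


def log_ratio(a: float, b: float) -> float:
    """log2(A/B), an assumed base: 0.0 means equal amounts, and a k-fold
    excess of A or of B gives values of equal magnitude and opposite sign."""
    if a <= 0 or b <= 0:
        raise ValueError("areas must be positive")
    return math.log2(a / b)


# Hypothetical GFP-clone vs RFP-clone areas (arbitrary units).
pairs = [(100.0, 100.0), (200.0, 100.0), (100.0, 200.0), (400.0, 100.0), (100.0, 400.0)]
for gfp, rfp in pairs:
    print(f"{gfp:>5}/{rfp:<5}  linear={linear_ratio(gfp, rfp):.2f}  log2={log_ratio(gfp, rfp):+.2f}")
```

      The linear ratios 2.00 and 0.50 sit at unequal distances from 1, whereas the corresponding log2 values, +1.00 and -1.00, sit equally far from 0. Any log base gives the same symmetry; only the scale changes.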

      Statistically, either analyzing a standardized number of discs for all conditions or a variable number not determined beforehand has no effect on the p-value, as long as the variable n number is not manipulated by p-hacking techniques, such as increasing the n of samples until a significant p-value has been obtained. While some of our groups have lower numbers, all statistical analyses were performed after all samples were collected. For all results obtained by cell counts, all samples had a minimum of 10 discs due to the inherent though modest variability of our automated cell counts, and we analyzed all the discs that we obtained from a given experiment, never “cherry-picking” examples. For the sake of transparency, all our graphs show individual values in addition to the distributions so that the reader knows the n values at a glance.
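
      The statistical point above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the authors' analysis: both groups are drawn from the same normal distribution (so every "significant" result is a false positive), a two-sample z-test with known standard deviation stands in for whatever test was actually used, and the sample-size range (5 to 20 discs) and number of trials are hypothetical. It compares a sample size that varies between experiments but is set before testing with one that is increased until p < 0.05.

```python
import math
import random


def two_sided_p(a, b, sigma=1.0):
    """Two-sample z-test p-value, assuming both groups share a known SD sigma."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = abs(diff) / se
    # Two-sided p = 2 * (1 - Phi(z)) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


def false_positive_rate(peeking, trials=2000, n_min=5, n_max=20, alpha=0.05, seed=1):
    """Fraction of null experiments (no true difference) declared significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if peeking:
            # Optional stopping: start small, add one sample per group and re-test
            # until p < alpha or the hypothetical maximum n is reached.
            a = [rng.gauss(0.0, 1.0) for _ in range(n_min)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n_min)]
            while two_sided_p(a, b) >= alpha and len(a) < n_max:
                a.append(rng.gauss(0.0, 1.0))
                b.append(rng.gauss(0.0, 1.0))
        else:
            # n varies between experiments but is chosen without looking at the data;
            # the test is run once, after all samples are collected.
            n = rng.randint(n_min, n_max)
            a = [rng.gauss(0.0, 1.0) for _ in range(n)]
            b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(a, b) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"n set before testing:       {false_positive_rate(peeking=False):.3f}")
    print(f"n increased until p < 0.05: {false_positive_rate(peeking=True):.3f}")
```

      When n is set without reference to the p-value, the false-positive rate should stay near the nominal 5% even though n differs between experiments; when samples are added until the test becomes significant, it should come out noticeably higher. Exact values depend on the random seed, so none are quoted here.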

      (5) Figure 4 - shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, like an H99 deficiency, etc (see above).

      Thank you for flagging this error. We used cleaved Dcp-1 staining to detect cell death, not Cas3 (Drice in Drosophila). We will update all panels, replacing Cas3 with Dcp-1.

      As described above, cell death is a well-established consequence of Myc overexpression, and we feel there is no need to repeat that result. To what extent loss of Fmi induces excess cell death or reduces proliferation in “would-be” winners, and to what extent it reduces “would-be” winners’ ability to eliminate competitors, are interesting mechanistic questions that are beyond the scope of the current manuscript.

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      We are aware that Myc-overexpressing clones have increased cell death, but it has also been demonstrated that, despite this, they behave as winners and eliminate neighboring WT cells. And as mentioned in comment (1), WT clones generated in a 3x and 4x Myc background are eliminated from the tissue, and blocking cell death increases the size of WT “loser” clones adjacent to Myc-overexpressing clones.

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      We have already analyzed the size of discrete Fmi clones and showed that they did not cause any competition, with fmi-null clones having the same size as WT clones in both eye and wing discs. We direct the reviewer’s attention to Figure 2 Supplement 1.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development?

      Fmi is equally expressed by all cells in all imaginal discs in Drosophila larva and pupa. We will include this information in the updated manuscript.

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

      We have endeavored to both provide an accessible narrative and also describe in sufficient detail the data from multiple models of competition and complex genetic systems. We hope that most readers will be able, at a minimum, to follow our interpretations and the key takeaways, while those wishing to examine the nuts and bolts of the argument will find what they need presented as simply as possible.

      Reviewer 2:

      Summary:

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      We would like to thank the reviewer for their thoughtful and positive review.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      We appreciate that this manuscript does not address the mechanism by which Fmi participates in cell competition. Our intent here is to demonstrate that Fmi is a key contributor to competition. We do aim to delve into the mechanism and are currently directing our efforts to exploring how Fmi regulates competition, but the size of the project and the required experiments are outside the scope of this manuscript. We feel that our current findings are sufficiently valuable to merit sharing while we continue to investigate the mechanism linking Fmi to competition.

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      We respectfully disagree for several reasons. First, the effect of Fmi loss is specific to winners; loss of Fmi has no effect on its own or in losers confronting winners in competition. Second, in the RasV12 tumor model, loss of Fmi did not perturb whole-eye tumors; it only impaired tumor growth when tumors were confronted with competitors. We agree that induction of apoptosis is affected, but so is proliferation, and only in winners during competition.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      We agree with the reviewer that this is a worthwhile experiment, given that RNAi has its limitations. However, as fmi is homozygous lethal at the embryo stage, one cannot create whole disc tumors mutant for fmi. As an approximation to this condition, we have introduced the GMR-Hid, cell-lethal combination to eliminate non-tumor tissue in the eye disc. Following elimination of non-tumor cells, there remains essentially a whole disc harboring fminull tumor. Indeed, this shows that whole fminull tumors overgrow similar to control tumors, confirming that the lack of Fmi only affects clonal tumors. We will provide those results in the updated manuscript.

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

      This is an intriguing point that we would like to validate. We are currently performing immunostaining for Fmi in clones to confirm whether its levels change during competition. We will provide these results in the updated manuscript.

      Reviewer 3:

      Summary:

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells). 

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      We would like to thank the reviewer for their thorough and positive review.

      Strengths: 

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that affects very specifically winner cells without any impact on losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term)

      Weaknesses: 

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      Reviewer 2 made the same comment in their weakness (1), and we refer to that response. In future work, we are excited to better understand the pathways linking Fmi and competition.

    1. Author response:

      Reviewer #1 (Public Review):

      We thank Reviewer #1 for the professional evaluation and for raising important points. We will address those comments in the updated manuscript and, in particular, improve the discussion with respect to the two points of concern.

      (1) How can GlnA1 activity be further stimulated by increasing 2-OG after the dodecamer is already fully assembled at 5 mM 2-OG?

      We assume a two-step requirement for 2-OG: dodecameric assembly and priming of the active sites. The assembly step is based on cooperative effects of 2-OG and does not require the presence of 2-OG in all 2-OG-binding pockets: 2-OG binding to one pocket also causes a domino effect of conformational changes in the adjacent 2-OG-unbound subunit, as also described for Methanothermococcus thermolithotrophicus GS in Müller et al. 2023. Because of these conformational changes, the dodecameric form becomes more favourable even without all 2-OG-binding sites being occupied. At higher 2-OG concentrations (> 5 mM), the activity increased further until all 2-OG-binding pockets were occupied, resulting in the priming of all active sites (all subunits) and thereby maximal activity.

      (2) The contradictory results with previously published data on the structure of M. mazei by Schumacher et al. 2023.

      We certainly agree that it is confusing that Schumacher et al. 2023 obtained a dodecameric structure without the addition of 2-OG, which we claim to be essential for the dodecameric form. 2-OG is a cellular metabolite that is naturally present in E. coli, the heterologous expression host both groups used. Since our main question focused on analysing the effect of 2-OG on GS, we performed thorough dialysis of the purified protein to remove all 2-OG before performing MP experiments. In the absence of 2-OG, we never observed significant enzyme activity and always detected rapid disassembly after incubation on ice. We thus assume that the dodecamer without 2-OG in Schumacher et al. 2023 is an inactive oligomer of a once 2-OG-bound form, stabilized, e.g., by the presence of 5 mM MgCl2.

      The GlnA1-GlnK1-structure (crystallography) by Schumacher et al. 2023 is in stark contrast to our findings that GlnK1 and GlnA1 do not interact as shown by mass photometry with purified proteins. A possible reason for this discrepancy might be that at the high protein concentrations used in the crystallization assay, complexes are formed based on hydrophobic or ionic protein interactions, which would not form under physiological concentrations.

      Reviewer #2 (Public Review):

      We thank Reviewer #2 for the detailed assessment and valuable input. We will address those comments in the updated manuscript and clarify the message.

      (1) The discrepancy of the dodecamer formation (max. at 5 mM 2-OG) and the enzyme activity (max. at 12.5 mM 2-OG).

      We assume that 2-OG has two effects: (1) cooperative binding (less 2-OG is needed to facilitate dodecamer formation) and (2) priming of each active site (see also Reviewer #1, R.1). We assume this is why the activity of dodecameric GlnA1 can be further enhanced by increasing the 2-OG concentration until all catalytic sites are primed.

      (2) The lack of the structure of a 2-OG and ATP-bound GlnA1.

      Although we strongly agree that this would be a highly interesting structure, it seems out of the scope of a typical revision to request new cryo-EM structures. We evaluate the findings of our present study concerning the 2-OG effects as important insights into the strongly discussed field of glutamine synthetase regulation, even without the requested additional structures.

      (3) The observed GlnA1-filaments are an interesting finding.

      We certainly agree with the referee that the stacked polymers are potentially induced by 2-OG or ions. However, further exploring these filaments is beyond the main focus of this manuscript. Nevertheless, this observation could serve as an interesting starting point for future experiments.

      Reviewer #3 (Public Review):

      We thank Reviewer #3 for the expert evaluation and inspiring criticism.

      (1) Encouragement to examine ligand-bound states of GlnK1.

      We agree and plan to perform the suggested experiments exploring the conditions under which GlnA1 and GlnK1 might interact. We will perform the MP experiments in the presence of ATP. In GlnA1 activity test assays when evaluating the presence/effects of GlnK1 on GlnA1 activity, however, ATP was always present in high concentrations and still we did not observe a significant effect of GlnK1 on the GlnA1 activity.

      (2) The exact role of 2-OG could have been dissected much better.

      We agree on that point and will improve the clarity of the manuscript. See also Reviewer #1 R.1.

      (3) The lack of studies on dimers.

      This is actually an interesting point, which we did not consider while writing the manuscript. Re-analysing all our MP data in this respect, we find that the smallest GlnA1 species is likely a dimer. Consequently, we will add supplementary data supporting this observation and change the text accordingly.

      (4) Previous studies and structures did not show the 2-OG.

      We assume that for other structures, no additional 2-OG was added, and the groups did not specifically analyse for this metabolite either. All methanoarchaea perform methanogenesis and contain only the oxidative part of the TCA cycle, used exclusively to generate glutamate (anabolism), rather than a closed TCA cycle; this enables them to use the internal 2-OG concentration as a signal for nitrogen availability. In bacterial GS from organisms with a closed TCA cycle used for energy metabolism (oxidation of acetyl-CoA), e.g., E. coli, the formation of the active dodecameric GS is governed by another mechanism that is independent of 2-OG. In the case of the recent M. mazei GS structures published by Schumacher et al. 2023, the dodecameric structure probably results from heterologous expression and purification from E. coli (see also Reviewer #1, R.2). One example of a methanoarchaeal glutamine synthetase structure that does contain 2-OG is that of Müller et al. 2023.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (public review and recommendations for the authors):

      Major points:

      (1) The identification of RAMP4 is a pivotal discovery in this paper. The sophisticated AlphaFold prediction, de novo model building of RAMP4's RBD domain, and sequence analyses provide strong evidence supporting the inclusion of RAMP4 in the ribosome-translocon complex structure.

      However, it is crucial to ensure the presence of RAMP4 in the purified sample. Particularly, a validation step such as western blotting for RAMP4 in the purified samples would strengthen the assertion that the ribosome-translocon complex indeed contains RAMP4. This is especially important given the purification steps involving stringent membrane solubilization and affinity column pull-down.

      As suggested, we have added Western blots showing that RAMP4 is retained at secretory translocons (and not multipass translocons) after solubilisation, affinity purification, and recovery of ribosome-translocon complexes (Fig. 3F). This data supports both our assignment of RAMP4 in ribosome-translocon complexes, and also the structure-based proposition that its occupancy is mutually exclusive with the multipass translocon (in particular, the PAT complex).  

      (2) Despite the comprehensive analyses conducted by the authors, it is challenging to accept the assertion that the extra density observed in TRAP class 1 corresponds to calnexin. The additional density in TRAP class 1 appears to be less well resolved, and the evidence for assigning it as calnexin is insufficient. The extra density could belong to any protein that binds TRAP. It is recommended that the authors examine the density on the ER lumen side. An investigation into whether calnexin's N-globular domain and P-domain are present in the ER lumen in TRAP class 1 would provide a clearer understanding.

      We agree that the Calnexin assignment is less confident than the other assignments in this manuscript, and that further support would be ideal. We have exhaustively searched our maps for any unexplained density connected with the putative Calnexin TMD, and have found none. This is consistent with Calnexin's lumenal domain being flexibly linked to its TMD, and thus would not be resolved in a ribosome-aligned reconstruction.

      Our assignment of this TMD to Calnexin was based on existing biochemical data (referenced in the paper) favouring this as the best working hypothesis by far: Calnexin is TRAP’s only abundant co-purifying factor, and their interaction is sensitive to point mutations in the Calnexin TMD. Recognising that this is not conclusive, we have ensured that the text and figures consistently describe this assignment as provisional or putative.

      (3) In the section titled 'TRAP competes and cooperates with different translocon subunits,' the authors present a compelling explanation for why TRAP delta defects can lead to congenital disorders of glycosylation. To enhance this explanation, it would be valuable if the authors could provide additional analyses based on mutations mentioned in the references. Specifically, examining whether these mutations align with the TRAP delta-OSTA structure models would strengthen the link between TRAP delta defects and the observed congenital disorders of glycosylation.

      We agree that mapping disease-causing point mutants to the TRAP delta structure could be potentially informative. Unfortunately, the referenced TRAP delta disease mutants act by simply impairing TRAP delta expression, and thus admit no such fine-grained analyses. However, sequence conservation is our next best guide to mutant function. We note in the text that the contact site charges on TRAP delta and RPN2 are conserved, and that the closest-juxtaposed interaction pair (K117 on TRAPδ and D386 on RPN2) is also the most conserved.

      Here are some minor points:

      (1) In the introduction, when the EMC, PAT, and BOS complexes were initially mentioned, it would be beneficial for the authors to provide more context or cite relevant references. This additional information will aid readers in better understanding these complexes, ensuring a smoother comprehension of their significance in the context of the study.

      The Introduction has been edited to provide more context with relevant references. 

      (2) In Figure 7, it would be valuable for the authors to include details on how they sampled the sequence alignments. 

      To clarify this methodological point, we have revised the Figure 7 caption to include these sentences: “The logo plots in panels A and D represent an HMM generated by jackHMMER upon convergence after querying UniProtKB’s metazoan sequences with the human TRAPα sequence. Only signal above background is shown, as rendered by Skylign.org.”

      Reviewer #2 (public review and recommendations for the authors):

      Strengths:

      The manuscript contains numerous new structural analyses and their potential functional implications. While all findings are exciting, the highlight is the discovery of RAMP4/SERP1 near the Sec61 lateral gate. Overall, the strength is the thorough and extensive structural analysis of the different high-resolution RTC classes as well as the expert bioinformatic evolutionary analysis.

      Weaknesses:

      A minor downside of the manuscript is the sheer volume of analyses and mechanistic hypotheses, which makes it sometimes difficult to follow. The authors might consider offloading some analyses based on weaker evidence to the supplement to maximize impact.

      We agree that the manuscript is long, but we have retained what we feel are the most important findings in the main text because the supplement is often undiscoverable via literature searches. Indeed, we chose eLife for its flexibility regarding article length and suitability for extended and detailed analyses. 

      Major:

      - Figure S1 does not capture the fact that a PAT-free subset of particles is analyzed. The PAT classification step should be added.

      We apologise for having caused some confusion on this point: we do not show a PAT classification step because there was none. Instead we reanalysed the whole dataset with a focus on Sec61 and TRAP. The very little PAT present (9% of particles, per Smalinskaitė et al. 2022) appeared as a very weak density in some of the closed-Sec and weak-TRAP classes.

      - The assignment of calnexin appears highly speculative. As the authors acknowledge, the EM density is clearly of insufficient resolution for identification, and AF2 does not provide orthogonal support for the interpretation either. The binding to TRAPg also does not explain complex formation in lower eukaryotes that do not have TRAPg. The authors may consider moving the calnexin assignment and interpretation to the supplement as it appears highly speculative. In any case, it should be referred to as a hypothesis rather than a structure.

      We agree that the Calnexin assignment is less confident than the other assignments in this manuscript, and that further support would be ideal. Our assignment of this TMD to Calnexin was based on existing biochemical data (referenced in the paper) favouring this as the best working hypothesis by far: Calnexin is TRAP’s only abundant co-purifying factor, and their interaction is sensitive to point mutations in the Calnexin TMD. Recognising that this is not conclusive, we have ensured that the text and figures consistently describe this assignment as provisional or putative.

      - P. 8: "This extensive competition explains why prior studies found TRAP in only 40% of MPT complexes, but at high occupancy at all other RTCs29". The interpretation is at odds with a recent re-analysis of the same dataset (preprint: Gemmer et al 2023, https://doi.org/10.1101/2023.11.28.569136), which finds TRAP occupancy to negatively correlate with PAT, not BOS.

      The reviewer is correct that the Gemmer study demonstrates a negative correlation between PAT and TRAP occupancy, but it does not, as the reviewer claims, argue against a negative correlation between BOS and TRAP. In fact it agrees that Sec61•BOS•PAT complex would clash with TRAP, and that therefore “BOS could trigger release of TRAP from the multipass translocon.” Thus, there is no conflict between the two studies. The revised text in this passage now cites the Gemmer et al. preprint and clarifies that TRAP is partially displaced by competition with BOS, but retained at the translocon via its ribosome-binding domain.  

      - P. 7/8: the authors suggest that TRAPd may be important for OSTA recruitment and hence that TRAPd deletion may cause glycosylation defects in patients through failure to recruit OSTA. However, cryo-ET studies (Pfeffer et al, Nat. Comms 2017) showed that OSTA still binds in patient-derived microsomes (as does the OSTA-TRAPd interaction). The authors should discuss their model in the light of these data.

      As explained in the text, our hypothesis predicts that TRAPδ is more important for OSTA’s recruitment to the RTC than for its RTC affinity: “OSTA’s attraction to TRAPδ is weak compared to its binding to the ribosome, but TRAPδ may nonetheless help recruit OSTA, since TRAPδ would attract OSTA from most possible angles of approach, whereas OSTA’s ribosome contacts are stereospecific.” Therefore the fact that Pfeffer et al. 2017 found OSTA at some TRAPδ-negative RTCs is not surprising. For confirmation we would look for TRAPδ-dependent glycosylation sites in fast-folding domains or otherwise kinetically sensitive loci, and indeed TRAP-dependence screens return complex profiles that could be consistent with such a mechanism (Phoomak et al. 2021).

      - Some confidence measure for the assignment of SERP1/RAMP4 should be provided to support the claim "The resolution of the RBD density was sufficient for de novo modelling". Indeed, the N-terminal ribosome-bound segment appears well resolved, and programs like ModelAngelo or FindMySequence should provide a confidence measure for the assignment of the density to SERP1. The TM part appears less well resolved, but its connectivity to the N-terminus may justify the assignment, which should be elaborated on.

      Although we appreciate the value of tools like ModelAngelo or FindMySequence, and would have used them if we were resting our assignment of RAMP4 on its RBD alone, we feel that such analyses would be superfluous here. They would quantify only the buildability of RAMP4’s RBD, whereas the real question of RAMP4’s assignability is independently supported by AlphaFold’s confirmation of RAMP4’s TMD as the Sec61-binding density, and by further biochemical data provided or cited in the paper.

      - P. 3: "Because PAT complex recruitment and MPT assembly are just beginning, ..." the implicit kinetic model seems to be that the MPT subcomplexes assemble on ribosome and Sec61. What is the evidence for this model and later recruitment of PAT (as opposed to GEL, BOS, and PAT binding pre-assembled)?

      The work of Sundaram et al. (PMID 36261522) established that PAT, GEL and BOS do not coassociate appreciably in the absence of the ribosome-Sec61 complex. This is consistent with the structural data in Smalinskaite et al. (PMID 36261528), which shows that PAT, GEL, and BOS each contact the ribosome (and Sec61 in the case of PAT and BOS), but have few if any specific contacts among themselves. Finally, data in both of these studies show that recruitment of each complex to the RNC is not lost when any of them is missing, arguing that each is capable of independent recruitment to ribosome-Sec61 complexes. 

      - p. 4: the meaning of the sentence "Stabilising interactions with this widely conserved motif may help Sec61 respond to its diverse substrates with a consistent open state." is not entirely clear. Published single-particle cryo-EM structures of RTC appear to have resulted in various degrees of openness.

      Here we were referring not to RTC structures in general, but to substrate-engaged RTCs in particular. The two substrate-engaged RTC structures under discussion in this paragraph are nearly identical (Figure 2c) despite large differences in substrate sequence (RhoTM2 vs preprolactin’s SP). We were surprised to find that this engaged structure creates noncovalent bonds between the Sec61 N-half and the ribosome. This bonding would tend to stabilise this particular engaged structure, which helps explain why the newly observed TM-engaged structure is so similar to the previously observed SP-engaged structure. Without this stabilising N-half interaction, one might instead expect to see more variability, as the reviewer suggests.

      - A recent analysis of heimdallarchaea already hypothesized TRAP in these organisms and should be cited: Eme et al, Nature 618:992-999 (2023). The novel findings of this manuscript compared to Eme et al should be discussed.

      We thank the reviewer for bringing this relevant contemporaneous work to our attention. Reviewing the putative TRAP homologs identified by Eme et al., we find that most do not in fact appear to be TRAP homologs at all, judged by the measures used in our work (reciprocal HHpred queries against the human proteome and predicted structural similarity). This is not surprising, since Eme et al. relied on low-threshold sequence similarity searches rather than structural measures. To acknowledge this work, we have added a sentence to the following passage (the new sentence is the last one quoted): “To test whether these candidates are also similar to TRAPαβγ in sequence, we used them to perform reciprocal HHpred queries of the human proteome, and in each case the corresponding human TRAP protein was the top hit (E = 0.031 for TRAPα, 9.4×10⁻¹⁴ for TRAPβ, and 110 for TRAPγ). A contemporaneous study has also claimed to find TRAP homologs in Heimdallarchaeota (Eme et al. 2023), although some caution is warranted in these assignments because they do not seem to share predicted structural similarity to TRAP subunits and do not find human homologs in reciprocal HHpred queries.”

      - Given that the authors expand the evolutionary analysis of TRAP to archaea it would be helpful if sampling for RAMP4 were consistent (i.e., is TRAP present in the early eukaryotes that do not feature RAMP4? Is RAMP4 absent from heimdallarchaea?).

      As stated in the text, RAMP4’s absence from early-branching eukaryotic taxa indicates that it was also absent from their archaeal ancestors. We did of course run such queries for completeness and indeed find no archaeal RAMP4. TRAP, for its part, is generally present in early-branching eukaryotic taxa, as stated in the text, and this necessarily includes those from which RAMP4 is absent.

      - The authors may consider discussing Gemmer et al 2023 (https://doi.org/10.1101/2023.11.28.569136), which comes to similar conclusions for NOMO integration into the MPT.

      We thank the reviewer for bringing this relevant work to our attention. We have added the following sentence to the section on NOMO: “Contemporaneous work has arrived at a similar model for PLD10-12 but did not model PLD1 (Gemmer et al. 2023).”

      - The abundance approximation of RAMP4 in the native translocon by OccuPy should probably be taken with a grain of salt. The '80%' mentioned in the conclusion may stick around and could eventually turn out to be closer to 100%.

      It is certainly possible that the occupancy of RAMP4 is higher than OccuPy estimates. Unfortunately, no available method can provide occupancy estimates with confidence intervals. The Western blots we have added to the revised manuscript are consistent with high occupancy, but cannot discriminate between 80% and 100%.

      Minor

      - p. 5: The following sentence is incomplete: "Together, these factors explain why RAMP4's occupancy in prior cryo-EM maps was low enough to be overlooked, although in hindsight seems to be visible in several7,68,69"

      Thank you for catching this typo. We have revised the sentence as follows: “Together, these factors explain why RAMP4's occupancy in prior cryo-EM maps was low enough to be overlooked, although in hindsight it is visible in several of them.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The authors demonstrate that ASGR1 is degraded in response to RSPO2RA-antibody treatment through both the proteasomal and the lysosomal pathway, suggesting that this is due to the RSPO2RA-mediated recruitment of ZNRF3/RNF43, which have E3 ubiquitin ligase activity. The paper doesn't show, however, if ASGR1 is indeed ubiquitinated.

      We thank the reviewer for this comment. We have now conducted ASGR1 ubiquitination assays by immunoprecipitation (IP) of ubiquitin in the membrane protein extract, and immunoblotting (IB) ASGR1 after treating HepG2 cells with our SWEETS molecules or controls. The new data demonstrated ubiquitination of ASGR1 with SWEETS treatment (new Fig. S3A and S3B). Additionally, we blocked the potential ubiquitination of ASGR1 by mutating the two lysine residues in the cytoplasmic domain and compared the ASGR1 degradation after SWEETS treatment. The new data show that removing the potential ubiquitylation Lys sites prevented ASGR1 degradation post SWEETS treatment (new Fig. S3C). These new results provide direct evidence that ASGR1 is ubiquitinated to undergo lysosome or proteasome degradation.

      The authors conclude that the RSPO2RA-Ab fusions can act as a targeted protein degradation platform because they can degrade ASGR. While I agree with this statement, I would argue that the goal of these Abs would not be to degrade ASGR per se. The argumentation is a bit confusing here, in both the Results and the Discussion: the authors focus on the dual role of their agents, i.e. on both promoting WNT signaling AND degrading ASGR1. They might want to reconsider how they present their data (e.g. it may be interesting to target ASGR1, but one would presumably then like to do this without also increasing WNT responsiveness?).

      We thank the reviewer for this comment. As the reviewer states, the initial goal of the RSPO2RA-Ab fusions was to generate tissue-specific RSPO mimetics that focus on elimination of the E3 ligases. As an unintended consequence, we also observed enhanced elimination of ASGR. While this was unintended, the results provide proof of concept (POC) that when an E3 ligase is brought into proximity of another protein, ubiquitination and degradation of that protein may occur. Our results also highlight the need to fully assess the impact of bispecific molecules on both the intended target and unintended targets in order to understand their potential side effects. We have revised the Results and Discussion sections to make this clearer.

      Lines 326-331: The authors use a lot of abbreviations for all of the different protein targeting technologies, but since they are hinting at specific mechanisms, it would be better to actually describe the biological activity of LYTAC versus AbTAC/PROTAB/REULR so non-experts can follow.

      We thank the reviewer for this suggestion. We have added more details in the Discussion to highlight the different mechanisms of the various systems described.

      Can the authors comment on how 8M24 and 8G8 compare to 4F3? The latter seems a bit more specific (i.e. lower background activity in the absence of ASGR1 in 5C). Are there any differences/advances of 8M24 and 8G8 over 4F3? This remains unclear.

      These three antibodies bind different regions/epitopes on ASGR. 8M24 and 8G8 bind non-overlapping epitopes on the carbohydrate recognition domain (CRD), while 4F3 binds the stalk region outside of the CRD. This information is in the Results section of the manuscript. We do not believe that the difference in ASGR binding epitopes contributes to the slight differences in background activity. These differences may stem from conformational differences between the antibodies arising from their different primary sequences, and they may not be significant. We have now repeated the experiments in Fig. 5C and 5D to address the reviewer’s next comment on the axes. The new data (new Fig. 5C and 5D) show smaller background differences between the molecules.

      Can the authors ensure that the axes are labelled/numbered similarly for Fig 5B-D? This will make it easier to compare 5C and 5D.

      We thank the reviewer for this suggestion. The y-axes in Fig. 5B–D now have the same scale and number format. For Figs. 5C and 5D, we focus on the potency increases of the SWEETS molecules post ASGR1 overexpression.

      Reviewer #2 (Public Review):

      Weaknesses:

      The authors show crystal structures for binding of these antibodies to ASGR1/2, and hypothesize about why specificity is mediated through specific residues. They do not test these hypotheses.

      We thank the reviewer for this comment. We did not further test the residue contributions to binding and specificity, as this is not the main focus of the current manuscript. We have revised the section and toned down the claims for specificity.

      The authors demonstrate in hepatocyte cell lines that these function as mimetics, and that they do not function in HEK cells, which do not express ASGR1. They do not perform an exhaustive screen of all non-hepatocyte cells, nor do they test these molecules in vivo.

      We agree with the reviewer. For the 4F3-based SWEETS molecule, additional in vitro and in vivo specificity characterizations were performed and are described in Zhang et al., Sci Rep, 2020. Since 8M24 is human-specific and 8G8 only weakly interacts with mouse receptors, in vivo experiments in mice were not performed. While we did not extensively test the 8M24- and 8G8-based SWEETS on additional cell lines or in vivo, we believe the data presented strongly support the hepatocyte-specific effects of these molecules.

      Surprisingly, these molecules also induced loss of ASGR1, which the authors hypothesize is due to ubiquitination and degradation initiated by the E3 ligases recruited to ASGR1. They demonstrate that inhibition of either the proteasome or the lysosome abrogates this effect and that it is dependent on E1 ubiquitin-activating enzymes. They do not demonstrate direct ubiquitination of ASGR1 by ZNRF3/RNF43.

      We thank the reviewer for this comment. We have now conducted ASGR1 ubiquitination assays by immunoprecipitation (IP) of ubiquitin in the membrane protein extract, and immunoblotting (IB) ASGR1 after treating HepG2 cells with our SWEETS molecules or controls. The new data demonstrate ubiquitination of ASGR1 with SWEETS treatment (new Figs. S3A and S3B). Additionally, we blocked the potential ubiquitination of ASGR1 by mutating the two lysine residues in the cytoplasmic domain and compared the ASGR1 degradation after SWEETS treatment. The new data show that removing the potential ubiquitylation Lys sites prevented ASGR1 degradation post SWEETS treatment (new Fig. S3C). These new results provide direct evidence that ASGR1 is ubiquitinated to undergo lysosome or proteasome degradation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are multiple instances where articles (i.e. the use of "the") are missing.

      We thank the reviewer for this comment. Following the suggestion, the manuscript has gone through a detailed review by an editorial service, and these and other grammatical errors have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      The best I can think of is to inject these into Wnt reporter mice (or maybe humanized mice) and see if the liver lights up while other tissues do not.

      We thank the reviewer for this suggestion. The liver specificity was demonstrated in vivo in our earlier publication (SciRep, 10:13951, 2020) with the 4F3-RSPO2RA molecule. Unfortunately, as the results in this manuscript show, the new ASGR binders 8M24 and 8G8 either do not bind or only weakly interact with mouse receptors. Therefore, the in vivo experiments were not performed here.

      You could also consider addressing some of the statements in the manuscript that are currently hypothetical experimentally.

      We thank the reviewer for this comment. We did not further test the residues’ contribution to binding and specificity, as this is not the main focus of the current manuscript. We have revised the section and toned down the claims for specificity.

      It would be easier to compare the graphs in 5B-D if all Y-axes were the same scale, with the same scientific notation.

      We thank the reviewer for this suggestion. The y-axes in Fig. 5B-D now have the same scale and number format. For Figs. 5C and 5D, we focus on the potency increases of the SWEETS molecules post ASGR1 overexpression.

      Some of the western blots in Figure 6 do not have antibody/target labels, making them harder to interpret.

      The antibody/target labels for all Western blots are on the right side of the blots in each panel; we have now made the text bold so that it is easier to identify.

      Figure 6 and Supplementary Figure 2 are the same I think.

      Figure 6 and Supplementary Figure 2 show the same experimental set-up performed on two different cell lines: Fig. 6 uses Huh7 cells and Supplementary Fig. 2 uses HepG2 cells. The results from the two cell lines are quite consistent, which makes the figures look very similar.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviews

      All reviewers were positive about the rigor and impact of our work and offered a number of very helpful suggestions. We have done a number of suggested experiments, whose results have been added to the revision. We have also used their suggestions to improve the clarity and precision with which we describe and interpret our results.

      Reviewer 1 found the paper to be clearly written, with novel results, and the conclusions relevant and solid. This review offered many insights and thoughtful suggestions, which we have adopted to greatly improve the manuscript. The referee’s points are listed below with our responses.

      The study chooses to examine growth only in the prospective wing blade (the "pouch") rather than the wing disc as a whole. This can create biases, as fat and ds manipulations often cause stronger effects on growth, and on Hippo signaling targets, in the adjacent hinge regions of the disc. So I am curious about this choice. 

      Actually, several experiments described in the manuscript measured growth in regions of the wing disc that did not include the pouch (Fig 1 supplement 4). We found that in the second phase of allometric growth, growth of the pouch was greater than growth of the hinge-notum (Fig.1G and Fig 1 supplement 4).  We also looked at the effect of Ds and Fat on growth of the hinge-notum (Fig 4 supplement 1 and Fig 5 supplement 2). Loss of Ds or Fat also affected allometric growth of the pouch differently from their effects on allometric growth of the hinge-notum. We therefore treated analysis of each region independently. Greater focus was given to wing pouch growth because it was in this region that we detected the interesting gradient properties in Fat and Ds expression.

      The limitation to the wing region also creates some problems for the measurements themselves. The division between wing and pouch is not a strict lineage boundary, and thus cells can join or leave this region, creating two different reasons for changes in wing pouch size; growth of cells already in the region, or recruitment of cells into or out of the region. The authors do not discuss the second mechanism.

      We agree that pouch growth can occur either by lineage-restricted growth or by recruitment of cells into the region. We now discuss this second mechanism in both the Introduction and the Discussion.

      It is not at all clear that the markers for the pouch used by the authors are stable during development. One of these is Vg expression, or the Vg quadrant enhancer. But the Vg-expressing region is thought to increase by recruitment over late second and third instar through a feed-forward mechanism by which Vg-expressing cells induce Vg expression in adjacent cells. In fact, this process is thought to be driven in part by Fat and Ds (Zecca et al 2010). So when the authors manipulate Fat and Ds, are they increasing growth or simply increasing Vg recruitment? I would prefer that this limitation be addressed.

      There is the possibility that the feedforward recruitment of disc cells to express Vg leads to some expansion of the measured pouch domain. However, we argue that the recruitment mechanism may not be contributing significantly to the phenomena we measured in this study. 1) We limited our analysis of pouch growth to the third instar stage. In Fig.2, Zecca and Struhl (2007 doi 10.1242/dev.006411) found that recruitment was much stronger in clones induced at first instar rather than third instar, and so they limited their clonal analysis throughout the paper to first instar induced clones. Thus, it is unclear how much the feedforward recruitment mechanism contributes to pouch growth in the mid-to-late third instar. 2) We detected an effect of Ds and Fat on how rapidly the cell cycle slows down over time in pouch cells. The effect is entirely consistent with it having a causal effect on wing pouch growth. For example, nub>Ds(RNAi) causes the average third instar pouch cell to divide ~25% more rapidly than normal, when comparing the slopes in Figure 6. Note that at the beginning of the third instar, the average pouch cell has a similar doubling time whether lacking Ds or not (Figure 6). When we measured the final size of the wing pouch at the end of the third instar, nub>Ds(RNAi) caused the pouch to be ~30% larger than normal (Figure 5). This effect is quite comparable to the effect of Ds RNAi on cell doubling.

      To provide more rigorous evidence that the effect of Fat and Ds on cell cycle dynamics is primarily responsible for their measured effects on wing growth, we adapted the simple growth-modeling framework of Wartlick et al. (2011) and fit it to our cell cycle measurements for the different genotypes. These fits give estimates of instantaneous cell growth rates over time, which we used to simulate the theoretical growth trajectory of the entire wing pouch for wildtype and ds or fat RNAi animals. These model predictions agree very well with our measurements of pouch volume over time. The analyses and results are now described in the Results and presented in Fig. 6 supplement 2. Overall, they support a model in which Fat and Ds regulate cell cycle dynamics in the wing pouch during the third instar, and this effect is primarily responsible for their effect on overall wing pouch growth in that timeframe. This does not rule out that Fat and Ds also affect Vg recruitment in the third instar, but any such effect must be small relative to the primary effect on the cell cycle. It is possible that Fat and Ds act via the feedforward mechanism at earlier larval stages. We now discuss all of this in detail in the Discussion, including the limitation posed by recruitment.
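      To make the logic of the modeling step concrete, the following is a minimal sketch, not the authors' fitted model. It assumes that pouch volume grows exponentially with an instantaneous rate g(t) = ln 2 / T_d(t), where T_d(t) is the average cell doubling time at time t. The function name simulate_pouch_volume and the linear doubling-time curves in the usage example are hypothetical choices made for illustration only.

```python
import math


def simulate_pouch_volume(v0, doubling_time, t_start, t_end, dt=0.1):
    """Integrate dV/dt = g(t) * V with g(t) = ln(2) / T_d(t).

    Hypothetical helper for illustration. doubling_time is a callable
    returning the average cell doubling time (hours) at time t.
    Returns a list of (t, V) pairs.
    """
    t, v = t_start, v0
    trajectory = [(t, v)]
    while t < t_end - 1e-12:
        step = min(dt, t_end - t)          # never step past t_end
        g = math.log(2) / doubling_time(t)  # instantaneous growth rate at t
        v *= math.exp(g * step)             # exact if g is constant over the step
        t += step
        trajectory.append((t, v))
    return trajectory


# Hypothetical doubling-time curves: the cell cycle slows linearly over
# the third instar, more slowly in the knockdown than in the control.
control = simulate_pouch_volume(1.0, lambda t: 8.0 + 0.30 * t, 0.0, 48.0)
knockdown = simulate_pouch_volume(1.0, lambda t: 8.0 + 0.20 * t, 0.0, 48.0)
print(knockdown[-1][1] / control[-1][1])  # relative final pouch size
```

In this toy setting, both genotypes start with the same doubling time, as described for the start of the third instar, and differ only in how fast the cell cycle slows. This mirrors the idea that a difference in slope alone can produce a larger final pouch.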

      The second pouch marker the authors use is epithelial folding, but this also has problems, as Fat and Ds manipulations change folding. Even in wild type, the folding patterns are complex. For instance, to make folding fit the Vg-QE pattern at late third the authors appear to be jumping in the dorsal pouch between two different sets of folds (Fig 1S2A). The authors also do not show how they use folding patterns in younger, less folded discs, nor provide evidence that the location of the folds are the same and do not shift relative to the cells. They also do not explain how they use folds and measure at later wpp and bpp stages, as the discs unfold and evert, exposing cells that were previously hidden in the folds.

      The folds were the primary marker we used for the pouch boundary. We agree with the reviewer that our original description of how we defined the pouch boundary using the folds was inadequate. We have now substantially expanded the Methods section describing how we defined the boundary at all stages using the folds, including a supplementary figure (Fig 1 supplement 2). Importantly, our measurements did not exclude the pouch regions within the folds but included them (see also the next point). Our microscopy detected fluorescence in the folds, and surface rendering allowed us to visualize fold structure and its contents. In younger discs with less folding, we defined the boundary by the location of the Wg inner ring. The folds were more prominent in older L3 larval discs and at the WPP and later stages, since the wings had not yet fully everted. We therefore used accepted morphological definitions of the pouch boundary from the literature. We were able to do so even though, as the reviewer notes, the fold architecture changes as the larvae age. We agree with the reviewer that defining a boundary based on morphology could be error prone, and in particular prone to age-dependent systematic error. This is the main reason we directly compared the morphologically defined boundaries to boundaries defined by the Vg quadrant expression domain for many wing discs across all ages. As seen in Fig 1 supplement 3C, the two methods are in strong agreement for discs of all ages. The morphological method slightly overestimates the pouch boundary, but the error is small (2.5%) and independent of disc size.
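      A comparison of this kind can be quantified in a few lines. The sketch below computes, for each disc, the fractional difference between the morphologically defined pouch size and the Vg-defined size, then regresses that error on disc size; a mean near a few percent with a slope near zero corresponds to a small, size-independent bias. The numbers are hypothetical placeholders, and we do not claim this is the exact statistic used in the manuscript.

```python
from typing import List, Sequence


def relative_errors(morph: Sequence[float], vg: Sequence[float]) -> List[float]:
    """Fractional difference of the morphological estimate from the Vg-based one."""
    return [(m - g) / g for m, g in zip(morph, vg)]


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical per-disc pouch sizes (arbitrary units), not measured data.
vg_sizes = [1.0, 2.0, 4.0, 8.0]
morph_sizes = [1.025, 2.05, 4.1, 8.2]
errs = relative_errors(morph_sizes, vg_sizes)
mean_err = sum(errs) / len(errs)
print(f"mean overestimate {mean_err:.1%}, slope vs size {ols_slope(vg_sizes, errs):.2e}")
```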

      Finally, the authors limit their measurements to cells with exposed apical faces and thus a measurable area but apparently ignore the cells inside the folds. At late third, however, a substantial amount of the prospective wing blade is found within the folds, especially where they are deepest near the A/P compartment boundary. Using the third vein sensory organ precursors as markers, the L3-2 sensillum is found just distal to the fold, the L3-1 and the ACV sensilla are within the fold, and the GSR of the distal hinge is found just proximal to the fold. That puts the proximal half of the central wing blade in the fold, and apparently uncounted in their assays. These cells will however be exposed at wpp and especially bpp stages. How are the authors adjusting for this? 

      We apologize for not describing the measurement methods thoroughly in the original submission. In fact, we did measure cells located within the folds of the wing pouch at all stages. We collected Z stacks of optical sections that traversed the disc, including the folds. Using surface detection algorithms, we could make spatial measurements (xyz distances and areas) of the material within the folds enveloping the apical pouch, and could therefore measure the surface area and volume of the wing pouch including the folds. These are the measurements we reported in the original submission. A much more complete description of the process has now been added to the Methods.

      On the other hand, we could not reliably measure Fat-GFP or Ds-GFP fluorescence intensity in cells deep in the folds due to light scattering, so we did not assay the entire gradient across the pouch. For each cell we did measure, we know its relative distance to the center of the pouch, defined as the intersection of the AP and DV boundaries. Fluorescence intensities could therefore be compared directly across stages, since all positions were referenced to this center point. We have added text to the Methods to clarify this.
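      Referencing each measurement to the pouch center lets profiles from discs of different stages share one coordinate system. The sketch below assumes, for illustration, that each measured cell has been reduced to a single AP coordinate and an intensity value; the function and its binning scheme are hypothetical and are not the pipeline used in the manuscript.

```python
import math
from typing import Dict, Iterable, Tuple


def ap_profile(
    cells: Iterable[Tuple[float, float]],
    center_x: float,
    bin_width: float,
) -> Dict[int, float]:
    """Mean intensity per bin of signed distance from the pouch center along AP.

    cells: (x, intensity) pairs, with x the AP coordinate of each cell (e.g. microns)
    center_x: AP coordinate of the AP/DV intersection
    Bin b covers signed distances [b * bin_width, (b + 1) * bin_width).
    """
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for x, intensity in cells:
        b = math.floor((x - center_x) / bin_width)
        sums[b] = sums.get(b, 0.0) + intensity
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical cells: (AP position, Fat-GFP intensity)
profile = ap_profile([(48.0, 10.0), (52.0, 20.0), (55.0, 30.0), (40.0, 5.0)], 50.0, 10.0)
print(profile)  # {-1: 7.5, 0: 25.0}
```

      Positions that could not be measured (for example, cells deep in the folds) simply produce no bin, rather than a spurious zero intensity.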

      Stabilizing and destabilizing interactions between Fat and Ds: The authors describe a distal accumulation of Fat protein in the wing and show that this is unlikely to be due to Fat transcription. They further try to test whether the distal accumulation depends on destabilization of proximal Fat by proximal Ds by looking at Fat in ds mutant discs. However, the authors do not describe how they take into account the stabilizing effects of heterophilic binding between the extracellular domains (ECDs) of Fat and Ds; without one, the junctional levels and stability of the other are reduced (Ma et al., 2003; Hale et al. 2015). So when they show that the A-P gradient of Fat is reduced in a ds mutant, is this because of the loss of a destabilizing effect of Ds on Fat, as they assume, or because all junctional Fat has been destabilized by loss of extracellular binding to Ds? The description of the Fat gradient in ds mutants is also confusing (see note 6 below), making this section difficult for the reader to follow.

      We did not intend to imply that Ds actively inhibits Fat. We now describe the implications of the result more clearly in the Results and Discussion, with reference to the prior studies of heterophilic stabilization by Hale et al. and Ma et al. It is worth noting that Ma et al. 2003 saw elevated junctional Fat in ds mutant cells when they were surrounded by other ds mutant cells, which is consistent with our results. We also apologize for the confusion in describing the Fat gradient and have reworded that section of the Results to make it clearer.

      The authors do not propose or test a mechanism for the proposed destabilization. Fat and Ds bind not only through their ECDs; binding has now also been demonstrated through their ICDs (Fulford et al. 2023).

      We now discuss possible mechanisms in the Discussion and include the Fulford reference in the Results.

      Ds gradient scales by volume, rather than cell number - This is an intriguing result, but the authors do not discuss possible mechanisms.

      We have now added discussion of possible mechanisms in the Discussion.

      Fat and Ds are already known to have autonomous effects on growth and Hippo signaling from clonal analyses and localized knockdowns. One novelty here is showing that localized knockdown does not delay pupariation in the way that whole animal knockdown does, although the mechanism is not investigated. Another novelty is that the authors find stronger wing pouch overgrowth after localized ds RNAi or whole disc loss of fat than after localized fat RNAi, the latter being only 11% larger. The fat RNAi result would have been strengthened by testing different fat RNAi stocks, which vary in their strength and are commonly weaker than null mutations, or stronger drivers such as the ap-gal4 they used for some of their ds-RNAi experiments or use of UAS-dcr2. Another reason for caution is that Garoia (2005) found much stronger overgrowth in fat mutant clones, which were about 75% larger than control clones.

      We thank the reviewer for this suggestion. Indeed, the weak effect of Fat RNAi was due to the specific RNAi line. We followed the reviewer’s suggestion and tested other RNAi stocks. We had in hand a GFP-targeting RNAi line that we had found, in unrelated studies, to be a very potent repressor of GFP expression. Since we had used a Fat-GFP knock-in allele (GFP inserted in frame into Fat) throughout this study, we used nub>Gal4 UAS-GFP RNAi to knock down homozygous Fat-GFP. The knockdown was very strong, as measured by the residual fluorescence under 488 nm excitation above background autofluorescence. After correcting for background autofluorescence, we estimate that only 4.5% of Fat-GFP remained under RNAi conditions (Figure 5 - figure supplement 3).
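      The background correction amounts to a simple ratio. This sketch assumes that background autofluorescence is measured under the same imaging settings and subtracted from both the knockdown and control intensities; the exact procedure used in the manuscript may differ, and the intensity values below are hypothetical.

```python
def residual_fraction(f_rnai: float, f_control: float, f_background: float) -> float:
    """Fraction of signal remaining after knockdown, with background subtracted from both."""
    signal_control = f_control - f_background
    if signal_control <= 0:
        raise ValueError("control intensity must exceed background")
    return (f_rnai - f_background) / signal_control


# Hypothetical mean intensities (arbitrary units), chosen only for illustration.
print(f"{residual_fraction(140.5, 1000.0, 100.0):.1%}")  # 4.5%
```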

      Using the more potent RNAi reagent, we repeated the various experiments related to Fat. We observed a 42% increase in wing pouch growth, which is similar to that of Ds RNAi. We also observed an effect of Fat RNAi on the average cell cycle time of wing pouch cells. There was still a linear coupling between the cell cycle duration and wing pouch size, but the slope of the coupling was smaller with Fat RNAi. This was very similar to what Ds RNAi does to the cell cycle. Therefore, we have replaced the data from the original Fat RNAi experiments with the new data and modified the text throughout the manuscript to describe the new results.

      Flattening of Ds gradient does not slow growth. One model suggests that the flattening of the Ds gradient, and thus polarized Ds-Fat binding, account for slowed growth in older discs. The difficulty in the past has been that two ways of flattening the Ds gradient, either removing Ds or overexpressing Ds uniformly, give opposite results; the first increases growth, while the latter slows it. Both experiments have the problem of not just flattening the gradient, but also altering overall levels of Ds-Fat binding, which will likely alter growth independent of the gradients. Here, the authors instead use overexpression to create a strong Ds gradient (albeit a reversely oriented one) that does not flatten, and show that this does not prevent growth from slowing and arresting.

      To make sure that this is not some effect caused by using a reverse gradient, one might instead induce a more permanent normally oriented Ds gradient and see if this also does not alter growth; there is a ds Trojan gal4 line available that might work for this, and several other proximal drivers.

      Again, we thank the reviewer for this suggestion. We followed the reviewer’s suggestion and generated Trojan-Gal4 mediated overexpression of Ds. The Ds protein gradient was strongly amplified by Trojan-Gal4 but remained normally oriented. However, it only caused a modest (12%) increase in wing pouch volume. It did not significantly alter Fat expression dynamics nor the dynamics of cell cycle duration. This new data has been added to the Results (Fig. 7 and Fig 7 supplement 2) and discussed at length in the text.

      Another possible problem is that, unlike previous studies, the authors have not blocked the Four-jointed gradient; Fj alters Fat-Ds binding and might regulate polarity independently of Ds expression. A definitive test would be to perform the tests above in four-jointed mutant discs.

      We examined a fj null mutant (fjp1/d1) and found that it did not alter final wing pouch size (Fig. 2 - figure supplement 3E). Moreover, neither Fat nor Ds expression was altered in the fj mutant (Fig. 2 - figure supplement 3C,D).

      The Discussion of these data should be improved. The authors state in the Discussion "The significance of these dynamics is unclear, but the flattening of the Fat gradient is not a trigger for growth cessation." While the Discussion mentions the effects of Ds on Fat distribution in some detail, this is the only phrase that discusses growth, which is surprising given how often the gradient model of growth control is mentioned elsewhere. The reader would be helped if details were given about which experiment supports this conclusion, the effect not only on growth cessation but also on cell cycle time, and why the result differs from those of Rogulja 2008 and Willecke 2008 using Ds and Fj overexpression.

      We have rewritten the Discussion to better reflect the results and incorporate the reviewer’s criticisms.

      The authors spend much of the discussion speculating on the possibility that Fat and Ds control growth by changing the wing's sensitivity to the BMP Dpp. As the manuscript contains no new data on Dpp, this is somewhat surprising. The discussion also ignores Schwank (2011), who argues that Fat and Dpp are relatively independent. There have also been studies showing genetic interactions between Fat and signaling pathways such as Wg (Cho and Irvine 2004) and EGF (Garoia 2005).

      We have modified the discussion to be more inclusive of mechanisms connecting Fat and other signaling pathways, and we deleted some of the speculation about Dpp. However, since Dpp is the only known growth factor whose local concentration linearly scales with average cell doubling time (the process we found Ds/Fat regulates), there is a logical connection that readers deserve to know about. Therefore, we have retained some discussion of the hypothesis that the two might be linked through cell cycle duration. It is for future studies to test that hypothesis as it is beyond the scope of this paper.

      That said, some studies dispute Wartlick’s Dpp model, e.g., Schwank et al. 2012, arguing that Dpp regulates growth permissively by limiting an antigrowth factor, Brinker. We have added this reference and the others to the Discussion, which now considers alternative models in which Fat/Ds act in parallel to Dpp.

      Wpp and bpp: First, the charts treat wpp as if it were a fixed number of hours after 5-day larvae, but this will not be true in fat and ds mutants with extended larval life. This should be mentioned.

      We have clarified this distinction in the figure legends.

      How are the authors limiting bpp to 1 hr from wpp? Prepupae are brown and lack air bubbles, but that spans 5 hours of disc changes, from barely everted to fully wing-like.

      We deliberately chose 1 hour post-WPP because we wanted to measure final wing volume with minimal eversion. We agree with the reviewer’s concern about calling this BPP, and we now call it WPP+1.

      "However, growth of the wing pouch ceased at the larva-pupa molt and its size remained constant".

      The transition from late third to wpp shown in the figure is not the pupal molt. Unlike in most insects, in Drosophila the larval cuticle is not molted away, it is remodeled during pupariation into the prepupal case. The pupal cuticle is not formed until 6 hr APF, which is why the initial stages are termed pre-pupal. Also, there is at least one more set of cell divisions that occur in later pupal stages (for instance, see recent work from the Buttitta lab).

      We have changed the reference of pupal molt to larva-prepupal transition throughout the manuscript.

      "In contrast, the notum-hinge exhibited simpler linear-like positive allometric growth (Fig. 1 - figure supplement 3C) 

      This oversimplifies, as there is still a strong inflection after the third time point, albeit not as large as with the wing because there is less notal growth.

      We have reworded the text as suggested. 

      "whereas at the WPP stage, dividing cells were only found in a narrow zone where sensory organ precursor cells undergo two divisions to generate future sensory organs (Fig. 1 - figure supplement 4C-E)."

      While there are more dividing cells at the anterior D/V, which will form sensory bristles, there are also dividing cells elsewhere, including in the posterior and scattered through the pouch, where there are no sensory precursors. Sensory organs are limited to the wing margin and the very few campaniform sensilla found on the prospective third vein. The Sens-GFP shown here, meant to identify sensory precursors, does not look much like the Sens expression in Nolo et al 2000. Anterior is on the left in 1S4A-D, but on the right in E.

      We thank the reviewer for this observation. Indeed, the Sens-GFP signal in the figure is too broad, owing to bleed-through of the PHH3 signal. Since the pattern of dividing cells at the WPP stage has been well characterized in the literature, as has the pattern of Sens+ cells at that stage (e.g., Nolo et al. 2000), we have removed these panels and now simply cite the relevant literature.

      "The gradient was asymmetric along the AP axis, being lower at the A margin than the P margin."

      The use of "margin" here is a bit confusing, as the term is usually used to describe the wing margin; that is, the D/V compartment boundary in the disc that forms the edge of the wing. Can the authors use a different term? It would also be helpful to point out that the A and P extremes are also, because of the geometry of the disc, the prospective proximal portions of the wing margin, and the hinge, especially since the authors are including the regions proximal to the most distal fold.

      We have reworded it as suggested.   

      The graphed loss of the Fat A-P gradient between day 5 third and wpp is dramatic. Given that the changes in folding at wpp might alter which cells are being graphed, can the authors show a photo?

      We have now included a photo of Fat-GFP at WPP in Fig 2 - figure supplement 2E.

      "Since Ds levels are highest and most steep near the margins, perhaps Ds inhibits Fat expression in a dose- or gradient-dependent manner. We also followed Fat-GFP dynamics in the ds mutant. We did not observe the progressive flattening of the FatGFP profile to the WPP wing (Fig. 2 - figure supplement 3A). Instead, the Fat-GFP profile was graded at the WPP stage and flattened somewhat more by the BPP stage (Fig. 2 - figure supplement 3B)."

      This description does not tell the reader if there is any less grading of Fat in the ds mutant compared with wild type; instead, it sounds like it is more graded, as gradation continues at wpp. This would then contradict the hypothesis that proximal Ds is required to create the distal Fat gradient.

      The Fat signals for the two genotypes are directly comparable, as the samples were imaged together with the same microscope settings. Fig 2M shows that the Fat profile in the ds mutant is less graded than in the wildtype, and we have reworded the text to make this clearer. What differs is that this graded expression persists longer into WPP; the degree of grading itself does not increase. The reason for this is not understood.

      The figure, on the other hand, looks like Fat is less graded, although as noted above this could instead be caused by loss of the stable Ds-bound Fat normally found at junctions. 

      Fig 2M shows an increase in Fat levels at the proximal regions of the ds mutant pouch, where Ds is normally most concentrated. This makes the overall profile look less graded. 

      Confusingly, in the Discussion the authors state: "Loss of Ds affects the Fat gradient such that distribution of Fat is uniformly upregulated to peak levels." There is no mention of "peak levels" in the Results, and no mention of "graded" expression in the Discussion. I am unclear on how the absolute levels are being determined and would be surprised if there were peak levels after loss of Ds-bound Fat from junctions.

      The absolute levels between the genotypes were determined by carefully calibrated fluorescence of Fat-GFP from samples imaged at the same time with the same settings. We used the word peak to refer to the highest level of Fat-GFP within a given gradient profile. Clearly, the description is confusing and so we have deleted the word and modified the text to clarify the meaning.

      "Interestingly, the reversed Ds gradient caused a change in the Fat gradient (Fig. 7E). Its peak also became skewed to the anterior and did not normally flatten at the WPP stage."

      This result contradicts the authors’ earlier model that proximal Ds destabilizes Fat. Instead, the result fits the stabilization of Fat caused by binding to endogenous or overexpressed Ds or Ds ECD (Ma et al. 2003; Matakatsu and Blair, 2004; 2006; Hale et al. 2015).

      We agree that the reversed Ds gradient affects Fat differently than the ds loss-of-function phenotype does. We did not intend to propose a model based on the ds mutant, only a simple interpretation of that result. Taken on its own, the reversed-Ds experiment suggests a simple interpretation that is not consistent with the ds mutant result. This speaks to the complexity of the system. We have changed the text in the Results to make this less confusing.

      Reviewer 2 found the paper to provide insights into normal growth of the wing and useful tools for measurement of growth features. This review offered many insights and thoughtful suggestions, which we have adopted to greatly improve the manuscript. The referee’s points are listed below with our responses.

      Although the approach used to measure volume is new to this study, the basic finding that imaginal disc growth slows at the mid-third instar stage has been known for some time from studies that counted disc cell number during larval development (Fain and Stevens, 1982; Graves and Schubiger, 1982). Although these studies did not directly measure disc volume, because cell size in the disc is not known to change during larval development, cell number is an accurate measure of tissue volume. However, it is worth noting that the approach used here does potentially allow for differential growth of different regions of the disc.

      We had cited the older literature in reference to our results. We have now noted the approach’s usefulness in measuring different disc regions such as the pouch.

      Related to point 1, a main conclusion of this study, that cell cycle length scales with growth of the wing, is based on a developmentally limited analysis that is restricted to the mid-third instar larval stage and later (early third instar begins at 72 hr - the authors' analysis started at 84 hr). The previous studies cited above made measurements from the beginning of the 3rd instar and combined them with previous histological analyses of cell numbers starting at the beginning of the 2nd instar. Interestingly, both studies found that cell number increases exponentially from the start of the 2nd instar until mid-third instar, and only after that point does the cell cycle slow resulting in the linear growth reported here. The current study states that growth is linear due to scaling of cell cycle with disc size as though this is a general principle, but from the earlier studies, this is not the case earlier in disc development and instead applies only to the last day of larval life.

      We apologize for not making this distinction clearer in the original manuscript. Indeed, growth is initially exponential and shifts to a more linear-like regime in the mid third instar. Our focus in the manuscript is primarily this latter phase. We have now rewritten the text in the Introduction, Results and Discussion to make this very clear. 

      While cell number and pouch volume increase exponentially from the start of the 2nd instar, the cell cycle already begins to slow down during the 2nd instar, as found with mitotic index measurements done by Wartlick et al 2011. Using their data to model cell cycle duration as a function of pouch area, we find that during the 2nd instar, cell cycle duration also increases as the size of the wing pouch increases. This is shown in the figure (panel C) below. Note that this relationship appears nonlinear and is quantitatively distinct from the relationship for third instar wing growth.

      Author response image 1.
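      For readers who wish to reproduce this kind of estimate, one standard way to convert a mitotic index (MI) into a cell cycle duration is sketched below. It assumes an asynchronous, exponentially growing population with a fixed mitotic duration t_M, for which MI = 2^(t_M/T) − 1. We do not claim this is the exact conversion used by Wartlick et al. (2011), and the numbers are hypothetical.

```python
import math


def cycle_time_from_mitotic_index(mitotic_index: float, mitosis_duration: float) -> float:
    """Cell cycle duration T from mitotic index MI and mitosis duration t_M.

    Assumes an asynchronous, exponentially growing population, for which
    MI = 2 ** (t_M / T) - 1, so T = t_M / log2(1 + MI).
    """
    if not 0.0 < mitotic_index < 1.0:
        raise ValueError("mitotic index must lie in (0, 1)")
    return mitosis_duration / math.log2(1.0 + mitotic_index)


# Hypothetical round trip: T = 10 h and t_M = 0.5 h give an MI of about 3.5%.
mi = 2 ** (0.5 / 10.0) - 1
print(round(cycle_time_from_mitotic_index(mi, 0.5), 6))  # 10.0
```

      Ignoring the age structure of a growing population and using T ≈ t_M / MI instead would overestimate T by roughly a factor of 1/ln 2 (about 1.44) when MI is small.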

      The analysis of the roles of Fat and Dachsous presented here has weaknesses that should be addressed. It is very curious that the authors found that depletion of Fat by RNAi in the wing blade had essentially no effect on growth while depletion of Dachsous did, given that the loss of function overgrowth phenotype of null mutations in fat is more severe than that of null mutations in dachsous (Matakatsu and Blair, 2006). An obvious possibility is that the Fat RNAi transgene employed in these experiments is not very efficient. The authors tried to address this by doubling the dose of the transgene, but it is not clear to me that this approach is known to be effective. The authors should test other RNAi transgenes and additionally include an analysis of growth of discs from animals homozygous for null alleles, which as they note survive to the late larval stages.

      We thank the reviewer for this suggestion. Indeed, the weak effect of Fat RNAi was due to the specific RNAi line. We followed the reviewer’s suggestion and tested other RNAi stocks. We had in hand a GFP-targeting RNAi line that we had found, in unrelated studies, to be a very potent repressor of GFP expression. Since we had used a Fat-GFP knock-in allele (GFP inserted in frame into Fat) throughout this study, we used nub>Gal4 UAS-GFP RNAi to knock down homozygous Fat-GFP. The knockdown was very strong, as measured by the residual fluorescence under 488 nm excitation above background fluorescence. After correcting for background fluorescence, we estimated that only 4.5% of Fat-GFP remained under RNAi conditions (Figure 5 - figure supplement 3).

      Using the more potent RNAi reagent, we repeated the various experiments related to Fat. We observed a 42% increase in wing pouch growth, which is similar to that of Ds RNAi. We also observed an effect of Fat RNAi on the average cell cycle time of wing pouch cells. There was still a linear coupling between the cell cycle duration and wing pouch size, but the slope of the coupling was smaller with Fat RNAi. This was very similar to what Ds RNAi does to the cell cycle. Therefore, we have replaced the data from the original Fat RNAi experiments with the new data and modified the text throughout the manuscript to describe the new results.

      It is surprising that the authors detect a gradient of Fat expression that has not been seen previously given that this protein has been extensively studied. It is also surprising that they find that expression of Nubbin Gal4 is graded across the wing blade given that previous studies indicate that it is uniform (ie. Martín et al. 2004). These two surprising findings raise the possibility that the quantification of fluorescence could be inaccurate. The curvature of the wing blade makes it a challenging tissue to image, particularly for quantitative measurements.

      Non-uniform Fat protein expression has been observed before but not carefully quantified (see Mao et al., 2009; Strutt and Strutt 2002). Martin et al. 2004 (doi 10.1242/dev.013) claimed that Nub-Gal4 is uniform without actually measuring it. Please consult Figs 1A and 2A in their paper, which clearly show stronger expression in the center/distal region of the pouch.

      Regarding systematic errors in quantification, we took great pains to minimize them. We divided the z stack of each complex, folded disc into an apical region of interest (ROI) that included the distal domain of the wing pouch and a basal ROI that included the folds encompassing the pouch. We then used a published and widely used surface detection tool (ImSAnE) that captures a 3D ROI that can be curved and complex in shape in z, because the user creates a surface spline of the ROI. The resulting output treats the ROI as a virtual 2D object. This avoids the need for maximum-intensity projections of confocal stacks, which often create artifacts of the kind the reviewer describes. ImSAnE eliminates such artifacts and is the gold standard for processing ROIs with 3D curvature.

      Moreover, our pipeline does detect uniform expression when it is present. We used a da-Gal4 driver in Fig. 2K,L; this driver is widely acknowledged to be uniformly expressed in fly tissues. When it drives a control fluorescent marker (Bazooka-mCherry), our analysis pipeline detects a uniform expression pattern across the wing pouch (Fig. 2L). When the same Gal4 transgene drives Fat-HA in the same tissue, our pipeline detects a graded expression pattern of Fat-HA (Fig. 2L). In fact, this experiment co-expressed Fat-HA and the control marker in the same disc. We are therefore confident that our quantification is accurate.
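      The logic of this internal control can be expressed as a simple test statistic. The sketch below is a hypothetical illustration, not the manuscript’s pipeline: it fits a line to intensity as a function of normalized distance from the pouch center and divides the slope by the mean intensity, so a uniformly expressed marker scores near zero while a graded one scores clearly negative. The profile values are hypothetical.

```python
from typing import Sequence


def relative_gradient(positions: Sequence[float], intensities: Sequence[float]) -> float:
    """OLS slope of intensity vs. normalized distance from the pouch center,
    divided by the mean intensity. Near 0 for uniform expression; negative
    when intensity falls off toward the pouch edge."""
    n = len(positions)
    mean_x = sum(positions) / n
    mean_y = sum(intensities) / n
    sxx = sum((x - mean_x) ** 2 for x in positions)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, intensities))
    return (sxy / sxx) / mean_y


# Hypothetical profiles sampled at normalized distances 0 (center) to 1 (edge).
r = [0.0, 0.25, 0.5, 0.75, 1.0]
uniform_marker = [100.0, 100.0, 100.0, 100.0, 100.0]
graded_marker = [100.0, 80.0, 60.0, 40.0, 20.0]
print(relative_gradient(r, uniform_marker))  # 0.0
print(round(relative_gradient(r, graded_marker), 3))  # -1.333
```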

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This is a valuable study that develops a new model of the way muscle responds to perturbations, synthesizing models of how it responds to small and large perturbations, both of which are used to predict how muscles function for stability but also how they can be injured, and which tend to be predicted poorly by classic Hill-type models. The evidence presented to support the model is solid, since it outperforms Hill-type models in a variety of conditions. Although the combination of phenomenological and mechanistic aspects of the model may sometimes make it challenging to interpret the output, the work will be of interest to those developing realistic models of the stability and control of movement in humans or other animals.

      Reviewer #1 (Public Review):

      Muscle models are important tools in the fields of biomechanics and physiology. Muscle models serve a wide variety of functions, including validating existing theories, testing new hypotheses, and predicting forces produced by humans and animals in health and disease. This paper attempts to provide an alternative to Hill-type muscle models that includes contributions of titin to force enhancement over multiple time scales. Due to the significant limitations of Hill-type models, alternative models are needed and therefore the work is important and timely.

      The effort to include a role for titin in muscle models is a major strength of the methods and results. The results clearly demonstrate the weaknesses of Hill models and the advantages of incorporating titin into theoretical treatments of muscle mechanics. Another strength is to address muscle mechanics over a large range of time scales.

      The authors succeed in demonstrating the need to incorporate titin in muscle models, and further show that the model accurately predicts in situ force of cat soleus (Kirsch et al. 1994; Herzog & Leonard, 2002) and rabbit psoas myofibrils (Leonard et al. 2010). However, it remains unclear whether the model will be practical for use with data from different muscles or preparations. Several ad hoc modifications were described in the paper, and the degree to which the model requires parameter optimization for different muscles, preparations and experiment types remains unclear.

      I think the authors should state how many parameters require fitting to the data vs the total number of model parameters. It would also be interesting for the authors to discuss challenges associated with modeling ex vivo and in vivo data sets, due to differences in means of stimulation vs. model inputs.

      (1) I think the authors should state how many parameters require fitting to the data vs the total number of model parameters.

      All model parameters are listed in Table 1. For each parameter, the table also lists the source of data (if one exists) and how the data were used (’C’ calculated, ’F’ fit, ’E’ estimated, or ’S’ scaled) in the simulations that appear in this paper. While this is a daunting number of parameters, only a few of them must be updated when modeling a new musculotendon.

      Similar to a Hill-type muscle model, at least 5 parameters are needed to fit the VEXAT model to a specific musculotendon: maximum isometric force (fiso), optimal contractile element (CE) length, pennation angle, maximum shortening velocity, and tendon slack length. However, similar to a Hill model, it is only possible to use this minimal set of parameters by making use of default values for the remaining set of parameters. The defaults we have used have been extracted from mammalian muscle (see Table 1) and may not be appropriate for modeling muscle tissue that differs widely in terms of the ratio of fast/slow twitch fibers, titin isoform, temperature, and scale.

      Even when these defaults are appropriate, variation is the rule for biological data rather than the exception. The best fit will always require fitting more of the model’s parameters to additional data. Standard measurements of the active force-length relation, passive force-length relation, and force-velocity relation are quite helpful for improving the accuracy of the model for a specific muscle. It is challenging to improve the fit of the model’s cross-bridge (XE) and titin models because the data required are so rare. To our knowledge, the experiments of Kirsch et al., Prado et al., and Trombitás et al. are unique. However, if more data become available, it is relatively straightforward to update the model’s parameters using the methods described in Appendix B or the code that appears online (https://github.com/mjhmilla/Millard2023VexatMuscle).

      We have modified the manuscript to make it clear that, in some circumstances, the burden of parameter identification for the VEXAT model can be as low as a Hill model:

      - Section 3: last two sentences of the 2nd paragraph, found at: Page 10, column 2, lines 1-12 of MillardFranklinHerzog v3.pdf and 05 MillardFranklinHerzog v2 v3 diff.pdf

      - Table 1: last two sentences of the caption, found at: Page 11 of MillardFranklinHerzog v3.pdf and 05 MillardFranklinHerzog v2 v3 diff.pdf

      (2) It would also be interesting for the authors to discuss challenges associated with modeling ex vivo and in vivo data sets, due to differences in means of stimulation vs. model inputs.

      All of the experiments simulated in this work are in-situ or ex-vivo. So far the main challenges of simulating any experiment have been quite consistent across both in-situ and ex-vivo datasets: there are insufficient data to fit most model parameters to a specific specimen and, instead, defaults from the literature must be used. In an ideal case, a specimen would have roughly ten extra trials collected so that the maximum isometric force, optimal fiber length, active force-length relation, passive force-length relation (up to $\approx 0.6\,f^{M}_{o}$), and the force-velocity relation could be identified from measurements rather than taken from literature values. Since most lab specimens are viable for only a small number of trials (with the exception of cat soleus), we don’t expect this situation to change in the future.

      However, if data are available, the fitting process is fairly straightforward for either in-situ or ex-vivo data: use a standard numerical method (for example non-linear least squares, or the bisection method) to adjust the model parameters to reduce the errors between simulation and experiment. The main difficulty, as described in the previous paragraph, is the availability of data to fit as many parameters as possible for a specific specimen. As such, the fitting process varies from experiment to experiment and depends mainly on the richness of the measurements taken from a specific specimen, and from the literature in general.
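      As a minimal sketch of the bisection approach mentioned above, the Python below fits a single parameter so that a simulated outcome matches a measurement. The function `simulate_peak_force`, its linear form, and the measured value are hypothetical placeholders, not the VEXAT model or its fitting code; the sketch only assumes that the simulated quantity increases monotonically with the parameter over the search interval.

```python
def simulate_peak_force(f_iso: float) -> float:
    """Hypothetical stand-in for a full simulation: the peak force (N) of one
    trial as a function of the maximum isometric force parameter f_iso (N).
    Assumed linear here purely for illustration."""
    return 0.85 * f_iso


def fit_by_bisection(simulate, target, lo, hi, tol=1e-6, max_iter=200):
    """Return p in [lo, hi] with simulate(p) approximately equal to target.

    Assumes simulate is continuous and monotonic on [lo, hi], so that the
    target is bracketed by the interval end points."""
    err_lo = simulate(lo) - target
    err_hi = simulate(hi) - target
    if err_lo * err_hi > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        err_mid = simulate(mid) - target
        if abs(err_mid) < tol or (hi - lo) < tol:
            return mid
        if err_lo * err_mid < 0:
            hi = mid  # sign change lies in [lo, mid]
        else:
            lo, err_lo = mid, err_mid  # sign change lies in [mid, hi]
    return 0.5 * (lo + hi)


measured_peak = 17.0  # hypothetical measured peak force (N)
f_iso_fit = fit_by_bisection(simulate_peak_force, measured_peak, 1.0, 100.0)
print(round(f_iso_fit, 3))  # 20.0
```

      With several parameters and many samples, a non-linear least-squares solver (for example `scipy.optimize.least_squares`) would replace this one-dimensional search; bisection is attractive only when a single parameter is adjusted and monotonicity is known.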

      Working from in-vivo data presents an entirely different set of challenges. When working with human data, for example, it’s just not possible to directly measure muscle force with tendon buckles, and so it is never completely clear how force is distributed across the many muscles that typically actuate a joint. Further, there is also uncertainty in the boundary condition of the muscle because optical motion capture markers move with respect to the skeleton. Video fluoroscopy offers a method of improving the accuracy of measured boundary conditions, though only a few labs can use it due to its great expense. A final boundary condition remains impossible to measure in any case: the geometry and forces that act at the boundaries as muscle wraps over other muscles and bones. Fitting to in-vivo data is very difficult.

      While this is an interesting topic, it is tangential to our already lengthy manuscript. Since these reviews are public, we’ll leave it to the motivated reader to find this text here.

      Reviewer #2 (Public Review):

      This model of skeletal muscle includes springs and dampers which aim to capture the effect of crossbridge and titin stiffness during the stretch of active muscle. While both crossbridge and titin stiffness have previously been incorporated, in some form, into models, this model is the first to simultaneously include both. The authors suggest that this will allow for the prediction of muscle force in response to short-, mid- and long-range stretches. All these types of stretch are likely to be experienced by muscle during in vivo perturbations, and are known to elicit different muscle responses. Hence, it is valuable to have a single model which can predict muscle force under all these physiologically relevant conditions. In addition, this model dramatically simplifies sarcomere structure to enable this muscle model to be used in multi-muscle simulations of whole-body movement.

      In order to test this model, its force predictions are compared to 3 sets of experimental data which focus on short-, mid- and long-range perturbations, and to the predictions of a Hill-type muscle model. The choice of data sets is excellent and provides a robust test of the model’s ability to predict forces over a range of length perturbations. However, I find the comparison to a Hill-type muscle model to be somewhat limiting. It is well established that Hill-type models do not have any mechanism by which they can predict the effect of active muscle stretch. Hence, that the model proposed here represents an improvement over such a model is not a surprise. Many other models, some of which are also simple enough to be incorporated into whole-body simulations, have incorporated mechanistic elements which allow for the prediction of force responses to muscle stretch. And it is not clear from the results presented here that this model would outperform such models.

      The paper begins by outlining the phenomenological vs mechanistic approaches taken to muscle modelling, historically. It appears, although is not directly specified, that this model combines these approaches. A somewhat mechanistic model of the response of the crossbridges and titin to active stretch is combined with a phenomenological implementation of force-length and force-velocity relationships. This combination of approaches may be useful for improving the accuracy of predictions of muscle models and whole-body simulations, which is certainly a worthy goal. However, it also may limit the insight that can be gained. For example, it does not seem that this model could reflect any effect of active titin properties on muscle shortening. In addition, it is not clear to me, either physiologically or in the model, what drives the shift from the high stiffness in short-range perturbations to the somewhat lower stiffness in mid-range perturbations.

      (1) It is well established that Hill-type models do not have any mechanism by which they can predict the effect of active muscle stretch.

      While many muscle physiologists are aware of the limitations of the Hill model, these limitations are not so well known among computational biomechanists. There are at least two reasons for this gap: there are few comprehensive evaluations of Hill models against several experiments, and some of the differences are quite nuanced. For example, active lengthening experiments can be replicated reasonably well using a Hill model if the lengthening is done on the ascending limb of the force-length curve. Clearly the story is quite different on the descending limb, as shown in Figure 9. Similarly, as Figure 8 shows, by choosing the right combination of tendon model and perturbation bandwidth it is possible to get reasonably accurate responses from the Hill model to stochastic length changes. Yet when a wide variety of perturbation bandwidths, magnitudes, and tendon models are tested, it is clear that the Hill model cannot, in general, replicate the response of muscle to stochastic perturbations. For these reasons we think many of the Hill model’s drawbacks have not been clearly understood by computational biomechanists for many years now.

      (2) Many other models, some of which are also simple enough to be incorporated into whole-body simulations, have incorporated mechanistic elements which allow for the prediction of force responses to muscle stretch. And it is not clear from the results presented here that this model would outperform such models.

      We agree that it will be valuable to benchmark other models in the literature using the same set of experiments. Hopefully we, or perhaps others, will have the good fortune to secure research funding to continue this benchmarking work. This will, however, be quite challenging: few muscle models are accompanied by a professional-quality open-source implementation. Without such an implementation it is often impossible to reproduce published results let alone provide a fair and objective evaluation of a model.

      (3) For example, it does not seem that this model could reflect any effect of active titin properties on muscle shortening.

      The titin model described in the paper will provide an enhancement of force during a stretch-shortening cycle. This certainly would be an interesting next experiment to simulate in a future paper.

      (4) In addition, it is not clear to me, either physiologically or in the model, what drives the shift from the high stiffness in short-range perturbations to the somewhat lower stiffness in mid-range perturbations.

      We can only respond to what drives the frequency-dependent stiffness in the model, though we’re quite interested in what happens physiologically. We hope that new experiments will examine this phenomenon in the future. In the case of the model, the reason is straightforward: the formulation of Eqn. 16 is responsible for this shift.

      Equation 16 has been formulated so that the acceleration of the attachment point of the XE is driven by the force difference between the XE and a reference Hill model (numerator of the first term in Eqn. 16) which is then low pass filtered (denominator of the first term in Eqn. 16). Due to this formulation the attachment point moves less when the numerator is small, or when the differences in the numerator change rapidly and effectively become filtered out. When the attachment point moves less, more of the CE’s force output is determined by variations in the length of the XE and its stiffness.

      On the other hand, the attachment point will move when the numerator of the first term in Eqn. 16 is large, or when those differences are not short lived. When the attachment point moves to reduce the strain in the XE, the force produced by the XE’s spring-damper is reduced. As a result, the CE’s force output is less influenced by variations of the length of the XE and its stiffness.
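      The filtering argument can be illustrated with a generic first-order low-pass filter. This is a hypothetical sketch, not Eqn. 16: it assumes a filter `dy/dt = (u - y) / tau` with an arbitrary time constant `tau`, driven by a unit sinusoid `u` that stands in for the force difference, and reports how much of that signal survives at low and high frequencies.

```python
import math


def filtered_amplitude(freq_hz, tau=0.05, dt=1e-4, t_end=5.0):
    """Steady-state amplitude of y for dy/dt = (u - y) / tau with
    u = sin(2*pi*freq_hz*t), integrated with forward Euler.
    tau (s) is a hypothetical time constant; the first half of the run
    is discarded so that start-up transients do not affect the result."""
    y = 0.0
    peak = 0.0
    for i in range(int(t_end / dt)):
        t = i * dt
        u = math.sin(2.0 * math.pi * freq_hz * t)
        y += dt * (u - y) / tau  # forward Euler step of the filter
        if t > 0.5 * t_end:
            peak = max(peak, abs(y))
    return peak


def analytic_gain(freq_hz, tau=0.05):
    """Closed-form gain of the same first-order filter, for comparison."""
    return 1.0 / math.sqrt(1.0 + (2.0 * math.pi * freq_hz * tau) ** 2)


for f in (1.0, 50.0):
    print(f, round(filtered_amplitude(f), 3), round(analytic_gain(f), 3))
```

      A slowly varying difference passes almost unattenuated (so, by analogy, the attachment point would move), whereas a rapidly varying one is largely filtered out, leaving the spring-damper to carry more of the response.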

      Reviewer #2 (Recommendations for the Authors):

      I find the clarity of the manuscript to be much improved following revision. While I still find the combination of phenomenological and mechanistic approaches to be a little limiting with regards to our understanding of muscle contraction, the revised description of small length changes makes the interpretation much less confusing.

      Similarly, while I agree that Hill-type models are widely used their limitations have been addressed extensively and are very well established. Hence, moving forward I think it would be much more valuable to start to compare these newer models to one another rather than just showing an improvement over a Hill model under (very biologically important) conditions which that model has no capacity to predict forces.

      (1) While I still find the combination of phenomenological and mechanistic approaches to be a little limiting with regards to our understanding of muscle contraction ...

      We have had to abstract some of the details of reality to have a model that can be used to simulate hundreds of muscles. In contrast, FiberSim produced by Kenneth Campbell’s group uses much less abstraction and might be of greater interest to you. FiberSim’s models include individual cross-bridges, titin molecules, and an explicit representation of the spatial geometry of a sarcomere. While this model is a great tool for testing muscle physiology questions through simulation, it is computationally expensive to use this model to simulate hundreds of muscles simultaneously.

      Kosta S, Colli D, Ye Q, Campbell KS. FiberSim: A flexible open-source model of myofilament-level contraction. Biophysical Journal. 2022 Jan 18;121(2):175-82. https://campbell-muscle-lab.github.io/FiberSim/

      (2) Similarly, while I agree that Hill-type models are widely used their limitations have been addressed extensively and are very well established.

      Please see our response (1) to Reviewer #1.

      (3) Hence, moving forward I think it would be much more valuable to start to compare these newer models to one another rather than just showing an improvement over a Hill model under (very biologically important) conditions which that model has no capacity to predict forces.

      Please see our response (2) to Reviewer #1.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      In the paper by Choi et al., the authors aimed to develop base editing strategies to convert CAG repeats to CAA repeats in the huntingtin gene (HTT), which causes Huntington's disease (HD). They hypothesized that this conversion would delay disease onset by shortening the uninterrupted CAG repeat. Using HEK-293T cells as a model, the researchers employed cytosine base editors and guide RNAs (gRNAs) to efficiently convert CAG to CAA at various sites within the CAG repeat. No significant indels, off-target edits, transcriptome alterations, or changes in HTT protein levels were detected. Interestingly, somatic CAG repeat expansion was completely abolished in HD knock-in mice carrying CAA-interrupted repeats. 

      Correction of factual errors

      We analyzed HEK293 cells, not "HEK-293T".

      Strengths: 

      This study represents the first proof-of-concept exploration of the cytosine base editing technique as a potential treatment for HD and other repeat expansion disorders with similar mechanisms. 

      Weaknesses: 

      Given that HD is a neurodegenerative disorder, it is crucial to determine the efficiency of the base editing strategies tested in this manuscript and their feasibility in relevant cells affected by HD and the brain, which needed to be improved in this manuscript. 

      We appreciate the reviewer's constructive recommendations. Our genetic investigation focused on understanding observations in HD patients to develop genetic-based treatment strategies and test their feasibility. We agree with the reviewer regarding the importance of data from relevant cell types. Unfortunately, the levels of CAG-to-CAA conversion in the patient-derived neurons were modest, as described in our manuscript (approximately 2%). In addition, AAV did not produce detectable conversions in the brain of HD knock-in mice (data not shown), which was somewhat expected from the literature (PMID: 31937940). We believe some technical hurdles can be overcome by developing efficient delivery methods. Nonetheless, an important follow-up will be to perform preclinical studies employing optimized base editing strategies and efficient brain delivery methods to fully demonstrate the therapeutic potential of base editing (BE) strategies.

      Reviewer #2 (Public Review):

      Summary: 

      In a proof-of-concept study with the aspiration of developing a treatment to delay HD onset, Choi et al. design and test an A>G DNA base editing strategy to exploit the recently established inverse relationship between the number of uninterrupted CAG repeats in polyglutamine repeat expansions and the age-of-onset of Huntington's Disease (HD). Most of the study is devoted to optimizing a base editing strategy typified by BE4max and gRNA2. The base editing is performed in human HEK293 cells engineered with a 51 CAG canonical repeat and in HD knock-in mice harboring 105+ CAG repeats. 

      Correction of factual errors

      We tested base editing strategies aimed at C > T conversion, not A > G DNA base editing. In addition to HEK293 and knock-in mice, we tested base editing strategies in patient-derived iPSC and neurons.

      Weaknesses: 

      Genotypic data on DNA editing are not portrayed in a clear manner consistent with the study's goal, namely reducing the number of uninterrupted CAG repeats by a clinically relevant amount according to the authors' least square approximated mean age-at-onset. No phenotypic data are presented to show that editing performed in either model would lead to reduced hallmarks of HD onset. 

      More evidence is needed to support the central claims and therapeutic potential needs to be more adequate. 

      Our strategies for converting CAG to CAA in model systems resulted in quantitative DNA modification in a population of cells. Consequently, individual cells may carry different genotypes, some harboring CAA and others CAG at the same genomic location. Therefore, using a standard genotype format for DNA to present base editing outcomes may not be ideal. Instead, we presented the resulting genotype data in a quantitative fashion to provide the percentage of conversion at each site. This approach allows for an intuitive interpretation of both the extent of repeat length reduction and the proportion of such modifications.
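      A minimal sketch of this kind of per-site summary, assuming sequencing reads that are already aligned so that position 0 is the first codon of the repeat and every read covers the same number of codons. The reads and the function name are hypothetical; a real pipeline would also need to handle alignment, sequencing errors, and reads of varying repeat length.

```python
from typing import List


def per_site_conversion(reads: List[str], n_codons: int) -> List[float]:
    """Percent of reads carrying CAA (rather than CAG) at each codon position.

    Reads with any other codon at a position are ignored for that position;
    a position with no CAG/CAA reads is reported as NaN."""
    result = []
    for k in range(n_codons):
        site = [read[3 * k:3 * k + 3] for read in reads]
        gln = [c for c in site if c in ("CAG", "CAA")]
        if not gln:
            result.append(float("nan"))
            continue
        n_caa = sum(c == "CAA" for c in gln)
        result.append(100.0 * n_caa / len(gln))
    return result


# Hypothetical reads covering three codons of a repeat.
reads = ["CAGCAGCAG", "CAGCAACAG", "CAGCAACAG", "CAGCAGCAG"]
print(per_site_conversion(reads, 3))  # [0.0, 50.0, 0.0]
```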

      Currently, genetically precise HD mouse models with robust motor and behavioral phenotypes are unavailable. While some HD mouse models, such as the BAC and YAC models, feature pronounced behavioral phenotypes, they consist of interrupted CAG repeat sequences, making them unsuitable for base conversion studies due to their inherently short uninterrupted repeats. Although genetically precise HD knock-in mouse models exist, they do not manifest motor symptom-like phenotypes. Given that CAG repeat expansion is the primary driver of the disease and knock-in mice recapitulate this phenomenon, our genetic investigation focused on assessing the effects of base conversion on CAG repeat instability in knock-in mice. However, as emphasized by the reviewer, subsequent preclinical studies to evaluate the therapeutic efficacy of CAG-to-CAA conversion strategies using mouse models harboring uninterrupted adult-onset CAG repeats and robust HD-like phenotypes remain crucial.

      Reviewer #3 (Public Review):

      Summary: 

      In human patients with Huntington's disease (HD), caused by a CAG repeat expansion mutation, the number of uninterrupted CAG repeats at the genomic level influences age-at-onset of clinical signs independent of the number of polyglutamine repeats at the protein level. In most patients, the CAG repeat terminates with a CAACAG doublet. However, CAG repeat variants exist that either do not have that doublet or have two doublets. These variants consequently differ in their number of uninterrupted CAG repeats, while the number of glutamine repeats is the same as both CAA and CAG codes for glutamine. The authors first confirm that a shorter uninterrupted CAG repeat number in human HD patients is associated with developing the first clinical signs of HD later. They predict that introducing a further CAA-CAG doublet will result in years of delay of clinical onset. Based on this observation, the authors tested the hypothesis that turning CAG to CAA within a CAG repeat sequence using base editing techniques will benefit HD biology. They show that, indeed, in HD cell models (HEK293 cells expressing 16/17 CAG repeats; a single human stem cell line carrying a CAG repeat expansion in the fully penetrant range with 42 CAG repeats), their base editing strategies do induce the desired CAG-CAA conversion. The efficiency of conversion differed depending on the strategy used. In stem cells, delivery posed a problem, so to test allele specificity, the authors then used a HEK 293 cell line with 51 CAG repeats on the expanded allele. Conversion occurred in both alleles with huntingtin protein and mRNA levels; transcriptomics data was unchanged. In knock-in mice carrying 110 CAG repeats, however, base editing did not work as well for different, mainly technical, reasons. 
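      The distinction between uninterrupted CAG length and polyglutamine length can be made concrete with toy sequences. In the sketch below, the alleles, the `CCGCCA` flank, and the helper names are illustrative assumptions rather than the real HTT sequence; the point is only that swapping CAG for CAA (both glutamine codons) leaves the glutamine count unchanged while shortening the uninterrupted CAG run.

```python
GLN_CODONS = {"CAG", "CAA"}  # both codons encode glutamine


def codons(seq):
    """Split a sequence into complete codons from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def polyq_length(seq):
    """Number of leading glutamine codons (CAG or CAA)."""
    n = 0
    for c in codons(seq):
        if c not in GLN_CODONS:
            break
        n += 1
    return n


def uninterrupted_cag(seq):
    """Number of leading CAG codons before the first interruption."""
    n = 0
    for c in codons(seq):
        if c != "CAG":
            break
        n += 1
    return n


def convert_cag_to_caa(seq, codon_index):
    """Toy base edit: replace the CAG at codon_index with CAA."""
    start = 3 * codon_index
    if seq[start:start + 3] != "CAG":
        raise ValueError("no CAG codon at that position")
    return seq[:start] + "CAA" + seq[start + 3:]


flank = "CCGCCA"  # illustrative non-glutamine flank
alleles = {
    "canonical": "CAG" * 40 + "CAACAG" + flank,
    "loss_of_interruption": "CAG" * 42 + flank,
    "duplicated_interruption": "CAG" * 38 + "CAACAG" * 2 + flank,
}
alleles["edited"] = convert_cag_to_caa(alleles["canonical"], 20)

for name, seq in alleles.items():
    print(name, polyq_length(seq), uninterrupted_cag(seq))
```

      All four toy alleles encode 42 glutamines, while their uninterrupted CAG runs are 40, 42, 38, and 20 codons respectively.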

      Correction of factual errors

      "HD cell models (HEK293 cells expressing 16/17 CAG repeats" is an incorrect description. It should be "HD cell models (HEK293 cells expressing 51/17 CAG repeats".

      Strengths: 

      The authors use state-of-the-art methods and carefully and thoroughly designed experiments. The data support the conclusions drawn. This work is a very valuable translation from the insight gained from large GWAS studies into HD pathogenesis. It rightly emphasises the potential this has as a causal treatment in HD, while the authors also acknowledge important limitations. 

      Weaknesses: 

      They could dedicate a little more to discussing several of the mentioned challenges. The reader will better understand where base editing is in HD currently and what needs to be done before it can be considered a treatment option. For instance, 

      - It is important to clarify what can be gained by examining again the relationship between uninterrupted CAG repeat length and age-at-onset. Could the authors clarify why they do this and what it adds to their already published GWAS findings? What is the n of datasets? 

      Published HD GWAS (PMID: 31398342) compared the onset age of duplicated interruption and loss of interruption to that of canonical repeats to determine whether uninterrupted CAG repeat or polyglutamine length determines age at onset. However, the GWAS findings did not quantify the magnitude of the remaining unexplained variance in age at onset for duplicated interruption and loss of interruption. Our study investigated this further to quantify the additional impact of duplicated interruption, in order to estimate the maximum clinical benefit of base editing strategies for CAG-to-CAA conversion. Since the purpose of this genetic analysis is already described in the Results section, we added the following sentence to the Introduction to state what remains unknown.

      "Still, age at onset of loss of interruption and duplicated interruption was not fully accounted for by uninterrupted CAG repeat, suggesting additional effects of non-canonical repeats."

      We added sample sizes for the least squares approximation analysis in the text and the corresponding figure legend. Sample sizes for molecular and animal experiments can be found in the corresponding figure legends.

      - What do they think an ideal conversion rate would be, and how that could be achieved? 

      This is a very important question. However, speculating on the ideal conversion level is outside the scope of this genetic investigation. A series of preclinical studies using relevant models may generate data that shed light on the conversion rates required to produce meaningful clinical benefits. In the Discussion section, we added the following sentence.

      "Currently, the ideal levels of CAG-to-CAA conversion that produce significant clinical benefits are unknown. A series of preclinical studies using relevant model systems may generate data that may shed light on the optimal conversion rate levels that are required to produce significant clinical benefits."

      - Is there a dose-effect relationship for base editing, and would it be realistic to achieve the ideal conversion rate in target cells, given the difficulties described by the authors in differentiated neurons from stem cells? 

      We observed a clear dose-response relationship between the amount of BE reagents and the levels of conversion in non-neuronal cells. Unfortunately, the conversion rate was low in neuronal cells, potentially due to limited delivery, as speculated in the Results section. As described in the Discussion section, we predict that efficient delivery methods will be crucial to produce enough CAG-to-CAA conversion to achieve therapeutic benefits.

      - The liver is a good tool for in-vivo experiments examining repeat instability in mouse models. However, the authors could comment on why they did not examine the brain.

      We focused on liver instability because of 1) the expectation that delivery/targeting efficiency is significantly lower in the brain (PMID: 31937940) and 2) shared underlying mechanisms between the brain and liver (described in the result section). The following sentence was added in the method section to provide a rationale for liver analysis. 

      "Since significantly lower delivery/targeting efficiency was expected in the brain 34, we focused on analyzing liver instability."

      - Is there a limit to judging the effects of base editing on somatic instability with longer repeats, given the difficulties in measuring long CAG repeat expansions? 

      Determining the levels of base conversion using sequencing technologies gets harder as repeats become longer. Fragment analysis can overcome such technical difficulty if conversion efficiency is high. As pointed out, the repeat expansion measure is also challenging because amplification is biased toward shorter alleles. However, if repeat sizes are relatively similar, the levels of repeat expansion as a function of base conversion can be determined relatively precisely without a significant bias by a standard fragment analysis approach. 

      - Given the methodological challenges for assessing HTT fragments, are there other ways to measure the downstream effects of base editing rather than extrapolate what it will likely be?

      Our CAG-to-CAA conversion strategies are not expected to directly generate fragments of huntingtin DNA, RNA, or protein. In contrast, immediate downstream effects of CAG-to-CAA conversion include sequence changes (DNA and RNA) and alteration of repeat instability, which are presented in the manuscript. If repeat instability is associated with HTT exon 1A fragment, base conversion strategies may indirectly alter the levels of such putative toxic species, which remains to be determined.  

      - Sequencing errors could mask low-level, but biologically still relevant, off-target effects (such as gRNA-dependent and gRNA-independent DNA off-targets, RNA off-targets, bystander editing). How likely is that? 

      We agree with the reviewer that increased editing efficiency is expected to increase the levels of off-target editing. However, the field is actively developing base editors with minimal off-target effect (PMID: 35941130), which will increase the safety aspects of this technology for clinical use. We added the following sentence.  "In addition, developing base editors with high level on-target gene specificity and minimal off-target effects is a critical aspect to address 100."

      - How worried are the authors about immune responses following base editing? How could this be assessed? 

      We added the following sentence in the discussion section as the reviewer raised an important safety issue.  

      "Thorough assessments of immune responses against base editing strategies (e.g., development of antibody, B cell, and T cell-specific immune responses) and subsequent modification (e.g., immunosilencing) 101 will be critical to address immune response-associated safety issues of BE strategies."

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following points could be considered to improve the overall quality of the manuscript: 

      (1) The authors mentioned that the reason for checking repeat instability in the nonneuronal cells was due to the availability of specific types of AAV; there are other subtypes of AAVs available to infect neurons and iPSCs. 

      Our pilot experiments testing several AAV serotypes in patient-derived iPSC and HD knock-in mice showed that only AAV9 converted CAG to CAA at detectable levels in the liver, not in the brain or neurons. We also speculate that difficulties in targeting the CAG repeat region due to GC-rich sequence contributed to low conversion efficiency. Therefore, subsequent optimization of base editor and delivery may improve BE strategies for HD, permitting robust conversion at the challenging locus. 

      (2) Despite its bold nature, minimal data in the manuscript demonstrate that this gene editing strategy is disease-modifying.

      Resources required to demonstrate the therapeutic benefits of CAG-to-CAA conversion strategies are not fully available. In particular, relevant HD mouse models that carry uninterrupted adult-onset CAG repeats and that permit measuring disease-modifying effects are lacking, as described in our response to the second reviewer. Given that CAG repeat expansion is the primary driver of the disease, this genetic investigation focused on determining the impacts of base editing strategies on CAG repeat expansion. Still, as indicated by the reviewer, follow-up preclinical studies to evaluate the disease-modifying effects of CAG-to-CAA conversion strategies using relevant mouse models represent important next steps.

      (3) Off-target analysis at the DNA level was limited to "predicted" off-target sites. What about possible translocations that can result from co-nicking on different chromosomes, as a large number of potential targets exist? 

      Among the gRNAs we tested, we focused on gRNAs 1 and 2, which had small numbers of predicted off-target sites. Therefore, our off-target analysis at the DNA level focused on validating those predicted off-targets. As pointed out, a thorough evaluation of off-target effects will be necessary when candidate BE strategies take the next steps toward therapeutic development.

      Genomic translocations caused by double-strand breaks can have negative consequences, such as cancer. Importantly, although paired nicks efficiently induced translocations, translocations were not detected when a single nick was introduced on each chromosome (PMID: 25201414). Therefore, BE strategies using a nickase are predicted to confer little risk of translocation.

      (4) For in vivo work, somatic repeat expansion was analyzed only in peripheral tissue samples. Since the main affected cellular population in HD is the brain, the outcome of this treatment on a disease-relevant organ still needs to be determined. 

      Challenges in delivery to the brain made us determine instability in the liver since many mechanistic components of somatic CAG repeat instability are shared between the liver and striatum, as rationalized in the manuscript. However, we agree with the reviewer regarding the importance of determining the effects of base conversion on brain instability. We added the following sentence in the method section to provide a rationale. "Since significantly lower delivery/targeting efficiency was expected in brain 34, we focused on analyzing liver instability."

      Reviewer #2 (Recommendations For The Authors):

      Throughout the manuscript, the authors apologize for techniques that do not work when workarounds seem readily apparent to an expert in the field. In its current form, the manuscript reads verbose, speculative, apologetic, and preliminary. 

      Drug development programs that are supported by human genetics data show increased success rates in clinical trials (PMID: 26121088, 31827124, 31830040). This is why this genetic study focused on 1) investigating observations in HD subjects and 2) subsequently developing treatment strategies that are supported by patient genetics. As the first illustration of base editing in HD, the main scope of our manuscript is to justify the genetic rationale of CAG-to-CAA conversion and demonstrate the feasibility of therapeutic strategies rooted in patient genetics. As our study was not aimed at entirely demonstrating the clinical benefits of base editing strategies in HD, some of our data were based on tools and approaches that were not fully optimal. We agree with the reviewer that it will be an important next step to employ optimized approaches to evaluate the efficacy of base editing strategies in model systems. Nevertheless, our novel base conversion strategies derived from HD patient genetics represent a significant advancement as they may contribute to developing effective treatments for this devastating disorder. 

      Reviewer#3 (Recommendations For The Authors):

      It would make for an easier read if abbreviations were kept to a minimum. 

      As recommended, we decreased the use of abbreviations. The following has been spelled out throughout the manuscript: CR (canonical repeat), LI (loss of interruption), DI (duplicated interruption), and CBE (cytosine base editor). Other abbreviations with infrequent usage (e.g., ABE, SS, QC) were also spelled out in the text.

    1. Author response:

      Reviewer #1: 

      Summary:

      In this study, the authors used a multi-alternative decision task and a multidimensional signal-detection model to gain further insight into the cause of perceptual impairments during the attentional blink. The model-based analyses of behavioural and EEG data show that such perceptual failures can be unpacked into distinct deficits in visual detection and discrimination, with visual detection being linked to the amplitude of late ERP components (N2p and P3) and discrimination being linked to the coherence of fronto-parietal brain activity.

      Strengths:

      The main strength of this paper lies in the fact that it presents a novel perspective on the cause of perceptual failures during the attentional blink. The multidimensional signal-detection modelling approach is explained clearly, and the results of the study show that this approach offers a powerful method to unpack behavioural and EEG data into distinct processes of detection and discrimination.

      Weaknesses:

      (1.1) While the model-based analyses are compelling, the paper also features some analyses that seem misguided, or, at least, insufficiently motivated and explained. Specifically, in the introduction, the authors raise the suggestion that the attentional blink could be due to a reduction in sensitivity or a response bias. The suggestion that a response bias could play a role seems misguided, as any response bias would be expected to be constant across lags, while the attentional blink effect is only observed at short lags. Thus, it is difficult to understand why the authors would think that a response bias could explain the attentional blink.

      A deficit in T2 identification accuracy could arise from either sensitivity or criterion effects; the criterion effect may manifest as a choice bias. For example, in short T1-T2 lag trials, when T2 closely follows T1, participants may adopt a more conservative choice criterion for reporting the presence of T2. Moreover, criterion effects need not be uniform across lags: A participant could infer the T1-T2 lag interval based on various factors, including trial length, thereby permitting them to adjust their choice criterion variably across different lags. We will provide a more detailed illustration of this claim in the revision.
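      As a minimal sketch of the sensitivity/criterion distinction drawn above, the snippet below assumes the standard one-dimensional, equal-variance Gaussian yes/no model. This is not the multidimensional model used in the study, and the function names and parameter values are hypothetical. It shows that shifting only the criterion changes T2 hit rates while sensitivity (d') stays the same:

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def rates_from_sdt(d_prime: float, criterion: float) -> tuple[float, float]:
    """Hit and false-alarm rates under an equal-variance Gaussian yes/no model.

    Assumed model: noise ~ N(-d'/2, 1), signal ~ N(+d'/2, 1); the observer
    reports "present" when the internal response exceeds the criterion c
    (c = 0 is unbiased, c > 0 is conservative).
    """
    hit = 1.0 - STD_NORMAL.cdf(criterion - d_prime / 2)
    false_alarm = 1.0 - STD_NORMAL.cdf(criterion + d_prime / 2)
    return hit, false_alarm


def sdt_from_rates(hit: float, false_alarm: float) -> tuple[float, float]:
    """Recover (d', c) from hit and false-alarm rates, both strictly in (0, 1)."""
    z_hit = STD_NORMAL.inv_cdf(hit)
    z_fa = STD_NORMAL.inv_cdf(false_alarm)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


if __name__ == "__main__":
    # Same hypothetical sensitivity, two hypothetical criteria
    # (e.g., long-lag vs. a more conservative short-lag criterion).
    for label, c in [("neutral criterion", 0.0), ("conservative criterion", 0.5)]:
        hit, fa = rates_from_sdt(d_prime=1.5, criterion=c)
        d_hat, c_hat = sdt_from_rates(hit, fa)
        print(f"{label}: hit={hit:.3f}, FA={fa:.3f}, d'={d_hat:.2f}, c={c_hat:.2f}")
```

      With d' = 1.5, moving the criterion from 0 to 0.5 lowers the hit rate from roughly 0.77 to 0.60, while the d' recovered from the rates remains 1.5. For this reason, a drop in T2 accuracy alone cannot distinguish a sensitivity deficit from a criterion shift.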

      (1.2) A second point of concern regards the way in which the measures for detection and discrimination accuracy were computed. If I understand the paper correctly, a correct detection was defined as either correctly identifying T2 (i.e., reporting CW or CCW if T2 was CW or CCW, respectively, see Figure 2B), or correctly reporting T2's absence (a correct rejection). Here, it seems that one should also count a misidentification (i.e., incorrect choice of CW or CCW when T2 was present) as a correct detection, because participants apparently did detect T2, but failed to judge/remember its orientation properly in case of a misidentification. Conversely, the manner in which discrimination performance is computed also raises questions. Here, the authors appear to compute accuracy as the average proportion of T2-present trials on which participants selected the correct response option for T2, thus including trials in which participants missed T2 entirely. Thus, a failure to detect T2 is now counted as a failure to discriminate T2. Wouldn't a more proper measure of discrimination accuracy be to compute the proportion of correct discriminations for trials in which participants detected T2?

      Detection and discrimination accuracies were computed with precisely the procedure the Reviewer proposes above, and under the same conditions. We regret our poor description and will improve it in the revised manuscript.

      (1.3) My last point of critique is that the paper offers little if any guidance on how the inferred distinction between detection and discrimination can be linked to existing theories of the attentional blink. The discussion mostly focuses on comparisons to previous EEG studies, but it would be interesting to know how the authors connect their findings to extant, mechanistic accounts of the attentional blink. A key question here is whether the finding of dissociable processes of detection and discrimination would also hold with more meaningful stimuli in an identification task (e.g., the canonical AB task of identifying two letters shown amongst digits). There is evidence to suggest that meaningful stimuli are categorized just as quickly as they are detected (Grill-Spector & Kanwisher, 2005, "Visual recognition: as soon as you know it is there, you know what it is", Psychol Sci 16(2):152-160, doi: 10.1111/j.0956-7976.2005.00796.x, PMID: 15686582). Does that mean that the observed distinction between detection and discrimination would only apply to tasks in which the targets consist of otherwise meaningless visual elements, such as lines of different orientations?

      Our results are consistent with the previous literature cited by the Reviewer. Specifically, we do not claim that detection and discrimination are sequential processes; in fact, we modeled them as concurrent computations (Figs. 3A-B). Yet, our results suggest that these processes have distinct neural bases. We discuss this idea briefly in the Discussion section (e.g., "Yet, we found no evidence for these two computations being sequential…"). We will expand on this point in the revised manuscript in the context of previous literature.

      Reviewer #2:

      Summary:

      The authors had two aims: First, to decompose the attentional blink (AB) deficit into the two components of signal detection theory; sensitivity and bias. Second, the authors aimed to assess the two subcomponents of sensitivity; detection and discrimination. They observed that the AB is only expressed in sensitivity. Furthermore, detection and discrimination were doubly dissociated. Detection modulated N2p and P3 ERP amplitude, but not frontoparietal beta-band coherence, whereas this pattern was reversed for discrimination.

      Strengths:

      The experiment is elegantly designed, and the data - both behavioral and electrophysiological - are aptly analyzed. The outcomes, in particular the dissociation between detection and discrimination blinks, are consistently and clearly supported by the results. The discussion of the results is also appropriately balanced.

      Weaknesses:

      (2.1) The lack of an effect of stimulus contrast does not seem very surprising from what we know of the nature of AB already. Low-level perceptual factors are not thought to cause AB. This is fine, as there are also other, novel findings reported, but perhaps the authors could bolster the importance of these (null) findings by referring to AB-specific papers, if there are indeed any, that would have predicted different outcomes in this regard.

      While there is consensus that low-level perceptual factors are not affected by the attentional blink, some studies may suggest otherwise (e.g., Chua et al., Percept. Psychophys., 2005). In the revised manuscript, we will highlight the significance of our findings in the context of such conflicting evidence in the literature.

      (2.2) On an analytical note, the ERP analysis could be fine-tuned a little more. The task design does not allow measurement of the N2pc or N400 components, which are also relevant to the AB, but the N1 component could additionally be analyzed. In doing so, I would furthermore recommend selecting more lateral electrode sites for both the N1 and the P1. Both P1 and N1 are likely not maximal near the midline, where the authors currently focused their P1 analysis.

      We will incorporate these additional analyses in the revised manuscript.

      (2.3) Impact & Context:

      The results of this study will likely influence how we think about selective attention in the context of the AB phenomenon. However, I think its impact could be further improved by extending its theoretical framing. In particular, there has been some recent work on the nature of the AB deficit, showing that it can be discrete (all-or-none) and gradual (Sy et al., 2021; Karabay et al., 2022, both in JEP: General). These different faces of target awareness in the AB may be linked directly to the detection and discrimination subcomponents that are analyzed in the present paper. I would encourage the authors to discuss this potential link and comment on the bearing of the present work on these behavioural findings.

      Thank you. We will discuss our findings in the context of these recent studies.

      Reviewer #3:

      Summary:

      In the present study, the authors aimed to achieve a better understanding of the mechanisms underlying the attentional blink, that is, a deficit in processing the second of two target stimuli when they appear in rapid succession. Specifically, they used a concurrent detection and identification task in- and outside of the attentional blink and decoupled effects of perceptual sensitivity and response bias using a novel signal detection model. They conclude that the attentional blink selectively impairs perceptual sensitivity but not response bias, and link established EEG markers of the attentional blink to deficits in stimulus detection (N2p, P3) and discrimination (fronto-parietal high-beta coherence), respectively. Taken together, their study suggests distinct mechanisms mediating detection and discrimination deficits in the attentional blink.

      Strengths:

      Major strengths of the present study include its innovative approach to investigating the mechanisms underlying the attentional blink, an elegant, carefully calibrated experimental paradigm, a novel signal detection model, and multifaceted data analyses using state-of-the-art model comparisons and robust statistical tests. The study appears to have been carefully conducted and the overall conclusions seem warranted given the results. In my opinion, the manuscript is a valuable contribution to the current literature on the attentional blink. Moreover, the novel paradigm and signal detection model are likely to stimulate future research.

      Weaknesses:

      Weaknesses of the present manuscript mainly concern the neglect of some relevant literature, unclear hypotheses, potentially data-driven analyses, relatively low statistical power, potential flaws in the EEG methods, and the absence of a discussion of limitations. In the following, I will list some major and minor concerns in detail.

      Major points

      (3.1) Hypotheses:

      I appreciate the multifaceted, in-depth analysis of the given dataset including its high amount of different statistical tests. However, neither the Introduction nor the Methods contain specific statistical hypotheses. Moreover, many of the tests (e.g., correlations) rely on selected results of previous tests. It is unclear how many of the tests were planned a priori, how many more were performed, and how exactly corrections for multiple tests were implemented. Thus, I find it difficult to assess the robustness of the results.

      As outlined in the Introduction, we hypothesized that neural computations associated with target detection would be characterized by regional neuronal markers (e.g., parietal or occipital ERPs), whereas computations linked to feature discrimination may involve neural coordination across multiple brain regions (e.g., fronto-parietal coherence). We planned and conducted our statistical tests based on this hypothesis. All multiple comparison corrections (e.g., Bonferroni-Holm correction, see Methods) were performed separately for each class of analyses. We will clarify these hypotheses and provide further details in the revised manuscript.

      (3.2) Power:

      Some important null findings may result from the rather small sample sizes of N = 24 for behavioral and N = 18 for ERP analyses. For example, the correlation between detection and discrimination d' deficits across participants (r=0.39, p=0.059) (p. 12, l. 263) and the attentional blink effect on the P1 component (p=0.050, no test statistic) (p. 14, l. 301) could each have been significant with one more participant. In my opinion, such results should not be interpreted as evidence for the absence of effects.

      We agree and will revise the manuscript accordingly. We will also report Bayes factor (BF) values, where relevant, to further evaluate these claims.

      (3.3) Neural basis of the attentional blink:

      The introduction (e.g., p. 4, l. 56-76) and discussion (e.g., p. 19, l. 427-447) do not incorporate the insights from the highly relevant recent review by Zivony & Lamy (2022), which is only cited once (p. 19, l. 428). Moreover, these sections do not mention some relevant ERP studies of the attentional blink (e.g., Batterink et al., 2012; Craston et al., 2009; Dell'Acqua et al., 2015; Dellert et al., 2022; Eiserbeck et al., 2022; Meijs et al., 2018).

      We will motivate and discuss our study in the context of these previous studies. 

      (3.4) Detection versus discrimination:

      Concerning the neural basis of detection versus discrimination (e.g., p. 6, l. 98-110; p. 18, l. 399-412), relevant existing literature (e.g., Broadbent & Broadbent, 1987; Hillis & Brainard, 2007; Koivisto et al., 2017; Straube & Fahle, 2011; Wiens et al., 2023) is not included.

      Thank you for these suggestions. We will include these important studies in our discussion.

      (3.5) Pooling of lags and lag-1 sparing:

      I wonder why the authors chose to include 5 different lags when they later pooled early (100, 300 ms) and late (700, 900 ms) lags, and whether this pooling is justified. This is important because T2 at lag 1 (100 ms) is typically "spared" (high accuracy) while T2 at lag 3 (300 ms) shows the maximum AB (for reviews, see, e.g., Dux & Marois, 2009; Martens & Wyble, 2010). Interestingly, this sparing was not observed here (p. 43, Figure 2). Nevertheless, considering the literature and the research questions at hand, it is questionable whether lag 1 and 3 should be pooled.

      Lag-1 sparing is not always observed in attentional blink studies; there are notable exceptions that do not report such sparing (Hommel et al., Q. J. Exp. Psychol., 2005; Livesay et al., Attention, Percept. Psychophys., 2011). Our statistical tests revealed no significant difference in accuracies between short lag (100 and 300 ms) trials or between long lag (700 and 900 ms) trials but did reveal significant differences between the short and long lag trials (ANOVA, followed by post-hoc tests). To simplify the presentation of the findings, we pooled together the short lag (100 and 300 ms) and, separately, the long lag (700 and 900 ms) trials. We will present these analyses, and clarify the motivation for pooling in the revised manuscript. 

      (3.6) Discrimination in the attentional blink:

      Concerning the claims that previous attentional blink studies conflated detection and discrimination (p. 6, l. 111-114; p. 18, l. 416), there is a recent ERP study (Dellert et al., 2022) in which participants did not perform a discrimination task for the T2 stimuli. Moreover, since the relevance of all stimuli except T1 was uncertain in this study, irrelevant distractors could not be filtered out (cf. p. 19, l. 437). Under these conditions, the attentional blink was still associated with reduced negativities in the N2 range (cf. p. 19, l. 427-437) but not with a reduced P3 (cf. p. 19, l. 439-447).

      We will address the difference between our findings and those of Dellert et al. (2022) in the revised manuscript.

      (3.7) General EEG methods:

      While most of the description of the EEG preprocessing and analysis (p. 31/32) is appropriate, it also lacks some important information (see, e.g., Keil et al., 2014). For example, it does not include the length of the segments, the type and proportion of artifacts rejected, the number of trials used for averaging in each condition, specific hypotheses, and the test statistics (in addition to p-values).

      We regret the oversight. We will include these details in the revised Methods.

      (3.8) EEG filters:

      P. 31, l. 728: "The data were (...) bandpass filtered between 0.5 to 18 Hz (...). Next, a bandstop filter from 9-11 Hz was applied to remove the 10 Hz oscillations evoked by the RSVP presentation." These filter settings do not follow common recommendations and could potentially induce filter distortions (e.g., Luck, 2014; Zhang et al., 2024). For example, the 0.5 Hz high-pass filter could distort the slow P3 wave. Mostly, I am concerned about the bandstop filter. Since the authors commendably corrected for RSVP-evoked responses by subtracting T2-absent from T2-present ERPs (p. 31, l. 746), I wonder why the additional filter was necessary, and whether it might have removed relevant peaks in the ERPs of interest.

      Thank you for this suggestion. We will repeat this analysis without these additional filters.

      (3.9) Coherence analysis:

      P. 33, l. 786: "For subsequent, partial correlation analyses of coherence with behavioral metrics and neural distances (...), we focused on a 300 ms time period (0-300 ms following T2 onset) and high-beta frequency band (20-30 Hz) identified by the cluster-based permutation test (Fig. 5A-C)." I wonder whether there were any a priori criteria for the definition and selection of such successive analyses. Given the many factors (frequency bands, hemispheres) in the analyses and the particular shape of the cluster (p. 49, Fig 5C), this focus seems largely data-driven. It remains unclear how many such tests were performed and whether the results (e.g., the resulting weak correlation of r = 0.22 in one frequency band and one hemisphere in one part of a complexly shaped cluster; p. 15, l. 327) can be considered robust.

      Please see responses to comments #3.1 and #3.2 (above). In addition to reporting further details regarding statistical tests and multiple comparisons corrections, we will compute and report Bayes factors to quantify the strength of the evidence for correlations, as appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The current manuscript provides an extensive in vivo analysis of two guidance pathways identifying multiple mechanisms that shape the bifurcation of DRG axons when forming the dorsal funiculus in the DREZ. 

      Strengths: 

      Multiple mouse mutant lines were used, together with complementary techniques; the results are very clear and compelling. 

      The findings are very significant and clearly move forward our understanding of the regulation of axonal development at the DREZ. 

      Weaknesses: 

      No major weaknesses were found. As it is I have no recommendations that would increase the clarity or quality of the manuscript. 

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors conduct a detailed analysis of the molecular cues that control the guidance of bifurcated dorsal root ganglion axons in a key region of the spinal cord called the dorsal funiculus. This is a specific case of axon guidance that occurs in a precise way. The authors knew that Slit was important, but many axons still target correctly in Slit knockouts, suggesting a role for other guidance factors. Netrin1 is also expressed in this region, so they looked at netrin mutants. The authors found axons outside the DREZ in the Ntn1 mutants, and they show by single-neuron genetic labeling that many of these come from DRG neurons. Quantified axonal tracing studies in Slit1/2, Ntn1, or triple mutant embryos support the idea that Slit and Ntn1 have distinct functions in guidance and that the effects of their loss are additive. Interestingly, none of these knockouts affect bifurcation itself but rather the guidance of one or both of the bifurcated axon terminals. Knockout of the Slit receptors (Robo1/2) or the Netrin1 receptor (DCC) in embryos causes guidance defects similar to those caused by loss of the ligands, providing additional confirmation of the requirement for both guidance pathways. 

      Strengths: 

      This study expands understanding of the role of the axon guidance factors Ntn1/DCC and Slit/Robo in a specific axon guidance decision. The strength of the study is the careful axonal labeling and quantification, which allows the authors to establish precise consequences of the loss of each guidance factor or receptor. 

      Weaknesses: 

      There are some places in the text where the discussion of these data is compared with other studies and models, but additional details would help clarify the arguments. 

      To address this weakness, these details were added to the first section of the Discussion in the revision. Also see the response to the recommendations below.

      Reviewer #3 (Public Review):

      Summary: 

      In this paper, Curran et al. investigate the role of Ntn, Slit1, and Slit2 in the axon patterning of DRG neurons. The paper uses mouse genetics to perturb each guidance molecule and its corresponding receptor. Cre-based approaches and immunostaining of DRG neurons are used to assess the phenotypes. Overall, the study uses the strength of mouse genetics and imaging to reveal new genetic modifiers of DRG axons. The conclusions of the experiments match the presented results. The paper is an important contribution to the field, as evidence that dorsal funiculus formation is impacted by Ntn and Slit signaling. However, some areas of the manuscript should be edited to better match the results with the conclusions of the work. 

      Strengths: 

      The manuscript uses the advantage of mouse genetics to investigate the axon patterning of DRG neurons. The work does a great job of assessing individual phenotypes in single and double mutants. This reveals an intriguing cooperative and independent function of Ntn, Slit1, and Slit2 in DRG axon patterning. The sophisticated triple mutant analysis is lauded and provides important insight. 

      Weaknesses: 

      Overall, the manuscript is sound in technique and analysis. However, the majority of the manuscript is about the dorsal funiculus and not the bifurcation of the axons, as the title would lead a reader to believe. Further, the manuscript would benefit from a more scholarly discussion of the current knowledge of DRG axon patterning and of how this work fits into that knowledge. 

      We revised the title as suggested. Additional discussion of DRG axon growth at the DREZ has been added to the last section of the Discussion in the revision. Also see the response to the recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Given the reasons stated above, I have no specific recommendations for the authors. 

      There is a typo in the Abstract (... mice with triple deletion of Ntn1, Slit2, and Slit2....). 

      Corrected in the revision.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors twice repeated that their data on DRG guidance defects in the Ntn1 mutants differ from studies previously published in references 19 and 26. However it is unclear to me, without having read those other studies, what is actually different between this study and those, and why there would be differences between the results from two groups. If the authors think this is an important point to make they need to more clearly say what the other group saw and offer an explanation of why the data may be different. 

      In the revision, we added a detailed comparison of the defects reported in different studies to the first section of the Discussion and suggested multiple roles for Ntn1 in controlling sensory axon growth at the DREZ.

      (2) In the final section of the discussion it says, "The guidance regulation of DRG axon bifurcation by Slit and Ntn1 may be similar to but overshadowed by their function in midline guidance [43]." The meaning of this sentence was unclear to me. I had been thinking that since there are total knockout embryos (not conditional) there could be patterning effects that happen before the DRG branching that influence the formation of the DREZ. Is this what the authors mean to say here? How can the authors show that the guidance factors they have knocked out are actually functioning in the DRG neurons? 

      We agree with the reviewer that the first sentence is vague, so we edited the paragraph and included the discussion of the regulation of DRG axons at the DREZ, which was the main theme of this last section.  In addition, we agree with the reviewer’s suggestion of the possible indirect role of Ntn1 on DRG axons via the control of interneuron migration.  This possibility was included in the last paragraph of the Discussion.

      (3) In several of the figures (3T, 5I, 5J) there are distance measurements that are presumably averages of multiple axons in 3 or 4 embryos because 3-4 points are shown per graph. However, the figure and methods do not say how many axons were measured per embryo and I could not find if it says these numbers are averages. Clarifying the details of these panels would be useful. 

      The n is the number of animals analyzed and is now stated in the figure legends. For each animal, multiple sections (2-4) were analyzed for the various parameters in Fig. 3 and 5. This information was added to the Methods section of the revision.

      Reviewer #3 (Recommendations For The Authors):

      Overall the data matches the conclusions in the paper. However, to this reviewer, the title suggests that Ntn and Slit will have defects in bifurcation. This is not the presented phenotype. I recommend the authors change the title to better reflect the findings of the work. 

      We edited the title of the revised manuscript to reflect the control of growth direction in the context of bifurcation.  

      The introduction of the work clearly outlines what is known about DREZ formation in mice but could extend its discussion to other systems like chick and zebrafish (Coutinho-Budd et al. 2008, Wang and Scott 2000, Golding et al. 1997, Nichols and Smith 2019, Kikel-Coury et al. 2021). These studies are particularly important given that pioneer events, including bifurcation, can be visualized in these systems. Acknowledging the contribution of other model systems to the understanding of DRG axon patterning is important to improve the scholarly discussion of the paper. 

      We added more detailed discussion of the current knowledge of DRG axon growth at the DREZ from several relevant studies of the rodent and zebrafish models in the last section of Discussion.

      In the data presented, the authors see defects in the axon patterning of DRG neurons and conclude it is a defect in the dorsal funiculus formation. Another interpretation is that a subset of axons cannot invade the spinal cord boundary properly. This phenotype was observed in zebrafish with timelapse imaging (Kikel-Coury et al 2021). It may not be necessary to specifically test the axons' ability to enter the spinal cord in this paper, but the possibility that this could drive the presented phenotypes should be more clearly stated in the results. Entry is not thoroughly addressed in this paper and would need to be confirmed by labeling the edge of the spinal cord with a second reporter. No entry would obviously impact axon targeting. However, delayed entry could place the axon in a navigation environment that is atypical, causing it to navigate aberrantly and present as a funiculus phenotype. 

      We thank the reviewer for raising this very interesting point. In our present view, dorsal funiculus formation is related to DRG axon patterning, which involves growth, guidance, and bifurcation of the incoming afferents at the dorsal spinal cord. We believe that these events are highly coordinated by various environmental cues to generate the DREZ and the dorsal funiculus. The defects we observed could result from a disruption of this coordination that leads to misregulation of DRG axon entry at the dorsal spinal cord, as suggested by the reviewer. We propose that further analysis by time-lapse imaging, as done in zebrafish, would provide a better understanding of this coordination. This discussion was included in the last section of the Discussion. 

      The authors should clarify that their approach does not knock out molecules in a cell-specific way. This would specifically impact the interpretation of the Dcc phenotypes. It is possible that UNC-40/DCC is guiding cells that are not labeled. The non-autonomous role of UNC-40/DCC should be clearly stated as a possibility. 

      This discussion was added to the last paragraph of the Discussion section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful to all reviewers and to you for your careful analysis of our work and for the feedback you all provided. The reviews were fundamentally positive with very minor modifications suggested, which we have addressed in this new version as follows.

      (1) We changed Figure 1 to include a high-resolution image of the 3D structure of the low-affinity complex between the RBD and the GM1 tetrasaccharide (GM1os); see panel d. We predicted this structure through extensive sampling with MD simulations as part of earlier work aimed at guiding the resolution of a crystal structure. Because of insurmountable difficulties in crystallizing this complex, that work was only published as an extended abstract (Garozzo, Nicotra, and Sonnino 2022). Following one of the reviewers' suggestions, we added all the details of the computational approach we used as Supplementary Material.

      (2) We added the following comment and the corresponding references to the Discussion section, in relation to earlier work flagged by one of the Reviewers (Rochman et al. 2022): "Further to this, our results show that taking into consideration the effects of N-glycosylation on protein structural stability and dynamics in the context of specific protein sequences may be key to understanding epistatic interactions among RBD residues, which would otherwise be very difficult, if not impossible, to decipher."

      References

      Garozzo, Domenico, Francesco Nicotra, and Sandro Sonnino. 2022. “‘Glycans and Glycosylation in SARS-COV2 Infection’ Session at the XVII Advanced School in Carbohydrate Chemistry, Italian Chemical Society. July 4th -7th 2021, Pontignano (Si), Italy.” Glycoconjugate Journal 39 (3): 327–34.

      Rochman, Nash D., Guilhem Faure, Yuri I. Wolf, Peter L. Freddolino, Feng Zhang, and Eugene V. Koonin. 2022. “Epistasis at the SARS-CoV-2 Receptor-Binding Domain Interface and the Propitiously Boring Implications for Vaccine Escape.” MBio 13 (2): e0013522.

    1. Author response:

      eLife assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including unclear efficacy of longer-duration climbing fiber activity suppression.

      We sincerely appreciate the thoughtful feedback provided by the reviewer regarding our study on the role of climbing fibers in cerebellar learning. Each point raised has been carefully considered, and we are committed to addressing them comprehensively. We acknowledge the importance of addressing methodological concerns, particularly regarding the efficacy of long-term suppression of CF activity, as well as ensuring clarity regarding penetrance and selectivity of our manipulation. To this end, we have outlined plans for substantial revisions to the manuscript to adequately address these issues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al. highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term manipulation of CF activity is effective, thus undermining confidence in the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of CF activity in the flocculus during distinct phases of learning. This is an ambitious goal, given the technical challenges posed by the anatomical locations of the flocculus and the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR) that can lead to fundamental insights into the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding that prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) leaves OKR memory intact must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the authors only show inhibition by short-term optogenetic irradiation of electrical-stimulation-evoked CSs in an ex vivo preparation. Because the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulation, especially with the yellow laser used in this work (see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC Biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example, a significant decrease in CS frequency throughout the irradiation period), the manuscript's main claim of phase-specific involvement of CF activity in OKR learning cannot be considered supported by the evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MFs). Figure 1Bii shows sparsely labelled CFs in the flocculus, but possibly also MFs. While obtaining homogeneous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      We appreciate the thorough review and recognize both the strengths and weaknesses highlighted.

      We concur with the reviewer’s assessment of the novelty of our approach, particularly in specifically perturbing CF activity in the flocculus and examining the effects during different phases of learning. In addition, the use of the OKR behavioral paradigm strengthens our study by providing a well-established model for investigating cerebellar learning processes.

      Regarding concerns about the efficacy of long-term optogenetic inhibition and the specificity of viral targeting, we are committed to addressing these issues through additional experiments. Specifically, we aim to demonstrate sustained inhibition of CF transmission by verifying that inhibition is maintained throughout the putative consolidation phase. This may involve monitoring CF activity in vivo during the irradiation period. Furthermore, we plan to characterize the viral targeting in more detail to confirm the specificity of our approach.

      Additionally, we recognize the importance of discussing alternative mechanisms of CF involvement in cerebellar learning. Hence, we will expand the manuscript to discuss these dimensions of CF function more comprehensively and to clarify the broader implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weaken the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the reviewer’s recognition of the significance of our study in addressing the fundamental question of the role of CFs in adaptive learning within the cerebellar field. The use of optogenetic tools indeed provides a direct means to investigate the causal relationship between CF activity and learning outcomes.

      To address concerns regarding the effectiveness of CF suppression during consolidation, we plan to conduct further in-vivo recordings. These will demonstrate how reliably CF transmission can be suppressed through optogenetic manipulation over an extended period.

      In response to the concern about potential tissue damage from laser stimulation, we believe that our optogenetic manipulation was not strong enough to induce significant heat-induced tissue damage in the flocculus. According to Cardin et al. (2010), light applied through an optic fiber may cause critical damage if the intensity exceeds 100 mW, which is eight times the intensity we used in our OKR experiment. Furthermore, if chronic laser stimulation had damaged the tissue, we would expect impaired long-term memory, reflected in abnormal gain retrieval when tested the following day. However, as shown in Figures 2 and 3, consolidation percentages showed no significant abnormalities even after the optogenetic manipulation.

      Finally, we appreciate the reviewer’s recognition of the challenges involved in pinpointing specific neural mechanisms. We plan to expand the discussion to address these complexities and outline future research directions.

    1. Author Response:

      eLife assessment

      We thank the Editors for identifying qualified reviewers. We agree that the “evidence supporting this claim (that ‘many breast cancer mutations are mildly deleterious’) is incomplete”. Much more detail is needed to state this decisively and we do not claim completeness here. Regarding validation, we carried out synthetic testing of the models as suggested by Reviewer #1 and the results seem good.

      Reviewer #1:

      We thank the Reviewer for a very thorough examination of not only the current paper but also our previous paper. We agree that the illustration material can be overwhelming and we plan to use the Reviewer’s advice in that matter. In addition, we originally put some textbook material in the Appendix, and arguably some of it may be considered superfluous.

      Most of the references the Reviewer provides are known to us, although it is likely we should cite and discuss more. All of the above will be included in the revision we are planning.

      The Reviewer is certainly correct that population growth and spatial effects play a major role in cancer. However, the effects of constraining environment are quite strong and the reality lies somewhere between the Moran and branching process models; exactly what we attempt to clarify. As for spatial effects, most tumors extracted in clinic are dissected in bulk and sub-sampling is rare, so the spatial information is rarely accessible.

      The subsequent point of importance concerns the weak specificity of the site frequency spectra (SFS) with respect to the underlying genetic and demographic forces. This cannot be denied. However, we only meant to state that our SFS are consistent with a model involving slightly deleterious passengers.

      Regarding the validation of the estimation procedures which is a point well-taken, we carried out synthetic testing of the models as suggested by Reviewer #1 and the results seem good. This will be discussed in full in the revision.

      In our view, the most important remark is the one concerning scaling of the models. The Reviewer is certainly correct that 100 stem cells are insufficient to drive a realistic tumor. However, what we had in mind, but did not explain sufficiently, is that a sample of 100 cells corresponds to average-depth coverage in bulk sequencing. Therefore, the strict interpretation is that the model mirrors what is observed in the sample. A more accurate approach would be to up-scale the model and then sample 100 cells from it. The Moran-type model can be up-scaled using diffusion approximation, and we hope to include these computations in the revision. The associated criticism concerning tumor growth seems less relevant, since we experimented with less or more stringent constraints in our models.
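      The sampling step described here, drawing 100 cells from a larger simulated population and tabulating their site frequency spectrum, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: each cell is assumed to be represented by the set of mutation identifiers it carries (a hypothetical format), the diffusion-approximation up-scaling itself is not shown, and `sample_sfs` is a hypothetical helper.

```python
import random
from collections import Counter


def sample_sfs(population, n=100, seed=None):
    """Site frequency spectrum of n cells drawn without replacement from `population`.

    population: list with one set of mutation ids per cell (hypothetical format).
    Returns a list `sfs` of length n + 1 where sfs[k] is the number of mutations
    carried by exactly k sampled cells; sfs[0] is always 0 because mutations that
    are absent from the sample cannot be observed.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, n)
    # Map each mutation to the number of sampled cells that carry it.
    carriers = Counter(m for cell in sample for m in cell)
    sfs = [0] * (n + 1)
    for k in carriers.values():
        sfs[k] += 1
    return sfs
```

      In this representation, mutations shared by every sampled cell (clonal mutations) end up in `sfs[n]`, while private mutations fall into `sfs[1]`.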

      Reviewer #2:

      We thank Reviewer #2 for studying our paper and for some very positive comments. Among others, the Reviewer underscores the fact that the Moran-type model generates SFS concordant with the data (with all necessary reservations). The Reviewer concurs with us that conditioning on non-extinction is not very common in the literature, while it should be.

      Like the Reviewer, we are somewhat puzzled by the differences in behavior between models A and B. Model B seems more parsimonious, but Model A looks more similar to the critical or slightly supercritical branching process. We will work to clarify these observations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Self-inhibiting percolation and viral spreading in epithelial tissue" describes a model of the development of an infection based on a five-state cellular automaton. The model is motivated and qualitatively justified by time-resolved measurements of expression levels of viral, interferon-producing, and antiviral genes. The model is set up in such a way that the crucial difference in outcomes (infection spreading vs. confinement) depends on the initial fraction of special virus-sensing cells. Those cells (denoted as 'type a') cannot be infected and do not support the propagation of infection, but rather inhibit it in a somewhat autocatalytic way. Presumably, such feedback makes the transition between the two outcomes very sharp: a minor variation in the concentration of 'a' cells results in a qualitative change from one outcome to the other. As in any percolation-like system, the transition between propagation and inhibition of infection goes through a critical state with all its attributes, including a power-law distribution of the cluster size (corresponding to the fraction of infected cells) with a fairly universal exponent and a cutoff at the upper limit of this distribution.

      Strengths:

      The proposed model suggests an explanation for the apparent diversity of outcomes of viral infections such as COVID.

      Author response: We thank the referee for the concise and accurate summary of our work.

      Weaknesses:

      These are not real weaknesses, though I think addressing them would substantially improve the manuscript.

      Author response: Below we will address these point by point.

      The key point in the manuscript is the reduction of actual biochemical processes to the NOVAa rules. I think more could be said about it, be it referring to a set of well-known connections between expression states of cells and their reaction to infection or justifying it as an educated guess.

      Author response: We have now improved this part in the model section. We have added a few sentences explaining how the cell state transitions are motivated by the UMAP results:

      “The cell state transitions triggered by IFN signaling or viral replication are known in viral infection, but how exactly the transitions are orchestrated for specific infections is poorly understood. The UMAP cell state distribution hints at possible preferred transitions between states. The closer two cell states are on the UMAP, the more likely transitions between them are, all else being equal. For instance, the antiviral state (𝐴) is easily established from a susceptible cell (𝑂), but not from the fully virus-hijacked cell (𝑉). The IFN-secreting cell state (𝑁) requires the co-presence of the viral and antiviral genes and thus the cell cluster is located between the antiviral state (𝐴) and virus-infected state (𝑉) but distant from the susceptible cells (𝑂).

      Inspired by the UMAP data visualization (Fig. 1a), we propose the following transitions between five main discrete cell states”

      Another aspect where the manuscript could be improved would be to look a little beyond the strange and 'not-so-relevant for a biomedical audience' focus on the percolation critical state. While the presented calculation of the precise percolation threshold and the critical exponent confirm the numerical skills of the authors, the probability that an actual infected tissue is right at the threshold is negligible. So in addition to the critical properties, it would be interesting to learn about the system not exactly at the threshold: For example, how the speed of propagation of infection depends on subcritical p_a and what is the cluster size distribution for supercritical p_a.

      Author response: We agree that further exploring the model away from the critical threshold is worthwhile. While our main focus has been on explaining the large degree of heterogeneity in outcomes, readily explained as a consequence of the sharp threshold-like behavior, we now include plots of the time evolution of the infection (as well as of the other cell states) for subcritical values of p_a. The plots can be found in Figure S4 of the supplement.

      Reviewer #2 (Public Review):

      Xu et al. introduce a cellular automaton model to investigate the spatiotemporal spreading of viral infection. In this study, the authors first analyze the single-cell RNA sequencing data from experiments and identify four clusters of cells at 48 hours post-viral infection, including susceptible cells (O), infected cells (V), IFN-secreting cells (N), and antiviral cells (A). Next, a cellular automaton model (NOVAa model) is introduced by assuming the existence of a transient pre-antiviral state (a). The model consists of an LxL lattice; each site represents one cell. The cells change their state following rules that depend on the interaction of neighboring cells. The model introduces a key parameter, p_a, representing the fraction of pre-antiviral state cells. Cell apoptosis is omitted in the model. Model simulations show a threshold-like behavior of the final attack rate of the virus when p_a changes continuously. There is a critical value p_c, so that when p_a < p_c, infections typically spread to the entire system, while at a higher p_a > p_c, the propagation of the infected state is inhibited. Moreover, the radius R that quantifies the diffusion range of N cells may affect the critical value p_c; a larger R yields a smaller value of the critical value p_c. The structure of clusters is different for different values of R; greater R leads to a different microscopic structure with fewer A and N cells in the final state. Compared with the single-cell RNA seq data, which implies a low fraction of IFN-positive cells (around 1.7%), the model simulation suggests R=5. The authors also explored a simplified version of the model, the OVA model, with only three states. The OVA model also has an outbreak size. The OVA model shows dynamics similar to the NOVAa model. However, the change in microstructure as a function of the IFN range R observed in the NOVAa model is not observed in the OVA model.

      Author response: We thank the referee for the comprehensive summary of our work.

      Data and model simulation mainly support the conclusions of this paper, but some weaknesses should be considered or clarified.

      Author response: Thank you - we will address these point by point below.

      (1) In the automaton model, the authors introduce a parameter p_a, representing the fraction of pre-antiviral state cells. The authors wrote: "The parameter p_a can also be understood as the probability that an O cell will switch to the N or A state when exposed to the virus or IFNs, respectively." Biologically, however, the fraction of pre-antiviral state cells is not the same quantity as the probability that an O cell switches to the N or A state. Moreover, in the numerical scheme, the cell state changes according to the deterministic rules N(O)=a and N(a)=A, so p_a is not applied as a transition probability in the model simulation. The exact meaning of the parameter p_a should be clarified.

      Author response: We acknowledge that this was an imprecise formulation, and have now changed it.

      What we tried to convey with that comment was that, rather than having a certain fraction of cells be in the a state initially, one could instead have devised a model in which each O-state cell simply had a probability to act as an a-state cell upon exposure to the virus or to interferons, i.e. to switch to an N state (if exposed to virus) or to the A state (if exposed to interferons). In this simplified model, there would be no functional difference, since it would simply amount to whether each cell had a probability to be designated an a-cell initially (as in our model) or upon exposure. So our remark mainly served to explain that the role of the p_a parameter is simply to encode that a certain fraction of virus-naive cells behave this way (whether predetermined or not).

      (2) The current model is deterministic. Biologically, however, a probabilistic model may be more realistic. Are the results valid when a probabilistic update strategy is considered? In a probabilistic model, the cells change their state randomly to the state of the neighbor cells. The probability of cell state changes may be relevant for the threshold of p_a. It would be interesting to know how the random response of cells may affect the main results and the critical value of p_a.

      Author response: This is a good point - we are firm believers in the importance of stochasticity. We should note that even the current model has a level of stochasticity, since we choose the cells to be updated with a constant probability rate - we choose N cells to update in each timestep, with replacement.

      However, based on your suggestion, we simulated a version of the dynamics which included stochastic conversion, i.e. each action of a cell on a nearby cell happens only with a probability p_conv (and the original model is recovered as the p_conv=1 scenario). Of course, this slows down the dynamics (or effectively rescales time by a factor p_conv), but crucially we find that it does not appreciably affect the location of the threshold p_c. Below we include a parameter scan across p_a values for R=1 and p_conv=0.5, which shows that the threshold continues to appear at around p_a=27%.

      We now discuss these findings in the supplement and include the figure below as Fig. S5.

      Author response image 1.
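      The update scheme discussed in points (1) and (2) can be illustrated with a small Python sketch. It is not the authors' C++ implementation and should be read as a sketch under assumptions: the transition rules are reconstructed from the reviewers' summaries and the Fig. 2a caption (a V cell converts nearest-neighbour O cells to V and a cells to N; an N cell converts O cells within range R to a and a cells to A; A cells are inert), the IFN range is assumed to be a Euclidean disc of radius R, one time step is assumed to consist of L^2 random picks with replacement, and each attempted conversion succeeds with probability p_conv. The state codes and function names are hypothetical.

```python
import random

# Hypothetical integer codes for the five NOVAa cell states.
O, V, N, A, PRE_A = 0, 1, 2, 3, 4

# Assumed rules: for each acting state, which target states it converts and into what.
RULES = {
    V: {O: V, PRE_A: N},      # virus spreads to nearest neighbours
    N: {O: PRE_A, PRE_A: A},  # IFNs act on cells within range R
}


def offsets(radius):
    """Lattice offsets within Euclidean distance `radius`, excluding the site itself."""
    r = int(radius)
    return [(dx, dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if (dx, dy) != (0, 0) and dx * dx + dy * dy <= radius * radius]


def targets(grid, L, i, j, offs):
    """Sites that cell (i, j) could still convert, with periodic boundary conditions."""
    rule = RULES.get(grid[i][j])
    if rule is None:  # O, A and a cells do not act on other cells
        return []
    hits = []
    for dx, dy in offs[grid[i][j]]:
        x, y = (i + dx) % L, (j + dy) % L
        if grid[x][y] in rule:
            hits.append((x, y))
    return hits


def simulate(L, p_a, R=1, p_conv=1.0, seed=None):
    """Run one outbreak from a single infected site; return the final (N + V) / L^2."""
    rng = random.Random(seed)
    grid = [[PRE_A if rng.random() < p_a else O for _ in range(L)] for _ in range(L)]
    grid[L // 2][L // 2] = V  # seed the infection at one lattice site
    offs = {V: offsets(1), N: offsets(R)}
    while True:
        for _ in range(L * L):  # one time step: L^2 random picks, with replacement
            i, j = rng.randrange(L), rng.randrange(L)
            rule = RULES.get(grid[i][j])
            for x, y in targets(grid, L, i, j, offs):
                # Each conversion succeeds with probability p_conv (p_conv=1: always).
                if grid[x][y] in rule and rng.random() < p_conv:
                    grid[x][y] = rule[grid[x][y]]
        # Stop once no V or N cell has anything left to convert.
        if not any(targets(grid, L, i, j, offs) for i in range(L) for j in range(L)):
            break
    infected = sum(row.count(V) + row.count(N) for row in grid)
    return infected / (L * L)
```

      Averaging `simulate(L, p_a, R, seed=s)` over many seeds at each p_a gives a scan of the mean final attack rate. With p_conv < 1 each attempted conversion fails some of the time, so the same kind of outbreak unfolds more slowly, in line with the time-rescaling argument above. The sketch assumes L is at least 2R + 1 so that neighbourhoods do not wrap onto themselves.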

      (3) Figure 2 shows a critical value p_c = 27.8% following a simulation on a lattice with dimension L = 1000. However, it is unclear if dimension changes may affect the critical value.

      Author response: Re-running the simulations on a lattice 4x as large (i.e. L=2000) yields a similar critical value of 27-28% for R=1, so we are confident that finite size effects do not play a major role at L=1000 and beyond. For R=5, however, we find that a minimum lattice size greater than L=1000 is necessary to determine the critical threshold. Concretely, we find that the threshold value pc for R=5 changes somewhat when the lattice size is increased from 1000 to 2000, but is invariant under a change from 2000 to 3000, so we conclude that L=2000 is sufficient for R=5. The pc value for R=5 cited in the manuscript (~0.4%) was determined from simulations at L=2000.
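      Checking finite-size effects this way amounts to reading p_c off a parameter scan at each lattice size and comparing the values. A minimal sketch of one such read-out follows. The criterion (the p_a at which the mean final attack rate first drops below 1/2, located by linear interpolation between scanned points) is an assumption, since the response does not state the exact procedure used, and `crossing_point` is a hypothetical helper.

```python
def crossing_point(p_values, mean_attack_rates, level=0.5):
    """Estimate p_c as the p_a where the mean attack rate first falls below `level`.

    p_values must be sorted in increasing order. Returns None if the scan never
    crosses `level` inside the scanned range.
    """
    points = list(zip(p_values, mean_attack_rates))
    for (p0, r0), (p1, r1) in zip(points, points[1:]):
        if r0 >= level > r1:
            # Linear interpolation between the two scan points that bracket `level`.
            return p0 + (r0 - level) * (p1 - p0) / (r0 - r1)
    return None
```

      Applying it to scans at L=1000 and L=2000 (and, for R=5, L=3000) and checking that the estimates agree within the scan resolution would mirror the finite-size check described above.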

      Reviewer #3 (Public Review):

      Summary:

      This study considers how to model distinct host cell states that correspond to different stages of a viral infection: from naïve and susceptible cells to infected cells and a minority of important interferon-secreting cells that are the first line of defense against viral spread. The study first considers the distinct host cell states by analyzing previously published single-cell RNAseq data. Then an agent-based model on a square lattice is used to probe the dependence of the system on various parameters. Finally, a simplified version of the model is explored, and shown to have some similarity with the more complex model, yet lacks the dependence on the interferon range. By exploring these models one gains an intuitive understanding of the system, and the model may be used to generate hypotheses that could be tested experimentally, telling us "when to be surprised" if the biological system deviates from the model predictions.

      Author response: Thank you for the summary! We agree with the role that you describe for a model such as this one.

      Strengths:

      -  Clear presentation of the experimental findings and a clear logical progression from these experimental findings to the modeling.

      -  The modeling results are easy to understand, revealing interesting behavior and percolation-like features.

      -  The scaling results presented span several decades and are therefore compelling.

      -  The results presented suggest several interesting directions for theoretical follow-up work, as well as possible experiments to probe the system (e.g. by stimulating or blocking IFN secretion).

      Weaknesses:

      -  Since the "range" of IFN is an important parameter, it makes sense to consider lattice geometries other than the square lattice, which is somewhat pathological. Perhaps a hexagonal lattice would generalize better.

      -  Tissues are typically three-dimensional, not two-dimensional. (Epithelium is an exception). It would be interesting to see how the modeling translates to the three-dimensional case. Percolation transitions are known to be very sensitive to the dimensionality of the system.

      Author response: We agree that probing different lattice geometries (two- and three-dimensional alike) would be interesting and worthwhile. For this manuscript, however, we prefer to confine the analysis to the current, simple case, and we see an extensive exploration of the role of geometry as an interesting future direction.

      -  The fixed time-step of the agent-based modeling may introduce biases. I would consider simulating the system with Gillespie dynamics where the reaction rates depend on the ambient system parameters.

      -  Single-cell RNAseq data typically involves data imputation due to the high sparsity of the measured gene expression. More information could be provided on this crucial data processing step since it may significantly alter the experimental findings.

      Justification of claims and conclusions:

      The claims and conclusions are well justified.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      It is necessary to explain what UMAP does. Is clustering done in the space of the twenty-something original dimensions or in 2D? How are UMAP1 and UMAP2 selected, and are they the same in all plots?

      Author response: We have now added a few sentences to clarify the point raised above - the second snippet explains how clustering is performed:

      “As a dimension reduction algorithm, UMAP is a manifold learning technique that favors the preservation of local distances over global distances (McInnes et al., 2018; Becht et al., 2019). It constructs a weighted graph from the data points and optimizes the graph layout in the low-dimensional space.”

      “We cluster the cells with the principal components analysis (PCA) results from their gene expression. With the first 16 principal components, we calculate k-nearest neighbors, construct the shared nearest neighbor graph of the cells, and then optimize the modularity function to determine clusters. We present the cluster information on the UMAP plane and use the same UMAP coordinates for all the plots in this paper hereafter.”
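      The first steps of the clustering procedure quoted above (PCA, k-nearest neighbours, shared-nearest-neighbour graph) can be sketched with numpy. This is an illustrative sketch, not the authors' pipeline: Jaccard-index edge weights are assumed (a common convention for SNN graphs), k is a hypothetical choice since the text does not state it, and the subsequent modularity optimisation (for example Louvain or Leiden) that would be run on this graph is not shown.

```python
import numpy as np


def pca_scores(X, n_comps):
    """Project centred data (cells x genes) onto its first `n_comps` principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_comps].T


def snn_graph(scores, k):
    """Jaccard-weighted shared-nearest-neighbour graph built from k-nearest neighbours."""
    n = scores.shape[0]
    # Squared Euclidean distances between all pairs of cells in PC space.
    d2 = ((scores[:, None, :] - scores[None, :, :]) ** 2).sum(axis=-1)
    knn = np.argsort(d2, axis=1)[:, :k]  # each cell's k nearest cells, itself included
    member = np.zeros((n, n), dtype=int)
    member[np.arange(n)[:, None], knn] = 1
    shared = member @ member.T           # number of shared neighbours for each pair
    return shared / (2 * k - shared)     # |A ∩ B| / |A ∪ B| for neighbour sets of size k
```

      For example, `snn_graph(pca_scores(X, 16), k=20)` would give a symmetric weight matrix for an expression matrix `X`; cells in well-separated groups share no neighbours and therefore get zero-weight edges, which is what lets a modularity-based method pull the groups apart.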

      Figure 1: what do the bars in the upper right corners of panels d, e, f, and g indicate? Does "Averaged" refer to a time average? Something is missing in "Cell proportions are labeled with corresponding colors in a)".

      Author response: Thank you - we have now modified the figure caption. The bars in the upper right corners of panels d, e, f are color keys for gene expression; brighter colors indicate higher expression.

      “Averaged” gene expression refers to the mean expression of that particular gene across the cells within each indicated cluster.

      The lines in c) show the proportions of cells in different states at different time points. Each state is shown in the same color in a) and c).

      Line 46: "However" does not sound right in this context. Would "Also" be better?

      Author response: We agree and have corrected it in the revised manuscript.

      Line 96: "The viral genes are also partially expressed in these cells, but different from the 𝑁 cluster, the antiviral genes are fully expressed (Fig. S1 and S2)." The sentence needs to be rephrased.

      Author response: We have rephrased the sentence: “As in the N cluster, the viral gene E is barely detected in these cells, indicating incomplete viral replication. However, in contrast to the N cluster, the antiviral genes are expressed to their full extent (Fig. S1 and S2).”

      Line 126: missing "be"; "large" -> "larger".

      Author response: Thank you, we have now corrected these typos.

      Line 139-140 The logical link between ignoring apoptosis and the diffusion of IFN is unclear.

      Author response: We modified the sentence as “Here, we assume that the secretion of IFNs by the 𝑁 cells is a faster process than possible apoptosis (Wen et al., 1997; Tesfaigzi, 2006) of these cells and that the diffusion of IFNs to the neighborhood is not significantly affected by apoptosis.”

      Fig. 2a Do the yellow arrows show the effect of IFN and the purple arrows the propagation of viral infection?

      Author response: That is correct. We have added this information to the figure caption: “The straight black arrows indicate transitions between cell states. The curved yellow arrows indicate the effects of IFNs on activating antiviral states. The curved purple arrows indicate viral spread to cells with 𝑂 and 𝑎 states.”

      Fig. 3, n(s) as the axis label vs P(s) in the text? How do the curves in panel a) look when the p_a is well above or below p_c?

      Author response: Thank you. We have edited the labels in the figure to reflect the symbols used in the text.

      Boundary conditions? From Fig. 4, apparently periodic?

      Author response: Yes, we use periodic boundary conditions in the model. We now state this in the model section (last sentence).

      It will be good to see a plot with time dependences of all cell types for a couple of values of p_a, illustrating propagation and cessation of the infection.

      Author response: We agree, and have added a Figure S4 in the supplement which explores exactly that. Thank you for the suggestion.

      A verbal qualitative description of why p_a has such importance and how the infection is terminated for large p_a would help.

      Reviewer #2 (Recommendations For The Authors):

      Below are two minor comments:

      (1) In the single-cell RNA sequencing data analysis, the authors describe the cell clusters O, V, A, and N. However, showing how the clusters are identified from the data might be more straightforward.

      Author response: Technically, we cluster the cells based on a principal component analysis (PCA) of their gene expression. Using the first 16 principal components, we compute the k-nearest neighbors of each cell, construct a shared nearest neighbor graph of the cells, and then optimize the modularity function to determine clusters. We manually annotate the clusters as O, V, A, and N based on the detected abundance of viral genes, antiviral genes, and IFNs.
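      The clustering steps described above (principal-component coordinates, k-nearest neighbors, a shared-nearest-neighbor graph, modularity optimization) can be sketched in plain Python. This is a minimal sketch, not the authors' pipeline: the Jaccard edge weights, the prune threshold of 1/15 and all function names are assumptions, the toy 2-D points stand in for the first 16 principal components, and a simple greedy local-moving heuristic stands in for the Louvain- or Leiden-type optimizers that single-cell toolkits usually provide.

```python
import math
from collections import defaultdict


def knn_sets(points, k):
    """For each cell, the set of its k nearest cells by Euclidean distance
    (normally including the cell itself, at distance 0)."""
    result = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        result.append(set(order[:k]))
    return result


def snn_graph(neighbors, prune=1 / 15):
    """Shared-nearest-neighbor graph: edge weight is the Jaccard overlap of two kNN sets.

    The Jaccard weighting and the prune threshold are assumptions for illustration.
    """
    edges = {}
    for i in range(len(neighbors)):
        for j in range(i + 1, len(neighbors)):
            shared = len(neighbors[i] & neighbors[j])
            if shared:
                weight = shared / len(neighbors[i] | neighbors[j])
                if weight > prune:
                    edges[(i, j)] = weight
    return edges


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a partition of a weighted, undirected graph."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    degree = defaultdict(float)
    internal = defaultdict(float)
    for (i, j), w in edges.items():
        degree[i] += w
        degree[j] += w
        if labels[i] == labels[j]:
            internal[labels[i]] += w
    total = defaultdict(float)
    for node, d in degree.items():
        total[labels[node]] += d
    return sum(internal[c] / m - (total[c] / (2 * m)) ** 2 for c in total)


def greedy_modularity(edges, n):
    """Local moving: reassign single cells to a neighbor's cluster while Q strictly increases."""
    labels = list(range(n))  # start with every cell in its own cluster
    adjacency = defaultdict(set)
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    improved = True
    while improved:
        improved = False
        for i in range(n):
            current = labels[i]
            best_q, best_c = modularity(edges, labels), current
            for c in {labels[j] for j in adjacency[i]}:
                labels[i] = c
                q = modularity(edges, labels)
                if q > best_q + 1e-12:
                    best_q, best_c = q, c
            labels[i] = best_c
            if best_c != current:
                improved = True
    return labels


# Toy data: two well-separated groups of cells in a 2-D stand-in for PC space.
pcs = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
edges = snn_graph(knn_sets(pcs, k=5))
labels = greedy_modularity(edges, len(pcs))
```

      The resulting cluster labels are arbitrary integers; as the response notes, naming the clusters O, V, A, and N is a separate, manual step based on viral-gene, antiviral-gene and IFN abundance.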

      (2) In Figure 3, what does n(s) mean in Figure 3a? And what is the meaning of the distribution P(s) of infection clusters? It may be stated clearly.

      Author response: The use of n(s) was inconsistent, and we have now edited the figure to use P(s) instead, to harmonize it with the text. P(s) is the distribution of cluster sizes s, where s is expressed as a fraction of the whole system. In other words, once a cluster has reached its final size, we record s = (N + V)/L^2, where N and V are the numbers of cells in the N and V states in the cluster (note that, by design, each simulation leads to a single cluster, since we seed the infection at one lattice point). We now indicate more clearly in the caption and the main text what exactly P(s) and s refer to.
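      To make the definition of s and P(s) concrete, the sketch below estimates P(s) for a deliberately simplified stand-in: ordinary site percolation grown from a single seed on an L × L lattice with periodic boundary conditions, where each site is independently blocked with a hypothetical probability p_block. This is not the authors' model, whose cell states and IFN signaling are implemented in the C++ code in their repository; the blocking rule, the single "infected" state used in place of N + V cells, and the function names are assumptions.

```python
import random
from collections import Counter, deque


def grow_cluster(L, p_block, rng):
    """Grow one cluster from a single seed on an L x L periodic lattice; return s = size / L**2.

    Toy rule (assumption): each site is blocked independently with probability p_block,
    and the cluster spreads to every unblocked nearest neighbor.
    """
    blocked = {}

    def is_blocked(site):
        # Draw each site's state lazily, once, the first time it is reached.
        if site not in blocked:
            blocked[site] = rng.random() < p_block
        return blocked[site]

    seed = (0, 0)
    blocked[seed] = False  # the seeded site is always part of the cluster
    cluster = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = ((x + dx) % L, (y + dy) % L)  # periodic boundary conditions
            if nb not in cluster and not is_blocked(nb):
                cluster.add(nb)
                queue.append(nb)
    return len(cluster) / L**2


def cluster_size_distribution(L, p_block, runs, seed=0):
    """Empirical P(s): fraction of runs whose final cluster has relative size s."""
    rng = random.Random(seed)
    counts = Counter(grow_cluster(L, p_block, rng) for _ in range(runs))
    return {s: c / runs for s, c in sorted(counts.items())}


distribution = cluster_size_distribution(L=16, p_block=0.4, runs=200, seed=1)
```

      Sweeping p_block and plotting the resulting P(s) shows how the distribution changes across the percolation threshold of this toy lattice.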

      Reviewer #3 (Recommendations For The Authors):

      - Would the authors kindly share the simulation code with the community? Also, the data analysis code should be shared to follow current best practices. This needs to be standard practice in all publications. I would go as far as to say that in 2024 publishing a data analysis / simulation study without sharing the relevant code should be ostracized by the community.

      Author response: We absolutely agree and have created a GitHub repository in which we share the C++ source code for the simulations and a Python notebook for plotting. The public repository can be found at https://github.com/BjarkeFN/ViralPercolation. We add this information in supplement under section “Code availability”.

      - I would avoid the use of the wording "critical" threshold since this is almost guaranteed to infuriate a certain type of reader.

      - Line 265 has a curious use of " ... " which should be replaced with something more appropriate.

      Author response: Thank you for pointing this out! We have checked the manuscript for typos.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors set out to develop genetic tools that can specifically and comprehensively label Axo-Axonic Cells (AACs), also known as Chandelier cells. These AACs possess unique morphological and connectivity features, making them an ideal subject for studying various aspects of cell types across different experimental methods. To achieve both specificity and comprehensiveness in AAC labeling, the authors employ an intersectional strategy that combines lineage origin and molecular markers. This approach successfully targets AACs across the mouse brain and reveals their widespread distribution in various brain structures beyond the previously known regions. Additionally, the authors utilize rabies transneuronal labeling to provide a comprehensive overview of AACs, their variations, and input sources throughout the brain. This experimental approach offers a powerful model system for investigating the role of AACs in circuit development and function across diverse brain regions.

      Strengths:

      Genetic Tools and Specificity: The authors' genetic tools show qualitative evidence of specificity for AACs, opening new avenues for targeted research on these cells. The use of intersectional strategies enhances the precision of AAC labeling.

      Widespread Distribution: The study significantly broadens our understanding of AAC distribution, revealing their presence in brain regions beyond what was previously documented. This expanded knowledge is a valuable contribution to the field.

      Transneuronal Labeling: The inclusion of rabies transneuronal labeling provides a comprehensive view of AACs, their variations, and input sources, allowing for a more holistic understanding of their role in neural circuits.

      Weaknesses:

      Quantitative Analysis: While the claim of specificity appears qualitatively convincing, the manuscript could be improved with more quantitative analysis.

      We are glad that the reviewers appreciated our multimodal and brain-wide characterizations of the AAC population. We include many qualitative AAC examples and would like to highlight the quantitative nature of our whole brain cell body and cartridge analyses, made possible by transgenic targeting and our serial two-photon tomography imaging platform (STP). In addition to providing this brain wide AAC atlas, we also propose AACs as perhaps one of the best case examples for a bona fide cell type, which may inspire further in-depth anatomical and functional studies of AACs, and efforts to capture other ground truth cell types.

      Comprehensiveness Claim: The assertion of comprehensiveness, implying labeling "almost all" AACs in all brain regions, is challenging to substantiate conclusively. Acknowledging the limitations of proving complete comprehensiveness and discussing them in the discussion section would be more appropriate than asserting it in the results section.

      We thank the reviewer for this suggestion and have revised the results and discussion sections accordingly. How to assess comprehensiveness in AAC labeling is a fair and important question, as dense brain-wide AAC labeling has not been achieved and assessed before. Previous studies had used less efficient and less specific methods for capturing AACs, primarily in select areas of cortex, hippocampus, and amygdala. These AAC populations are recapitulated by our genetic strategies with higher density and specificity. It does not seem that we have missed any previously reported AAC populations; in fact, we discovered multiple previously unreported populations. Further evidence supporting our “comprehensive” labeling of AACs is that two independent transgenic strategies, based on Unc5b and Pthlh, showed very similar AAC distribution patterns (Fig. 1 Suppl. 3). However, we recognize that probably the only way to fully assess the “completeness” of labeling may be to compare with anatomical ground truth, such as dense EM reconstruction of all AACs across the brain volume. This is currently not technically possible but may become feasible in the future.

      Local Inputs: While the manuscript focuses on inter-areal inputs to AACs, it would benefit from exploring local inputs as well. Identifying the local neurons that target AACs and analyzing their patterns could provide valuable insights into AAC function within specific brain regions.

      This is a good suggestion. However, our serial two-photon tomography imaging platform cannot reliably preserve tissue sections for subsequent immunohistochemical processing. Additionally, although our starter AAV injections were limited to 100-150 nL, far too many input cells were labelled at the injection site to resolve individual input cells and correlate them with their synaptic partners (e.g., a rabies-labelled pyramidal cell within the injection site may still project to a starter cell a few hundred microns away). Thus, our rabies input mapping was best suited for characterizing long-range inputs, which were the focus here. For studying local inputs to AACs, future studies could combine very dilute starter AAV injections with multi-marker characterization of cell types by immunohistochemistry or FISH.

      Discussion Focus: The discussion section should delve deeper into the biological implications of the findings, moving beyond technical significance. Exploring similarities and differences in input patterns between AACs and other cell types, and linking them to the locations of starter cells or specific connectivity patterns in the brain, would enrich the discussion. For instance, investigating whether input patterns can be predicted based on the locations of starter cells or connectivity specificity could provide valuable insights.

      We thank the reviewer for this suggestion. We have expanded the discussion to include more on the relevance and implications of our input mapping results to different starter populations of AACs.

      Reviewer #2 (Public Review):

      Summary:

      The goals of this study were to develop a genetic approach that would specifically and comprehensively target axo-axonic cells (AACs) throughout the brain and then to describe the patterns and characteristics of the targeted AACs in multiple, selected brain regions. The investigators have been successful in providing the most complete description of the regional distribution of putative AACs (pAACs) throughout the brain to date. The supporting evidence is convincing, even though incomplete in some brain regions. The findings should serve as a guide for more detailed studies of AACs within each brain region and lead to new insights into the connectivity and functional organization of this important group of GABAergic interneurons.

      Strengths:

      The study has numerous strengths. A major strength is the development of a unique intersectional genetic strategy that uses cell lineage (Nkx2.1) and molecular (Unc5b or Pthlh) markers to identify axo-axonic AACs specifically and, apparently, nearly completely throughout the mouse brain. While AACs have been described previously in the cerebral cortex, hippocampus, and amygdala, there has been no specific genetic marker that selectively identifies all AACs in these regions.

      The current genetic strategy has labeled pAACs in a large number of additional brain regions, including the claustrum-insular complex, extended amygdala, and several olfactory centers. In general, the findings provide support for the specificity of the methods for targeting AACs, and include some examples of labeling near markers of axon initial segments. However, the investigators are careful to refer to labeled neurons as "putative AACs", as these cells have not been fully characterized and their identity has not been verified.

      The descriptions and numerous low-magnification images of the brain provide a roadmap for subsequent, detailed studies of AACs in numerous brain regions. The overview and summaries of the findings in the Abstract, Introduction, and Discussion are particularly clear and helpful in placing the extensive regional descriptions of AACs in context.

      Weaknesses:

      One weakness of the study is the lack of an illustration of the high-resolution cell labeling that can be achieved with the methods, including labeling of numerous rows of axon terminals in contact with axon initial segments. The initial images of the brain-wide distribution of putative AACs are necessarily presented at low magnification. Although the authors indicate that the cells have "highly characteristic AAC labeling patterns throughout the neocortex, hippocampus and BLA", these morphological details cannot be visualized by the reader at the current magnification, even when the images are enlarged on the computer screen. Some of the details become evident in later Figures, but an initial illustration of single cell labeling with confocal microscopy, or tracing of their characteristic axonal arbors, would support the specificity of the labeling in the low magnification images.

      We thank the reviewer for the suggestion. We have now added high-resolution images showing the colocalization of AAC axon boutons (cartridges) along AnkG positive postsynaptic axon initial segments in Fig. 2 Suppl. 1, Figure 1 panels a, d, e, and Fig. 4 panels b, c. These images unequivocally demonstrate AAC identity and specificity.

      Table 1 indicates that the AAC identity of the cells has been validated in many brain regions but not in all. The methods used for validation have not been described and should be included for completeness. The authors are careful to acknowledge that labeled cells in some regions have not been validated and refer to such cells as pAACs.

      Validation was defined by colocalization of RFP-labelled AAC cartridges with AnkyrinG- or Phospho-IκBα-labelled axon initial segments, imaged by confocal microscopy. We provide high-magnification examples throughout Figures 2-6 and their supplements. We have also tried to clarify this in the methods section entitled “Immunohistochemistry.” The term putative AACs (pAACs) refers to populations in which relatively few single-cell examples of AACs exhibiting co-localized cartridges were found, largely because of the sparse labeling produced by the low tamoxifen dosage used (see response above).

      The intersectional genetic methods included the use of the lineage marker Nkx2.1 with either Unc5b or Pthlh as the molecular marker. As described, the mice with intersectional targeting of Nkx2.1 and Unc5b appear to show the most specific brain-wide labeling for AACs, and the majority of the descriptions are from these mice. The targeting with Nkx2.1 and Pthlh is less convincing. The title for Figure 1 Supplemental Figure 3 suggests a similar AAC distribution in the Pthlh;Nkx2.1 mouse compared to the Unc5b;Nkx2.1 mouse. However, the descriptions of the individual panels suggest a number of inconsistencies and non-AAC labeling. The heavy labeling in the caudate and cells in layer 4 is particularly problematic. Based on the data presented, it appears that heavy labeling achieved in these mice could not be relied on for specific labeling of all AACs, although specific labeling could be achieved under some conditions, such as following tamoxifen administration at select ages.

      The reviewer is correct that Pthlh is less specific for AACs than Unc5b when crossed to a constitutive Nkx2.1 recombinase driver line. The Pthlh/Nkx2.1 intersection labeled a set of layer 4 cells in somatosensory cortex and dense cells in striatum, which are clearly not AACs. However, these are the only major differences compared with the Unc5b/Nkx2.1 intersection. As the reviewer points out, only when Pthlh is crossed to an inducible Nkx2.1-CreER line and induced embryonically with tamoxifen is the AAC labeling more specific (at least in cortex). We included these data, as well as the intersection with VIP-Cre, in case either is useful to researchers studying fate-mapping of AACs or bipolar cell interneurons. We have also revised the title of Fig. 1 Suppl. 3 to better convey this.

      The methods for dense labeling and single-cell labeling are described only briefly in the Methods. Some discussion of how the methods were developed would be useful, including how it was determined that the methods for heavy labeling identified AACs specifically and completely.

      We have added a description on the development of these to the methods section entitled “Animals.”

      Reviewer #3 (Public Review):

      Summary:

      Raudales et al. aimed at providing an insight into the brain-wide distribution and synaptic connectivity of bona fide GABAergic inhibitory interneuron subtypes focusing on the axo-axonic cell (AAC), one of the most distinctive interneuron subtypes, which innervates the axon initial segments of glutamatergic projection neurons. They establish intersectional genetic strategies that enable them to specifically and comprehensively capture AACs based on their lineage (Nkx2.1) and marker expression (Unc5b, Pthlh). They find that AACs are deployed across essentially all the pallium-derived brain structures as well as the anterior olfactory nucleus, taenia tecta, and lateral septum. They show that AACs in distinct areas and layers of the neocortex as well as different subregions of the hippocampal formation display unique soma and synaptic density and morphological variations. Rabies virus-based retrograde monosynaptic input tracing reveals that AACs in the neocortex, the hippocampus, and the basolateral amygdala receive synaptic inputs from common as well as specific brain regions and supports the utility of this novel genetic approach. This study elucidates brain-wide neuroanatomical features and morphological variations of AACs with solid techniques and analysis. Their novel AAC-targeting strategies will facilitate the study of their development and function in different brain regions. The conclusions in this paper are well supported by the data. However, there are a few comments to strengthen this study.

      (1) The definition of putative AAC (pAAC) is unclear and Table 1 may not be accurate. Although the authors find synaptic cartridges of RFP-labeled cells in the claustro-insular complex and the dorsal endopiriform nuclei, they still consider these cells as pAACs (not validated). The authors claim that without examining the presence of synaptic cartridges, RFP-labeled cells in the hypothalamus and the bed nuclei of the stria terminalis (BNST) are pAACs while those in the L4 of the somatosensory cortex in Pthlh;Nkx2.1;Ai65 mice are non-AACs. In Table 1, the BNST is supposed to contain AACs (validated), but in the text, the authors claim that RFP-labeled cells in the BNST are pAACs. Could the authors clarify how AACs, pAACs, and non-AACs are defined?

      We thank the reviewer for their interest and comments on our work. Please see our response to Reviewer 2 for clarification on pAACs. Additionally, we have clarified in the methods section “Immunohistochemistry” how we defined AACs, pAACs, and non-AACs. In the BNST, we positively identified only a few cells exhibiting overlap with AnkyrinG/IκBα labeling, so we currently classify them as pAACs; Table 1 has been corrected to reflect this.

      (2) The intersectional strategies presented in this study could also specifically capture developing AACs. If so, how early are AACs labeled in the brain? It would also be nice if the authors could add a simple schematic like Fig. 1a showing the time course of Pthlh expression.

      We thank the reviewer for suggesting the application of our method in studying AAC development. As the onset of Unc5b is in early postnatal time, tamoxifen induction of Unc5b-CreER in early postnatal days can enable studies of AAC neurite and synapse development, maturation, and plasticity. Similarly, Pthlh expression in the brain is relatively low/absent at P4 and present at P14 and later timepoints. Pthlh-Flp;Nkx2.1-Cre intersection can be used to study postnatal AAC development and plasticity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the claim of specificity appears qualitatively convincing, additional quantitative analysis would make the authors' claim much stronger. For example, in Figure 4 (f-h), where the authors show an overlap of AAC axons with AnkG labeling, there also appears to be a region of AAC axon lacking adjacent AnkG labeling. The authors could quantify the fraction of cartridges that overlap with AnkG labeling in different brain regions, potentially strengthening their claim that pAACs are AACs as well as providing important documentation of the diversity or homogeneity of compartment targeting across the brain.

      As mentioned previously, we only performed AnkG co-labeling analysis on low-dose tamoxifen/sparsely labelled samples in which we could readily differentiate individual cells. This was performed on samples with the Ai65 cytoplasmic reporter—for validation purposes we could positively identify co-labelled cartridges, but it would be more difficult to accurately identify any cartridges not co-labeled (since the entire axon was labelled with RFP). For precisely identifying and mapping AAC cartridge locations we found the intersectional synaptophysin-EGFP reporter (Fig. 2k-n) to be a more precise method for specifically labeling the “cartridge” segment of AAC axons. However, we did not try AnkG staining on samples from this reporter line, as they were set aside for STP imaging.

      Regarding the claim of comprehensiveness, labeling "almost all" AACs in all brain regions is a high standard and challenging to demonstrate conclusively. The study already significantly expands our understanding of AAC distribution, and the authors might consider discussing the limitations of proving complete comprehensiveness in the discussion rather than claiming it in the results section.

      We again thank the reviewer for this critique. As mentioned above, we have revised the results and discussion sections to convey this point better.

      Furthermore, the manuscript connectivity section primarily focuses on inter-areal inputs to AACs, but it could benefit from exploring local inputs as well. By identifying the local neurons that target AACs, the authors could ask if there is any general property or rule of the local projections to AACs across the brain, or at least within the cortex. Moreover, a clear indication of the injection site would be helpful, particularly in Figure 7, where there seems to be some discrepancy between the histograms and fluorescent images regarding local projections. The histograms of Figure 7, seem to indicate that the local projection to AACs is a small fraction of all the presynaptic neurons, however, the fluorescent image for the SSp seems to suggest otherwise with many fluorescent cells in the injected area.

      We thank the reviewer for these comments. Regarding local inputs in the rabies tracing datasets, we are limited (as mentioned above) by our STP platform’s inability to preserve tissue for immunohistochemical labeling, as well as by our relatively dense starter cell labeling. Instead, our focus here was on long-range inputs (i.e., outside the ipsilateral ARA area of injection), which were simply not known for these AAC populations. We have revised the Figure 7 legend and added a description in the methods section to indicate more clearly that only long-range input projections were included in the Figure 7 histograms.
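      The exclusion rule described above, keeping only input cells outside the ipsilateral ARA area that contains the injection, can be written as a short filter before the histogram fractions are computed. This is a hypothetical sketch: the (area, hemisphere) tuple format, the "ipsi"/"contra" labels and the function name are assumptions, not the authors' analysis code.

```python
from collections import Counter


def long_range_input_fractions(cells, injection_area):
    """Fraction of long-range input cells per (area, hemisphere).

    cells: iterable of (area, hemisphere) tuples, with hemisphere "ipsi" or "contra"
    relative to the injection. Cells in the injected area on the ipsilateral side are
    treated as local inputs and dropped before normalizing.
    """
    kept = [(area, hemi) for area, hemi in cells
            if not (area == injection_area and hemi == "ipsi")]
    counts = Counter(kept)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


# Toy example: 5 local SSp cells are excluded; the remaining 4 cells are normalized.
cells = [("SSp", "ipsi")] * 5 + [("MOp", "ipsi")] * 2 + [("SSp", "contra")] * 2
fractions = long_range_input_fractions(cells, "SSp")
```

      Because local cells are removed before normalizing, histogram fractions computed this way describe long-range inputs only, which would account for dense labeling around an injection site not appearing in the histograms.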

      In the discussion, the authors should delve more into the biological implications of their findings rather than solely emphasizing the technical significance. They could explore the similarities and differences in input patterns between AACs and other cell types, potentially linking them to the locations of their starter cells or specific connectivity patterns in the brain. For example, the authors could check if the input patterns could be predicted from the projections to the layers where their starter cells are located (either from an Atlas like the Allen Connectivity Atlas, or from retrograde rabies injections in the same locations). Can the differences between the input patterns to PVC and AAC be predicted for their location versus some specificity of connections?

      Thank you for the extensive comment. We address this point above, and have revised our discussion accordingly.

      Reviewer #2 (Recommendations For The Authors):

      The Figure legends vary in completeness and quality.

      (1) The legend for Figure 1 is very informative, and section e-g serves as a useful guide, as the legend includes the names of the brain regions related to the abbreviations and also indicates the specific panels that show the identified structures. Because of the large number of structures and the number of panels in each Figure, it would be ideal to follow the same pattern in the remaining figures.

      (2) Several edits are needed in the legend for Figure 1 Supplement Figure 1. The descriptions of a-f could be improved by providing general terms to describe the brain regions associated with the latter list of abbreviations (as has been done with the identification of the cerebral cortex, hippocampus, and olfactory centers and their related panels). One suggestion would be to write out insula, claustrum, and endopiriform prior to listing the abbreviations (AI, CLA, EP) (b-c) and adding amygdaloid complex and extended amygdala before the abbreviations (COA, BLA, MeA) (d-f) and (BST) (d).

      We thank the reviewer, as the suggestion of further expanding the abbreviations is a good one. As such, we have revised/reorganized the anatomical abbreviations in the figure legends for Figure 1 Supplement Figures 1, 2, and 3.

      Descriptions for Panels g-j require editing to link the appropriate panels and the descriptions. Panels for BSTpr appear to be g-h (rather than f-g) and i-j (rather than h-i).

      We have fixed this typo in the legend for Figure 1 Supplement Figure 1.

      Descriptions for Panels k-n could be edited to include abbreviations for the identified brain regions. For example, include the abbreviation ARHP after arcuate nuclei and indicate panels m-n (rather than j-l); include PVP after paraventricular and indicate panel n (rather than m); include DMPH after dorsomedial nuclei and indicate k-m (rather than j-l).

      Thank you for the suggestion. We have expanded the abbreviations in Figure 1 Supplement 1 accordingly.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please clarify if tdTomato, EGFP (from helper AAVs), and RFP (from rabies virus) are native signals or IHC signals in legends.

      We have added the descriptors “native” or “stained” to all figure legends containing fluorescent images.

      (2) Fig. 4b and c: Please add insets of high-magnification images showing AAC boutons along AnkG-labeled AISs.

      We have added these insets to Fig. 4b and c.

      (3) Fig. 7S1: It appears that d and e are reversed. Judging from the positions of starter cells, d is for PV-Cre? Please make sure. It is also better to draw the laminar border in d and e.

      The original genotype labels are correct for Fig. 7S1 d and e. We have added the laminar borders as suggested.

      (4) Fig. 9b: Just for consistency, please label with the name of the helper AAV.

      Added.

      (5) Line 617: intragranular>>>infragranular?

      Corrected, thank you.

      (6) It may be unclear to some readers if the images in the figures are from confocal or STP. The authors may want to clarify that all images in the figures are generated by confocal microscopy in the method section.

      We have clarified this in the methods section, “Microscopy and image analysis.”

      (7) The authors should clarify that STP was used to map input cells to the brain in the result section.

      We have added this description in the results section.

    1. Author response:

      We thank the reviewers and editors for their review and assessment of our manuscript and comprehensive feedback. The manuscript will be revised to address all the reviewers’ comments. Specifically, to address the comment of Reviewer 1 and the editor regarding the lack of quantitative comparison between the classical and fractal cycle approaches and identification of the source of the discrepancies between classical and fractal cycles, we plan to perform and report the following analyses and comparisons:

      (1) Intra-method reliability

      a) Classical cycles. An additional scorer will independently define the onsets and offsets of all classical sleep cycles for all datasets and mark sleep cycles with skipped REM sleep. Likewise, we will perform automatic sleep cycle detection. We will add a new Supplementary table showing the averaged cycle durations obtained by the two scorers and the automatic algorithm, as well as the inter-scorer agreement rate, and update the Supplemental Excel file with the corresponding information for each cycle, participant, and dataset.

      b) Fractal cycles. We will correlate the durations of fractal cycles calculated using the parameters defined in the Main text with those calculated using different parameters, namely, the longer and shorter smoothing window lengths, higher and lower minimum peak prominence. Likewise, we will correlate the durations of fractal cycles calculated using frontal vs other available electrodes.
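      A parameter-sensitivity check of this kind can be sketched as follows, assuming that fractal cycles are delimited by successive prominent peaks of a smoothed fractal-slope time series sampled once per epoch. The centered moving average, the usual topographic definition of peak prominence, the 30-s epoch length and the function names are assumptions for illustration, not the authors' implementation.

```python
def moving_average(x, window):
    """Centered moving average; the window shrinks at the edges of the recording."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def prominent_peaks(x, min_prominence):
    """Indices of local maxima whose topographic prominence is at least min_prominence."""
    peaks = []
    for i in range(1, len(x) - 1):
        if not (x[i - 1] < x[i] >= x[i + 1]):
            continue
        # Lowest sample between the peak and the nearest strictly higher sample (or the edge).
        left_min = right_min = x[i]
        j = i - 1
        while j >= 0 and x[j] <= x[i]:
            left_min = min(left_min, x[j])
            j -= 1
        j = i + 1
        while j < len(x) and x[j] <= x[i]:
            right_min = min(right_min, x[j])
            j += 1
        if x[i] - max(left_min, right_min) >= min_prominence:
            peaks.append(i)
    return peaks


def fractal_cycle_durations(slope, window, min_prominence, epoch_min=0.5):
    """Durations (minutes) between successive prominent peaks of the smoothed slope."""
    peaks = prominent_peaks(moving_average(slope, window), min_prominence)
    return [(b - a) * epoch_min for a, b in zip(peaks, peaks[1:])]


# Toy slope series with three humps; re-run with other windows/prominences to compare.
slope = [0, 1, 2, 1, 0, 1, 3, 1, 0, 1, 2, 1, 0]
baseline = fractal_cycle_durations(slope, window=1, min_prominence=1.0)
strict = fractal_cycle_durations(slope, window=1, min_prominence=2.5)
```

      Correlating, across participants, the durations obtained with the main-text parameters against those from longer or shorter windows and higher or lower prominence thresholds is then the comparison described in b).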

      (2) Origin of method differences

      In the current version of our Manuscript, we describe a few possible sources of discrepancies between classical and fractal cycle durations and numbers. Following the suggestion of one of the reviewers, in the revised Manuscript, we will quantify the sources of discrepancies between the two methods in order to identify the “criteria for recordings in which fractal cycles will produce similar results to the classical method”. Specifically, we will calculate the correlation between the difference in classical vs fractal sleep cycle durations on one side, and either the amplitudes of fractal descend/ascend, relative durations of cycles with skipped REM sleep and wake after sleep onset, or peak flatness on the other side.    
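      The planned correlation can be sketched with a plain Pearson coefficient between the per-participant difference in cycle durations and a candidate explanatory variable. This is a minimal sketch: the choice of Pearson (rather than, say, a rank correlation), the per-participant pairing and the function name are assumptions, and the numbers below are toy values, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy values: per-participant mean cycle durations (minutes) from each method,
# and one candidate predictor (share of cycles with skipped REM sleep).
classical = [95.0, 88.0, 102.0, 110.0]
fractal = [90.0, 86.0, 92.0, 96.0]
skipped_rem = [0.0, 0.0, 0.25, 0.5]
difference = [c - f for c, f in zip(classical, fractal)]
r = pearson_r(difference, skipped_rem)
```

      The same function can be reused for each candidate source of discrepancy (descend/ascend amplitude, wake after sleep onset, peak flatness).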

      In addition, we will include a new figure, illustrating the goodness of fit of the data as assessed by the IRASA method. Likewise, we will update Supplementary File 1 (that shows classical and fractal sleep cycles for each participant) with marks that highlight the onsets and offsets of sleep cycles as well as the cycles with skipped REM sleep.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      In this important study, the authors report a novel measurement of the Escherichia coli chemotactic response and demonstrate that these bacteria display an attractant response to potassium, which is connected to the intracellular pH level. Whilst the experiments are mostly convincing, there are some confounders regarding pH changes and fluorescent proteins that remain to be addressed.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper shows that E. coli exhibits a chemotactic response to potassium by measuring both the motor response (using a bead assay) and the intracellular signaling response (CheY phosphorylation level via FRET) to step changes in potassium concentration. They find that an increase in potassium concentration induces a considerable attractant response, with amplitude comparable to aspartate, and that cells can quickly adapt (and generally over-adapt). The authors propose that the mechanism for the potassium response is through modifying intracellular pH; they find both that potassium modifies pH and that other pH modifiers induce similar attractant responses. It is also shown, using Tar- and Tsr-only mutants, that these two chemoreceptors respond to potassium differently. Tsr has a standard attractant response, while Tar has a biphasic response (repellent-like then attractant-like). Finally, the authors use computer simulations to study the swimming response of cells to a periodic potassium signal secreted from a biofilm and find a phase delay that depends on the period of oscillation.

      Strengths:

      The finding that E. coli can sense and adapt to potassium signals, and the connection to intracellular pH, are quite interesting, and this work should stimulate future experimental and theoretical studies of the microscopic mechanisms governing this response. The evidence (from both the bead assay and FRET) that potassium induces an attractant response is convincing, as is the proposed mechanism involving modification of intracellular pH. The updated manuscript controls for the impact of pH on fluorescent protein brightness, which can bias the measured FRET signal. After correction, the response amplitude and sharpness (Hill coefficient) are comparable to those of conventional chemoattractants (e.g., aspartate), indicating that the general mechanisms underlying the response may be similar. The authors suggest that the biphasic response of Tar mutants may be due to pH influencing the activity of other enzymes (CheA, CheR or CheB), which will be an interesting direction for future study.

      Weaknesses:

      The measured response may be biased by adaptation, especially for weak potassium signals. For other attractant stimuli, the response typically shows a low plateau before it recovers (adapts). In the case of potassium, the FRET signal does not have an obvious plateau following the stimuli of small potassium concentrations, perhaps due to the faster adaptation compared to other chemoattractants. It is possible cells have already partially adapted when the response reaches its minimum, so the measured response may be a slight underestimate of the true response. Mutants without adaptation enzymes appear to be sensitive to potassium only at much larger concentrations, where the pH significantly disrupts the FRET signal; more accurate measurements would require development of new mutants and/or measurement techniques.

      We acknowledge and appreciate the reviewer's concern regarding the potential impact of adaptation on the measured response magnitude, and we have estimated this effect. The half-time of adaptation at 30 mM KCl was measured to be approximately 80 s, corresponding to a time constant of τ = 80/ln(2) ≈ 115.4 s, which is significantly longer than the time required for medium exchange in the flow chamber (less than 10 s). Consequently, the relative effect of adaptation on the measured response magnitude should be less than 1 − exp(−10/τ) ≈ 8.3%. Even for the fastest adaptation we measured (at the lowest KCl concentration), the effect should be less than 20%, which is within experimental uncertainties. Nevertheless, we agree that developing new techniques to measure the dose-response curve more precisely would be beneficial.
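The bound quoted above can be reproduced in a few lines of Python. This is a sketch that assumes single-exponential (first-order) adaptation, which is what the half-time-to-time-constant conversion implies; the function names are illustrative and not taken from the authors' analysis code. The second helper inverts the bound to show roughly how short the adaptation half-time would have to be before the bias exceeds 20%.

```python
import math


def adaptation_bias(half_time_s: float, readout_s: float) -> float:
    """Upper bound on the fraction of the response lost to adaptation.

    Assumes single-exponential adaptation with time constant
    tau = half_time / ln 2, and that the response is read out no later
    than `readout_s` after the step (here, the medium-exchange time).
    """
    tau = half_time_s / math.log(2)
    return 1.0 - math.exp(-readout_s / tau)


def half_time_for_bias(bias: float, readout_s: float) -> float:
    """Shortest adaptation half-time that keeps the bias at or below `bias`."""
    return readout_s * math.log(2) / -math.log(1.0 - bias)


TAU_S = 80 / math.log(2)
print(f"tau = {TAU_S:.1f} s")                                          # tau = 115.4 s
print(f"bias <= {adaptation_bias(80, 10):.1%}")                        # bias <= 8.3%
print(f"20% bound needs t_half >= {half_time_for_bias(0.2, 10):.0f} s")  # ... >= 31 s
```

Under this first-order assumption, a 20% bias within a 10 s exchange would require an adaptation half-time of about 31 s or less.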

      Reviewer #2 (Public Review):

      Zhang et al. investigated the biophysical mechanism of potassium-mediated chemotactic behavior in E. coli. Humphries et al. previously reported that potassium waves from an oscillating B. subtilis biofilm attract P. aeruginosa through the chemotactic behavior of motile P. aeruginosa cells, and proposed that the K+ waves alter the PMF of P. aeruginosa. However, the mechanism of this behaviour remained elusive. In this study, Zhang et al. demonstrated that motile E. coli cells accumulate in regions of high potassium levels. They found that this behavior likely results from the chemotaxis signalling pathway, mediated by an elevation of intracellular pH. Overall, a solid body of evidence is provided to support the claims. However, the impact of pH on the fluorescent proteins needs to be better evaluated. In its current form, the evidence is insufficient to conclude that the fluorescence intensity ratio results from FRET; it may well be an artefact of the pH change.

      The authors now carefully evaluated the impact of pH on their FRET sensor by examining the YFP and CFP fluorescence with a no-receptor mutant, and used these data to correct the impact of pH on their FRET sensor. This is an improvement, but the mathematical operation of this correction needs clarification. This is particularly important because, looking at the data, it is not fully convincing that the correction was done properly. For instance, 3 mM KCl gives a 0.98 FRET signal in both Fig. 3 and Fig. S4, but there is almost no difference between the blue and red lines in Fig. 3. Fig. S4 is very informative, but it does not address the concern raised by both reviewers that the FRET reporter may not be a reliable tool here due to the pH change.

      We apologize for not making the correction process clear. We corrected the impact of pH on the original signals for both the CFP and YFP channels by

      $$S^{\mathrm{corr}}_{L}(t) = \frac{S_{L}(t)}{C_{\mathrm{pH}}(L)},$$

      where $S^{\mathrm{corr}}_{L}(t)$ and $S_{L}(t)$ represent the pH-corrected and original PMT signals (CFP or YFP channel), respectively, from the moment of addition of L mM KCl to the moment of its removal, and $C_{\mathrm{pH}}(L)$ is the correction factor, i.e., the ratio of the PMT signal post- to pre-KCl addition for the no-receptor mutant at L mM KCl, for the CFP or YFP channel as shown in Fig. S5. The pH-corrected FRET response is then calculated as the ratio of the pH-corrected YFP signal to the pH-corrected CFP signal, normalized by the pre-stimulus ratio.

      As shown in Author response image 1, which presents the same data as Fig. 3A and Fig. S5A, the original normalized FRET responses to 3 mM KCl are 0.967 for the wild-type strain (Fig. 3) and 0.981 for the no-receptor strain (Fig. S5). The standard deviation of the FRET values under steady-state conditions is 0.003. Thus, the difference in responses between the wild-type and no-receptor strains is significant and clearly exceeds the standard deviation. The pH correction factors CpH at 3 mM KCl are 1.004 for the YFP signal and 1.016 for the CFP signal. Consequently, the pH-corrected FRET responses are 0.967 × 1.016/1.004 = 0.979 for the wild-type strain and 0.981 × 1.016/1.004 = 0.993 for the no-receptor strain. The pH-corrected FRET response for the no-receptor strain is 0.993 rather than the expected 1.000 because this value represents the lowest observed response rather than the average FRET response.
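A minimal sketch of the correction described above, in Python. It assumes the traces are plain lists of PMT samples with KCl present between two known sample indices; the function names and the synthetic trace used to check it are illustrative, not the authors' analysis code. For a single normalized value, dividing each channel by its own factor is equivalent to multiplying the YFP/CFP ratio by C_CFP/C_YFP, which reproduces the numbers quoted above.

```python
def correct_channel(signal, c_ph, start, stop):
    """Divide the samples recorded while KCl is present (indices start..stop-1)
    by that channel's correction factor C_pH; other samples are unchanged."""
    return [s / c_ph if start <= i < stop else s for i, s in enumerate(signal)]


def normalized_fret(yfp, cfp, c_yfp, c_cfp, start, stop):
    """pH-corrected YFP/CFP ratio, normalized by the mean pre-stimulus ratio."""
    y = correct_channel(yfp, c_yfp, start, stop)
    c = correct_channel(cfp, c_cfp, start, stop)
    ratio = [a / b for a, b in zip(y, c)]
    baseline = sum(ratio[:start]) / start  # samples before KCl addition
    return [r / baseline for r in ratio]


# Applied to an already-normalized response value, the correction reduces
# to multiplying by C_CFP / C_YFP.
C_YFP, C_CFP = 1.004, 1.016  # 3 mM KCl correction factors quoted above
for strain, raw in [("wild type", 0.967), ("no receptor", 0.981)]:
    print(strain, round(raw * C_CFP / C_YFP, 3))  # 0.979, then 0.993
```

A trace from a strain whose signal changes only because of pH (such as the no-receptor control the factors are derived from) should come out flat at 1.0 after this correction.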

      The detailed mathematical operation for correcting the pH impact has now been included in the “FRET assay” section of Materials and Methods.

      Author response image 1.

      Chemotactic response of the wild-type strain (A, HCB1288-pVS88) and the no-receptor strain (B, HCB1414-pVS88) to stepwise addition and removal of KCl. The blue solid line denotes the original normalized signal. Downward and upward arrows indicate the time points of addition and removal of 3 mM KCl, respectively. The horizontal red dashed line denotes the original normalized FRET response value to 3 mM KCl.

      The authors show the FRET data with both KCl and K2SO4, concluding that the chemotactic response mainly results from potassium ions. However, this was only measured by FRET. It would be more convincing if the motility assay in Fig. 1 were also performed with K2SO4. The authors did not address this point. In light of the complications associated with the use of the FRET sensor, this experiment is all the more important.

      We thank the reviewer for the suggestion. We agree that additional confirmation with a motility assay is important. To address this, we have now measured the response of the motor rotational signal to 15 mM K2SO4 using the bead assay and compared it with the response to 30 mM KCl. The results are shown in Fig. S2. The motor CW bias showed an attractant response to 15 mM K2SO4, characterized by a decreased CW bias upon the addition of K2SO4, followed by an over-adaptation, qualitatively similar to the response to 30 mM KCl. However, there were notable differences in the adaptation time and in the presence of an overshoot. Specifically, adaptation to K2SO4 was faster than adaptation to KCl, and there was a notable overshoot in the CW bias during the adaptation phase. These differences may have resulted from the weaker response to K2SO4 (Fig. S1B) and additional modifications due to CysZ-mediated cellular uptake of sulfate (Zhang et al., Biochimica et Biophysica Acta 1838, 1809–1816 (2014)). The faster adaptation and the overshoot complicated the chemotactic drift in a microfluidic assay like that in Fig. 1, such that we were unable to observe a noticeable drift in a K2SO4 gradient under the same experimental conditions used for the KCl gradient.

      The response of motor rotational signal to 15 mM K2SO4 has been added to Fig. S2.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The response curve and adaptation level/time in the main text (Fig. 4) should be replaced by the corrected counterparts (currently in Fig. S5). The current version is especially confusing because Fig. 6 shows the corrected response, but the difference from Fig. 4 is not mentioned.

      We thank the reviewer for the suggestion. We have now merged the results of the original Fig. S5 into Fig. 4.

      a. The discussion of the uncorrected response with small hill coefficient and potentially negative cooperativity was left in the text (lines 223-234), but the new measurements show this is not true for the actual response. This should be removed or significantly rephrased.

      We thank the reviewer for the suggestion. We have now removed the statement about potentially negative cooperativity and added the corrected results for the actual response.

      (2) It may be helpful to restate the definition of f_m in the methods (near Eq. 3-4).

      Thank you for the suggestion. We have now restated the definition of fm and fL below Eq. 3-4: “In the denominator on the right-hand side of Eq. 3, the two terms within the parentheses of exponential expression represent the methylation-dependent (fm) and ligand-dependent (fL) free energy, respectively.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      In this paper, the authors evaluate the utility of brain age derived metrics for predicting cognitive decline by performing a 'commonality' analysis in a downstream regression that enables the different contribution of different predictors to be assessed. The main conclusion is that brain age derived metrics do not explain much additional variation in cognition over and above what is already explained by age. The authors propose to use a regression model trained to predict cognition ('brain cognition') as an alternative suited to applications of cognitive decline. While this is less accurate overall than brain age, it explains more unique variance in the downstream regression.  

      Importantly, in this revision, we clarified that we did not intend to use Brain Cognition as an alternative approach. This is because, by design, the variation in fluid cognition explained by Brain Cognition should be higher or equal to that explained by Brain Age. Here we made this point more explicit and further stated that the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. By examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. 

      REVISED VERSION: while the authors have partially addressed my concerns, I do not feel they have addressed them all. I do not feel they have addressed the weight instability and concerns about the stacked regression models satisfactorily.

      Please see our responses to Reviewer #1 Public Review #3 below.

      I also must say that I agree with Reviewer 3 about the limitations of the brain age and brain cognition methods conceptually. In particular that the regression model used to predict fluid cognition will by construction explain more variance in cognition than a brain age model that is trained to predict age. This suffers from the same problem the authors raise with brain age and would indeed disappear if the authors had a separate measure of cognition against which to validate and were then to regress this out as they do for age correction. I am aware that these conceptual problems are more widespread than this paper alone (in fact throughout the brain age literature), so I do not believe the authors should be penalised for that. However, I do think they can make these concerns more explicit and further tone down the comments they make about the utility of brain cognition. I have indicated the main considerations about these points in the recommendations section below. 

      Thank you so much for raising this point. We now have the following statement in the introduction and discussion to address this concern (see below). 

      Briefly, we made it explicit that, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. That is, the relationship between Brain Cognition and fluid cognition indicates the upper limit of Brain Age’s capability in capturing fluid cognition. More importantly, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age. This is the third goal of the present study.

      From Introduction:

      “Third and finally, certain variation in fluid cognition is related to brain MRI, but to what extent does Brain Age not capture this variation? To estimate the variation in fluid cognition that is related to the brain MRI, we could build prediction models that directly predict fluid cognition (i.e., as opposed to chronological age) from brain MRI data. Previous studies found reasonable predictive performances of these cognition-prediction models, built from certain MRI modalities (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). Analogous to Brain Age, we called the predicted values from these cognition-prediction models Brain Cognition. The strength of an out-of-sample relationship between Brain Cognition and fluid cognition reflects variation in fluid cognition that is related to the brain MRI and, therefore, indicates the upper limit of Brain Age’s capability in capturing fluid cognition. That is, by design, the variation in fluid cognition explained by Brain Cognition should be higher than or equal to that explained by Brain Age. Consequently, if we included Brain Cognition, Brain Age and chronological age in the same model to explain fluid cognition, we would be able to examine the unique effects of Brain Cognition that explain fluid cognition beyond Brain Age and chronological age. These unique effects of Brain Cognition, in turn, would indicate the amount of co-variation between brain MRI and fluid cognition that is missed by Brain Age.”

      From Discussion:

      “Third, by introducing Brain Cognition, we showed the extent to which Brain Age indices were not able to capture the variation in fluid cognition that is related to brain MRI. More specifically, using Brain Cognition allowed us to gauge the variation in fluid cognition that is related to the brain MRI, and thereby, to estimate the upper limit of what Brain Age can do. Moreover, by examining what was captured by Brain Cognition, over and above Brain Age and chronological age via the unique effects of Brain Cognition, we were able to quantify the amount of co-variation between brain MRI and fluid cognition that was missed by Brain Age.

      From our results, Brain Cognition, especially from certain cognition-prediction models such as the stacked models, has relatively good predictive performance, consistent with previous studies (Dubois et al., 2018; Pat, Wang, Anney, et al., 2022; Rasero et al., 2021; Sripada et al., 2020; Tetereva et al., 2022; for review, see Vieira et al., 2022). We then examined Brain Cognition using commonality analyses (Nimon et al., 2008) in multiple regression models having a Brain Age index, chronological age and Brain Cognition as regressors to explain fluid cognition. Similar to Brain Age indices, Brain Cognition exhibited large common effects with chronological age. But more importantly, unlike Brain Age indices, Brain Cognition showed large unique effects, up to around 11%. As explained above, the unique effects of Brain Cognition indicated the amount of co-variation between brain MRI and fluid cognition that was missed by a Brain Age index and chronological age. This missing amount was relatively high, considering that Brain Age and chronological age together explained around 32% of the total variation in fluid cognition. Accordingly, if a Brain Age index was used as a biomarker along with chronological age, we would have missed an opportunity to improve the performance of the model by around one-third of the variation explained.” 
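To make the unique and common effects concrete, here is a sketch of a three-predictor commonality analysis in Python with NumPy, using the standard subset-R² formulas of the method (Nimon et al., 2008). It fits in-sample ordinary least squares for every subset of predictors. Column 0 could hold Brain Cognition, column 1 a Brain Age index and column 2 chronological age, but the function, the effect labels and the synthetic demo data are illustrative assumptions, not the authors' code. The unique effect of a predictor is the drop in R² when it is removed from the full model, and the seven components add up to the full-model R².

```python
import itertools

import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def commonality_3(X, y):
    """Unique (U) and common (C) effects for the three columns of X."""
    r2 = {cols: r_squared(X[:, list(cols)], y)
          for k in (1, 2, 3) for cols in itertools.combinations(range(3), k)}
    full = r2[(0, 1, 2)]
    effects = {
        # Unique effect: R^2 lost when one predictor is dropped.
        "U0": full - r2[(1, 2)],
        "U1": full - r2[(0, 2)],
        "U2": full - r2[(0, 1)],
        # Effect shared by exactly two predictors (the third is the k term).
        "C01": r2[(0, 2)] + r2[(1, 2)] - r2[(2,)] - full,
        "C02": r2[(0, 1)] + r2[(1, 2)] - r2[(1,)] - full,
        "C12": r2[(0, 1)] + r2[(0, 2)] - r2[(0,)] - full,
        # Effect shared by all three predictors.
        "C012": (r2[(0,)] + r2[(1,)] + r2[(2,)]
                 - r2[(0, 1)] - r2[(0, 2)] - r2[(1, 2)] + full),
    }
    return effects, full


if __name__ == "__main__":
    # Synthetic data only, to show the call; these are not HCP-A values.
    rng = np.random.default_rng(0)
    age = rng.normal(size=400)
    brain_age = age + rng.normal(scale=0.5, size=400)
    brain_cog = 0.5 * age + rng.normal(size=400)
    cognition = -age + 0.8 * brain_cog + rng.normal(size=400)
    effects, full = commonality_3(np.column_stack([brain_cog, brain_age, age]), cognition)
    for name, value in effects.items():
        print(f"{name}: {value:.3f}")
    print(f"full R^2: {full:.3f}")
```

For the figures quoted above, a unique effect of about 11% against roughly 32% explained by Brain Age and chronological age gives 11/32 ≈ 0.34, which is the "around one-third" in the text.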

      This is a reasonably good paper, and the use of a commonality analysis is a nice contribution to understanding variance partitioning across different covariates. I have some comments that I believe the authors ought to address, which mostly relate to clarity and interpretation.

      Reviewer #1 Public Review #1

      First, from a conceptual point of view, the authors focus exclusively on cognition as a downstream outcome. I would suggest the authors nuance their discussion to provide broader considerations of the utility of their method and on the limits of interpretation of brain age models more generally. 

      Thank you for your comments on this issue. 

      We now discussed the broader consideration in detail:

      (1) the consistency between our findings on fluid cognition and other recent works on brain disorders, 

      (2) the difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021)

      and 

      (3) suggested solutions we and others made to optimise the utility of Brain Age for both cognitive functioning and brain disorders.

      From Discussion:

      “This discrepancy between the predictive performance of age-prediction models and the utility of Brain Age indices as a biomarker is consistent with recent findings (for review, see Jirsaraie, Gorelik, et al., 2023), both in the context of cognitive functioning (Jirsaraie, Kaufmann, et al., 2023) and neurological/psychological disorders (Bashyam et al., 2020; Rokicki et al., 2021). For instance, combining different MRI modalities in the prediction models, similar to our stacked models, often leads to the highest performance of age-prediction models, but is unlikely to explain the most variance across different phenotypes, including cognitive functioning and beyond (Jirsaraie, Gorelik, et al., 2023).”

      “There is a notable difference between studies investigating the utility of Brain Age in explaining cognitive functioning, including ours and others (e.g., Butler et al., 2021; Cole, 2020, 2020; Jirsaraie, Kaufmann, et al., 2023) and those explaining neurological/psychological disorders (e.g., Bashyam et al., 2020; Rokicki et al., 2021). We consider the former as a normative type of study and the latter as a case-control type of study (Insel et al., 2010; Marquand et al., 2016). Case-control Brain Age studies focusing on neurological/psychological disorders often build age-prediction models from MRI data of largely healthy participants (e.g., controls in a case-control design or large samples in a population-based design), apply the built age-prediction models to participants without vs. with neurological/psychological disorders and compare Brain Age indices between the two groups. On the one hand, this means that case-control studies treat Brain Age as a method to detect anomalies in the neurological/psychological group (Hahn et al., 2021). On the other hand, this also means that case-control studies have to ignore underfitted models when applying prediction models built from largely healthy participants to participants with neurological/psychological disorders (i.e., Brain Age may predict chronological age well for the controls, but not for those with a disorder). In contrast, our study and other normative studies focusing on cognitive functioning often build age-prediction models from MRI data of largely healthy participants and apply the built age-prediction models to participants who are also largely healthy. Accordingly, the age-prediction models for explaining cognitive functioning in normative studies, while not allowing us to detect group-level anomalies, do not suffer from being underfitted. This unfortunately might limit the generalisability of our study to the normative type of study. Future work is still needed to test the utility of Brain Age in the case-control setting.”

      “Next, researchers should not select age-prediction models based solely on age-prediction performance. Instead, researchers could select age-prediction models that explained phenotypes of interest the best. Here we selected age-prediction models based on a set of features (i.e., modalities) of brain MRI. This strategy was found effective not only for fluid cognition as we demonstrated here, but also for neurological and psychological disorders as shown elsewhere (Jirsaraie, Gorelik, et al., 2023; Rokicki et al., 2021). Rokicki and colleagues (2021), for instance, found that, while integrating across MRI modalities led to age prediction models with the highest age-prediction performance, using only T1 structural MRI gave age-prediction models that were better at classifying Alzheimer’s disease. Similarly, using only cerebral blood flow gave age-prediction models that were better at classifying mild/subjective cognitive impairment, schizophrenia and bipolar disorder. 

      As opposed to selecting age-prediction models based on a set of features, researchers could also select age-prediction models based on modelling methods. For instance, Jirsaraie and colleagues (2023) compared gradient tree boosting (GTB) and deep-learning brain network (DBN) algorithms in building age-prediction models. They found GTB to have higher age prediction performance but DBN to have better utility in explaining cognitive functioning. In this case, an algorithm with better utility (e.g., DBN) should be used for explaining a phenotype of interest. Similarly, Bashyam and colleagues (2020) built different DBN-based age-prediction models, varying in age-prediction performance. The DBN models with a higher number of epochs corresponded to higher age-prediction performance. However, DBN-based age-prediction models with a moderate (as opposed to higher or lower) number of epochs were better at classifying Alzheimer’s disease, mild cognitive impairment and schizophrenia. In this case, a model from the same algorithm with better utility (e.g., those DBN with a moderate epoch number) should be used for explaining a phenotype of interest.

      Accordingly, this calls for a change in research practice, as recently pointed out by Jirsaraie and colleagues (2023, p. 7): “Despite mounting evidence, there is a persisting assumption across several studies that the most accurate brain age models will have the most potential for detecting differences in a given phenotype of interest”. Future neuroimaging research should aim to build age-prediction models that are not necessarily good at predicting age, but at capturing phenotypes of interest.”

      Reviewer #1 Public Review #2

      Second, from a methods perspective, the current manuscript does not explain the methodological procedures sufficiently to fully understand how the stacked regression models were constructed. I would request that the authors provide more information to enable the reader to better understand the stacked regression models used and to ensure that these models are not overfit.

      Thank you for allowing us an opportunity to clarify our stacked model. We have added further clarification (see below). We confirm that we did not use the test sets to build the stacked models at either the lower or the higher level of the Elastic Net models. The test sets were used only to test the performance of the models.

      From Methods:

      “We used nested cross-validation (CV) to build these prediction models (see Figure 7). We first split the data into five outer folds, leaving each outer fold with around 100 participants. This number of participants in each fold is to ensure the stability of the test performance across folds. In each outer-fold CV loop, one of the outer folds was treated as an outer-fold test set, and the rest was treated as an outer-fold training set. Ultimately, looping through the nested CV resulted in a) prediction models from each of the 18 sets of features as well as b) prediction models that drew information across different combinations of the 18 separate sets, known as “stacked models.” We specified eight stacked models: “All” (i.e., including all 18 sets of features),  “All excluding Task FC”, “All excluding Task Contrast”, “Non-Task” (i.e., including only Rest FC and sMRI), “Resting and Task FC”, “Task Contrast and FC”, “Task Contrast” and “Task FC”. Accordingly, there were 26 prediction models in total for both Brain Age and Brain Cognition.

      To create these 26 prediction models, we applied three steps for each outer-fold loop. The first step aimed at tuning prediction models for each of the 18 sets of features. This step only involved the outer-fold training set and did not involve the outer-fold test set. Here, we divided the outer-fold training set into five inner folds and applied inner-fold CV to tune hyperparameters with grid search. Specifically, in each inner-fold CV, one of the inner folds was treated as an inner-fold validation set, and the rest was treated as an inner-fold training set. Within each inner-fold CV loop, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters and applied the estimated model to the inner-fold validation set. After looping through the inner-fold CV, we then chose the prediction models that led to the highest performance, reflected by the coefficient of determination (R2), on average across the inner-fold validation sets. This led to 18 tuned models, one for each of the 18 sets of features, for each outer fold.

      The second step aimed at tuning stacked models. Same as the first step, the second step only involved the outer-fold training set and did not involve the outer-fold test set. Here, using the same outer-fold training set as the first step, we applied tuned models, created from the first step, one from each of the 18 sets of features, resulting in 18 predicted values for each participant. We, then, re-divided this outer-fold training set into new five inner folds. In each inner fold, we treated different combinations of the 18 predicted values from separate sets of features as features to predict the targets in separate “stacked” models. Same as the first step, in each inner-fold CV loop, we treated one out of five inner folds as an inner-fold validation set, and the rest as an inner-fold training set. Also as in the first step, we used the inner-fold training set to estimate parameters of the prediction model with a particular set of hyperparameters from our grid. We tuned the hyperparameters of stacked models using grid search by selecting the models with the highest R2 on average across the inner-fold validation sets. This led to eight tuned stacked models.

      The third step aimed at testing the predictive performance of the 18 tuned prediction models from each of the set of features, built from the first step, and eight tuned stacked models, built from the second step. Unlike the first two steps, here we applied the already tuned models to the outer-fold test set. We started by applying the 18 tuned prediction models from each of the sets of features to each observation in the outer-fold test set, resulting in 18 predicted values. We then applied the tuned stacked models to these predicted values from separate sets of features, resulting in eight predicted values. 

      To demonstrate the predictive performance, we assessed the similarity between the observed values and the predicted values of each model across outer-fold test sets, using Pearson’s r, the coefficient of determination (R2) and mean absolute error (MAE). Note that for R2, we used the sum-of-squares definition (i.e., R2 = 1 – (residual sum of squares / total sum of squares)), following a previous recommendation (Poldrack et al., 2020). We considered the predicted values from the outer-fold test sets of models predicting age or fluid cognition as Brain Age and Brain Cognition, respectively.”

      Author response image 1.

      Diagram of the nested cross-validation used for creating predictions for models of each set of features as well as predictions for stacked models. 

      Note that some previous research, including ours (Tetereva et al., 2022), splits the observations in the outer-fold training set into layer 1 and layer 2 and applies the first and second steps to layers 1 and 2, respectively. Here we decided against this approach and used the same outer-fold training set for both the first and second steps in order to avoid potential bias toward the stacked models. This is because, when the data are split into two layers, predictive models built for each separate set of features only use the data from layer 1, while the stacked models use the data from both layers 1 and 2. In practice, with large enough data, these two approaches might not differ much, as we demonstrated previously (Tetereva et al., 2022).
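The three steps above, including the choice to reuse the same outer-fold training set for both steps, can be sketched in Python with NumPy. This is a simplified illustration under stated assumptions, not the authors' pipeline: closed-form ridge regression is swapped in for Elastic Net to keep the sketch dependency-free, the penalty grid is hypothetical, and each tuned model is assumed to be refit on the whole outer-fold training set with its best penalty. The returned out-of-fold predictions play the role of Brain Age or Brain Cognition, and `summarize` computes the three metrics named above.

```python
import numpy as np


def kfold(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)


def fit_ridge(X, y, lam):
    """Ridge regression with an unpenalized intercept (stand-in for Elastic Net)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return y_mean - x_mean @ w, w


def predict(model, X):
    b, w = model
    return b + X @ w


def r2(y, y_hat):
    """Sum-of-squares R^2 = 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def tune(X, y, grid, k, rng):
    """Inner-fold grid search on mean validation R^2, then refit on all of (X, y)."""
    folds = kfold(len(y), k, rng)

    def score(lam):
        scores = []
        for val in folds:
            train = np.setdiff1d(np.arange(len(y)), val)
            model = fit_ridge(X[train], y[train], lam)
            scores.append(r2(y[val], predict(model, X[val])))
        return np.mean(scores)

    return fit_ridge(X, y, max(grid, key=score))


def nested_stacked_cv(feature_sets, y, grid=(0.1, 1.0, 10.0, 100.0), k=5, seed=0):
    """Out-of-fold predictions from each feature set and from one stacked model."""
    rng = np.random.default_rng(seed)
    n = len(y)
    oof_single = np.zeros((n, len(feature_sets)))
    oof_stacked = np.zeros(n)
    for test in kfold(n, k, rng):
        train = np.setdiff1d(np.arange(n), test)
        # Step 1: tune one model per feature set on the outer training set only.
        models = [tune(X[train], y[train], grid, k, rng) for X in feature_sets]
        # Step 2: their predictions on the same outer training set become the
        # features of the stacked model, tuned on a fresh inner split.
        z_train = np.column_stack([predict(m, X[train]) for m, X in zip(models, feature_sets)])
        stacked = tune(z_train, y[train], grid, k, rng)
        # Step 3: apply the already-tuned models to the untouched outer test fold.
        z_test = np.column_stack([predict(m, X[test]) for m, X in zip(models, feature_sets)])
        oof_single[test] = z_test
        oof_stacked[test] = predict(stacked, z_test)
    return oof_single, oof_stacked


def summarize(y, y_hat):
    """Pearson's r, sum-of-squares R^2 and MAE."""
    return {"r": np.corrcoef(y, y_hat)[0, 1], "R2": r2(y, y_hat),
            "MAE": np.mean(np.abs(y - y_hat))}
```

Because the outer test fold is only touched in step 3, any optimism from fitting the stacked model on in-sample first-level predictions shows up as lower out-of-fold R2 rather than leaking into the reported performance.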

      Reviewer #1 Public Review #3

      Please also provide an indication of the different regression strengths that were estimated across the different models and cross-validation splits. Also, how stable were the weights across splits? 

      The focus of this article is on the predictions. Still, it is informative for readers to understand how stable the feature importance (i.e., Elastic Net coefficients) is. To demonstrate the stability of feature importance, we now examined the rank stability of feature importance using Spearman’s ρ (see Figure 4). Specifically, we correlated the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, we computed 10 Spearman’s ρ for each prediction model of the same features. We found that Spearman’s ρ varied dramatically in both age-prediction (range = .31–.94) and fluid cognition-prediction (range = .16–.84) models. This means that some prediction models were much more stable in their feature importance than others. This is probably due to various factors such as a) the collinearity of features in the model, b) the number of features (e.g., 71,631 features in functional connectivity, which were further reduced to 75 principal components, as compared to 19 features in subcortical volume based on the ASEG atlas), c) the penalisation of coefficients with either ‘Ridge’ or ‘Lasso’ methods, which resulted in shrinkage of a group of features or selection of a feature among correlated features, respectively, and d) the predictive performance of the models. Understanding the stability of feature importance is beyond the scope of the current article. As mentioned by Reviewer 1, “The predictions can be stable when the coefficients are not,” and we chose to focus on the prediction in the current article.

      Author response image 2.

      Stability of feature importance (i.e., Elastic Net coefficients) of prediction models. Each dot represents rank stability (reflected by Spearman’s ρ) in the feature importance between two prediction models of the same features, used in two different outer-fold test sets. Given that there were five outer-fold test sets, there were 10 Spearman’s ρs for each prediction model. The numbers to the right of the plots indicate the mean of Spearman’s ρ for each prediction model.
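The pairwise rank-stability check described above can be sketched in Python with NumPy: five outer folds give C(5, 2) = 10 pairs of coefficient vectors, and Spearman’s ρ is the Pearson correlation of their ranks. Assumptions not stated in the text: coefficients are compared as signed values, and ties (for example, coefficients set exactly to zero by the Lasso part of the penalty) receive average ranks; the authors may have ranked absolute values instead. A fold whose coefficients are all zero has no rank variation, so ρ is undefined (NaN) for any pair involving it.

```python
import itertools

import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values given their average rank."""
    x = np.asarray(x)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]


def rank_stability(coefs_per_fold):
    """Spearman's rho for every pair of outer folds (5 folds -> 10 values)."""
    return [spearman(a, b) for a, b in itertools.combinations(coefs_per_fold, 2)]
```

The per-model summary in the figure would then correspond to `np.mean(rank_stability(coefs))` for a hypothetical `coefs` array holding one row of coefficients per outer fold.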

      Reviewer #1 Public Review #4

      Please provide more details about the task designs, MRI processing procedures that were employed on this sample in addition to the regression methods and bias correction methods used. For example, there are several different parameterisations of the elastic net, please provide equations to describe the method used here so that readers can easily determine how the regularisation parameters should be interpreted.  
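The parameterisation issue the reviewer raises can be made concrete. For reference, scikit-learn’s `ElasticNet` minimizes the objective below, where α (`alpha`) sets the overall penalty strength and ρ (`l1_ratio`) sets the Lasso/Ridge mix, with ρ = 1 giving the Lasso and ρ = 0 giving Ridge. glmnet writes the same objective with the penalty λ[(1 − α)/2 ‖β‖²₂ + α‖β‖₁], so there α is the mixing parameter instead of the strength. Which of these, if either, the authors used is not stated in this excerpt; the block is a reference point, not a description of their method.

```latex
\min_{w_0,\,w}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - w_0 - x_i^{\top} w\bigr)^2
  + \alpha\,\rho\,\lVert w \rVert_1
  + \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
```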

      Thank you for the opportunity to provide more methodological details.

      First, for the task design, we included the following statements:

      From Methods:

      “HCP-A collected fMRI data from three tasks: Face Name (Sperling et al., 2001), Conditioned Approach Response Inhibition Task (CARIT) (Somerville et al., 2018) and VISual MOTOR (VISMOTOR) (Ances et al., 2009). 

      First, the Face Name task (Sperling et al., 2001) taps into episodic memory. The task had three blocks. In the encoding block [Encoding], participants were asked to memorise the names of faces shown. These faces were then shown again in the recall block [Recall] when the participants were asked if they could remember the names of the previously shown faces. There was also the distractor block [Distractor] occurring between the encoding and recall blocks. Here participants were distracted by a Go/NoGo task. We computed six contrasts for this Face Name task: [Encode], [Recall], [Distractor], [Encode vs. Distractor], [Recall vs. Distractor] and [Encode vs. Recall].

      Second, the CARIT task (Somerville et al., 2018) was adapted from the classic Go/NoGo task and taps into inhibitory control. Participants were asked to press a button in response to all [Go] shapes but not to the two [NoGo] shapes. We computed three contrasts for the CARIT task: [NoGo], [Go] and [NoGo vs. Go].

      Third, the VISMOTOR task (Ances et al., 2009) was designed to test simple activation of the motor and visual cortices. Participants saw a checkerboard with a red square either on the left or right. They needed to press a corresponding key to indicate the location of the red square. We computed just one contrast for the VISMOTOR task: [Vismotor], which indicates the presence of the checkerboard vs. baseline.”

      Second, for MRI processing procedures, we included the following statements.

      From Methods:

      “HCP-A provides details of parameters for brain MRI elsewhere (Bookheimer et al., 2019; Harms et al., 2018). Here we used MRI data that were pre-processed by the HCP-A with recommended methods, including the MSMALL alignment (Glasser et al., 2016; Robinson et al., 2018) and ICA-FIX (Glasser et al., 2016) for functional MRI. We used multiple brain MRI modalities, covering task functional MRI (task fMRI), resting-state functional MRI (rsfMRI) and structural MRI (sMRI), and organised them into 19 sets of features.”

      “Sets of Features 1-10: Task fMRI contrast (Task Contrast)

      Task contrasts reflect fMRI activation relevant to events in each task. Bookheimer and colleagues (2019) provided detailed information about the fMRI in HCP-A. Here we focused on the pre-processed task fMRI Connectivity Informatics Technology Initiative (CIFTI) files with the suffix “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”. These CIFTI files encompassed both the cortical mesh surface and subcortical volume (Glasser et al., 2013). Collected using the posterior-to-anterior (PA) phase encoding direction, these files were aligned using MSMALL (Glasser et al., 2016; Robinson et al., 2018), linearly detrended (see https://groups.google.com/a/humanconnectome.org/g/hcp-users/c/ZLJc092h980/m/GiihzQAUAwAJ) and cleaned of potential artifacts using ICA-FIX (Glasser et al., 2016).

      To extract Task Contrasts, we regressed the fMRI time series on the convolved task events using a double-gamma canonical hemodynamic response function via FMRIB Software Library (FSL)’s FMRI Expert Analysis Tool (FEAT) (Woolrich et al., 2001). We kept FSL’s default high-pass cutoff at 200 s (i.e., .005 Hz). We then parcellated the contrast ‘cope’ files, using the Glasser atlas (Gordon et al., 2016) for cortical surface regions and FreeSurfer’s automatic segmentation (aseg) (Fischl et al., 2002) for subcortical regions. This resulted in 379 regions, which, in turn, set the number of features in each Task Contrast set of features.”
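      The contrast step can be roughly illustrated as follows. This is not a reproduction of FSL FEAT, which additionally applies prewhitening and uses its own HRF and filtering settings. The sketch builds block regressors convolved with an SPM-style double-gamma HRF and fits them by ordinary least squares. It then forms the six Face Name contrasts as weighted combinations of the parameter estimates and averages the vertex-wise values within parcels. The onsets, TR, HRF shape parameters, contrast weights and all function names are assumptions chosen for the example.

```python
import numpy as np
from scipy.stats import gamma

# Hypothetical contrast weights over the regressor order [Encode, Recall, Distractor].
FACE_NAME_CONTRASTS = {
    "[Encode]": [1, 0, 0],
    "[Recall]": [0, 1, 0],
    "[Distractor]": [0, 0, 1],
    "[Encode vs. Distractor]": [1, 0, -1],
    "[Recall vs. Distractor]": [0, 1, -1],
    "[Encode vs. Recall]": [1, -1, 0],
}


def double_gamma_hrf(tr, duration=32.0):
    """SPM-style double-gamma HRF sampled every `tr` seconds (an approximation,
    not FSL's exact kernel), scaled to sum to one."""
    t = np.arange(0.0, duration, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return hrf / hrf.sum()


def block_regressor(onsets, durations, n_vols, tr):
    """Boxcar for one condition, convolved with the HRF and trimmed to n_vols."""
    box = np.zeros(n_vols)
    for onset, dur in zip(onsets, durations):
        box[int(round(onset / tr)):int(round((onset + dur) / tr))] = 1.0
    return np.convolve(box, double_gamma_hrf(tr))[:n_vols]


def fit_contrasts(Y, regressors, contrasts):
    """OLS fit of Y (time x vertices); returns one contrast map (cope) per contrast."""
    X = np.column_stack(list(regressors) + [np.ones(Y.shape[0])])  # intercept last
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # A zero weight is appended so that the intercept does not enter any contrast.
    return {name: np.append(c, 0.0) @ beta for name, c in contrasts.items()}


def parcellate(values, labels):
    """Average vertex-wise values within each parcel (parcels in sorted label order)."""
    return np.array([values[labels == lab].mean() for lab in np.unique(labels)])


# Synthetic example: 10 vertices in 2 parcels, hypothetical block onsets in seconds.
rng = np.random.default_rng(1)
tr, n_vols = 0.8, 250
regs = [block_regressor(onsets, [15.0, 15.0], n_vols, tr)
        for onsets in ([10.0, 90.0], [40.0, 120.0], [70.0, 150.0])]
true_betas = np.array([2.0, 1.0, 0.5])
Y = np.column_stack(regs) @ np.tile(true_betas[:, None], (1, 10))
Y += 0.1 * rng.normal(size=Y.shape)
copes = fit_contrasts(Y, regs, FACE_NAME_CONTRASTS)
labels = np.repeat([0, 1], 5)
print(parcellate(copes["[Encode vs. Recall]"], labels))  # both values close to 1.0
```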

      “Sets of Features 11-13: Task fMRI functional connectivity (Task FC)

      Task FC reflects functional connectivity (FC) among the brain regions during each task, which is considered an important source of individual differences (Elliott et al., 2019; Fair et al., 2007; Gratton et al., 2018). We used the same CIFTI file, “_PA_Atlas_MSMAll_hp0_clean.dtseries.nii”, as for the Task Contrasts. Unlike Task Contrasts, here we treated the double-gamma, convolved task events as regressors of no interest and focused on the residuals of the regression from each task (Fair et al., 2007). We computed these regressors in FSL and regressed them out in nilearn (Abraham et al., 2014). Following previous work on task FC (Elliott et al., 2019), we applied a high-pass filter at .008 Hz. For parcellation, we used the same atlases as for Task Contrasts (Fischl et al., 2002; Glasser et al., 2016). We computed Pearson’s correlations for each pair of the 379 regions, resulting in a table of 71,631 non-overlapping FC indices for each task. We then applied an r-to-z transformation and a principal component analysis (PCA) with 75 components (Rasero et al., 2021; Sripada et al., 2019, 2020). Note that, to avoid data leakage, we conducted the PCA on each training set and applied its definition to the corresponding test set. Accordingly, there were three sets of 75 features for Task FC, one for each task.

      Set of Features 14: Resting-state functional MRI functional connectivity (Rest FC)

      Similar to Task FC, Rest FC reflects functional connectivity (FC) among the brain regions, except that Rest FC was measured during rest (as opposed to task performance). HCP-A collected Rest FC from four 6.42-min (488 frames) runs across two days, yielding about 26 min of data (Harms et al., 2018). On each day, the study scanned two runs of Rest FC, starting with anterior-to-posterior (AP) and then with posterior-to-anterior (PA) phase encoding polarity. We used the “rfMRI_REST_Atlas_MSMAll_hp0_clean.dscalar.nii” file, which was preprocessed and concatenated across the four runs. We applied the same computations (i.e., high-pass filter, parcellation, Pearson’s correlations, r-to-z transformation and PCA) as for Task FC.

      Sets of Features 15-18: Structural MRI (sMRI)

      sMRI reflects individual differences in brain anatomy. The HCP-A used an established preprocessing pipeline for sMRI (Glasser et al., 2013). We focused on four sets of features: cortical thickness, cortical surface area, subcortical volume and total brain volume. For cortical thickness and cortical surface area, we used Destrieux’s atlas (Destrieux et al., 2010; Fischl, 2012) from FreeSurfer’s “aparc.stats” file, resulting in 148 regions for each set of features. For subcortical volume, we used the aseg atlas (Fischl et al., 2002) from FreeSurfer’s “aseg.stats” file, resulting in 19 regions. For total brain volume, we had five FreeSurfer-based features: “FS_IntraCranial_Vol” or estimated intra-cranial volume, “FS_TotCort_GM_Vol” or total cortical grey matter volume, “FS_Tot_WM_Vol” or total cortical white matter volume, “FS_SubCort_GM_Vol” or total subcortical grey matter volume and “FS_BrainSegVol_eTIV_Ratio” or ratio of brain segmentation volume to estimated total intracranial volume.”
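      The Task FC and Rest FC feature computations described above can be sketched in Python as follows. The steps are high-pass filtering, Pearson correlations between parcels, r-to-z transformation of the unique region pairs, and a PCA fitted on the training set only and then applied to the test set. This is a hedged approximation rather than the authors' pipeline. It uses scipy in place of nilearn, and the Butterworth filter, its order, the TR of 0.8 s and the helper names are assumptions. The toy example uses 20 regions (190 pairs) instead of 379 (71,631 pairs).

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.decomposition import PCA


def highpass(ts, tr, cutoff_hz=0.008, order=2):
    """Zero-phase Butterworth high-pass filter along time (axis 0)."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=1.0 / tr, output="sos")
    return sosfiltfilt(sos, ts, axis=0)


def fc_features(parcel_ts):
    """Fisher r-to-z of the unique region pairs for one participant (time x regions)."""
    r = np.corrcoef(parcel_ts, rowvar=False)
    upper = np.triu_indices_from(r, k=1)  # each pair once, diagonal excluded
    return np.arctanh(r[upper])


def reduce_fc(fc_train, fc_test, n_components=75):
    """Fit PCA on training participants only, then apply the same projection to the test set."""
    pca = PCA(n_components=n_components).fit(fc_train)
    return pca.transform(fc_train), pca.transform(fc_test), pca


# Toy data: 120 participants, 300 frames, 20 regions (190 pairs).
rng = np.random.default_rng(2)
tr, n_time, n_regions = 0.8, 300, 20
fcs = np.array([fc_features(highpass(rng.normal(size=(n_time, n_regions)), tr))
                for _ in range(120)])
train_pcs, test_pcs, _ = reduce_fc(fcs[:100], fcs[100:])
print(fcs.shape, train_pcs.shape, test_pcs.shape)  # (120, 190) (100, 75) (20, 75)
```

      With 379 regions, the same upper-triangle indexing yields 379 × 378 / 2 = 71,631 pairs per participant.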

      Third, for regression methods and bias correction methods used, we included the following statements:

      From Methods:

      “For the machine learning algorithm, we used Elastic Net (Zou & Hastie, 2005). Elastic Net is a general form of penalised regression (including Lasso and Ridge regression), allowing us to simultaneously draw information across different brain indices to predict one target variable. Penalised regressions are commonly used for building age-prediction models (Jirsaraie, Gorelik, et al., 2023). Previously we showed that the performance of Elastic Net in predicting cognitive abilities is on par with, if not better than, many non-linear and more complicated algorithms (Pat, Wang, Bartonicek, et al., 2022; Tetereva et al., 2022). Moreover, Elastic Net coefficients are readily explainable, allowing us to explain how our age-prediction and cognition-prediction models made the prediction from each brain feature (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022) (see below).

      Elastic Net minimises the squared prediction error while simultaneously penalising the size of the features’ coefficients. The degree of penalty on the coefficients is determined by a shrinkage hyperparameter ‘α’: the greater the α, the more the coefficients shrink, and the more regularised the model becomes. Elastic Net also includes another hyperparameter, ‘ℓ1 ratio’, which determines the degree to which the sum of either the squared (known as ‘Ridge’; ℓ1 ratio = 0) or absolute (known as ‘Lasso’; ℓ1 ratio = 1) coefficients is penalised (Zou & Hastie, 2005). The objective function of Elastic Net as implemented by sklearn (Pedregosa et al., 2011) is defined as:

      $$\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta \rVert_2^2 + \alpha\,(\ell_1\ \text{ratio})\,\lVert \beta \rVert_1 + \frac{\alpha\,(1 - \ell_1\ \text{ratio})}{2}\,\lVert \beta \rVert_2^2$$

      where X is the feature matrix, y is the target, β is the vector of coefficients and n is the number of observations. In our grid search, we tuned two Elastic Net hyperparameters: α, using 70 values evenly spaced in log space from 0.1 to 100, and ℓ1 ratio, using 25 values evenly spaced in linear space from 0 to 1.
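      The tuning scheme can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' code. The five inner folds, the standardisation step, the mean-squared-error scoring and the helper name `nested_cv` are assumptions. Only the grid (70 log-spaced values of alpha from 0.1 to 100 and 25 linearly spaced values of l1_ratio from 0 to 1) and the five outer folds come from the text. In scikit-learn the two hyperparameters are named `alpha` and `l1_ratio`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Grid from the text: 70 log-spaced alphas (0.1 to 100) and 25 linear l1_ratios (0 to 1).
PARAM_GRID = {
    "elasticnet__alpha": np.logspace(-1, 2, 70),
    "elasticnet__l1_ratio": np.linspace(0, 1, 25),
}


def nested_cv(X, y, param_grid=PARAM_GRID, n_outer=5, n_inner=5, seed=0):
    """Tune alpha and l1_ratio in inner folds; evaluate each tuned model on its outer test fold."""
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    results = []
    for train_idx, test_idx in outer.split(X):
        # Assumption: features are standardised within each training fold.
        model = make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000))
        search = GridSearchCV(
            model,
            param_grid,
            cv=KFold(n_splits=n_inner, shuffle=True, random_state=seed),
            scoring="neg_mean_squared_error",  # assumed inner-loop criterion
        )
        search.fit(X[train_idx], y[train_idx])
        results.append({
            "best_params": search.best_params_,
            "test_r2": r2_score(y[test_idx], search.predict(X[test_idx])),
            "coef": search.best_estimator_.named_steps["elasticnet"].coef_,
        })
    return results
```

      The full 70 × 25 grid (1,750 combinations per inner fold) is slow, so a smaller `param_grid` can be passed for a quick check. Each outer fold keeps its own best α and ℓ1 ratio, which is why coefficients can differ between folds.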

      To understand how Elastic Net made a prediction based on different brain features, we examined the coefficients of the tuned model. Elastic Net coefficients can be considered as feature importance, such that more positive Elastic Net coefficients lead to more positive predicted values and, similarly, more negative Elastic Net coefficients lead to more negative predicted values (Molnar, 2019; Pat, Wang, Bartonicek, et al., 2022). While the magnitude of Elastic Net coefficients is regularised (thus making it difficult to interpret the magnitude itself directly), we could still infer that a brain feature with a coefficient of higher magnitude weighs relatively more strongly in making a prediction. Another benefit of Elastic Net as a penalised regression is that the coefficients are less susceptible to collinearity among features, as they have already been regularised (Dormann et al., 2013; Pat, Wang, Bartonicek, et al., 2022).

      Given that we used five-fold nested cross-validation, different outer folds may have different values of ‘α’ and ‘ℓ1 ratio’, so the final coefficients may differ across folds. For instance, for certain sets of features, penalisation may not play a big part (i.e., higher or lower ‘α’ leads to similar predictive performance), resulting in different ‘α’ for different folds. To remedy this in the visualisation of Elastic Net feature importance, we refitted the Elastic Net model to the full dataset without splitting it into five folds and visualised the coefficients on brain images using the BrainSpace (Vos De Wael et al., 2020) and Nilearn (Abraham et al., 2014) packages. Note that, unlike other sets of features, Task FC and Rest FC were modelled after data reduction via PCA. Thus, for Task FC and Rest FC, we first multiplied the absolute PCA loadings (extracted from the ‘components_’ attribute of ‘sklearn.decomposition.PCA’) with the Elastic Net coefficients and then summed the multiplied values across the 75 components, leaving 71,631 ROI-pair indices.
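      For each region pair e, the back-projection of component-level coefficients to ROI pairs computes the sum over components k of |loading(k, e)| × β(k). This is a single matrix product, sketched below. The data, the hyperparameter values and the helper name `edge_importance` are hypothetical placeholders. The sketch only illustrates the summation across components described in the text, on 190 region pairs rather than 71,631.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import ElasticNet


def edge_importance(pca, enet):
    """Map component-level Elastic Net coefficients back to ROI-pair (edge) space."""
    loadings = np.abs(pca.components_)  # shape (n_components, n_edges)
    return loadings.T @ enet.coef_      # shape (n_edges,)


# Synthetic example: 200 participants, 20 regions -> 190 edges, 75 components.
rng = np.random.default_rng(4)
fc = rng.normal(size=(200, 190))
target = fc[:, :5].sum(axis=1) + rng.normal(size=200)

pca = PCA(n_components=75).fit(fc)          # refit on the full dataset
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # placeholder hyperparameters
enet.fit(pca.transform(fc), target)
importance = edge_importance(pca, enet)
print(importance.shape)  # (190,)
```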

      References

      Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., Gramfort, A., Thirion, B., & Varoquaux, G. (2014). Machine learning for neuroimaging with scikit-learn. Frontiers in Neuroinformatics, 8, 14. https://doi.org/10.3389/fninf.2014.00014

      Ances, B. M., Liang, C. L., Leontiev, O., Perthen, J. E., Fleisher, A. S., Lansing, A. E., & Buxton, R. B. (2009). Effects of aging on cerebral blood flow, oxygen metabolism, and blood oxygenation level dependent responses to visual stimulation. Human Brain Mapping, 30(4), 1120–1132. https://doi.org/10.1002/hbm.20574

      Bashyam, V. M., Erus, G., Doshi, J., Habes, M., Nasrallah, I. M., Truelove-Hill, M., Srinivasan, D., Mamourian, L., Pomponio, R., Fan, Y., Launer, L. J., Masters, C. L., Maruff, P., Zhuo, C., Völzke, H., Johnson, S. C., Fripp, J., Koutsouleris, N., Satterthwaite, T. D., … on behalf of the ISTAGING Consortium, the P. A. disease C., ADNI, and CARDIA studies. (2020). MRI signatures of brain age and disease over the lifespan based on a deep brain network and 14 468 individuals worldwide. Brain, 143(7), 2312–2324. https://doi.org/10.1093/brain/awaa160

      Bookheimer, S. Y., Salat, D. H., Terpstra, M., Ances, B. M., Barch, D. M., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Diaz-Santos, M., Elam, J. S., Fischl, B., Greve, D. N., Hagy, H. A., Harms, M. P., Hatch, O. M., Hedden, T., Hodge, C., Japardi, K. C., Kuhn, T. P., … Yacoub, E. (2019). The Lifespan Human Connectome Project in Aging: An overview. NeuroImage, 185, 335–348. https://doi.org/10.1016/j.neuroimage.2018.10.009

      Butler, E. R., Chen, A., Ramadan, R., Le, T. T., Ruparel, K., Moore, T. M., Satterthwaite, T. D., Zhang, F., Shou, H., Gur, R. C., Nichols, T. E., & Shinohara, R. T. (2021). Pitfalls in brain age analyses. Human Brain Mapping, 42(13), 4092–4101. https://doi.org/10.1002/hbm.25533

      Cole, J. H. (2020). Multimodality neuroimaging brain-age in UK biobank: Relationship to biomedical, lifestyle, and cognitive factors. Neurobiology of Aging, 92, 34–42. https://doi.org/10.1016/j.neurobiolaging.2020.03.014

      Destrieux, C., Fischl, B., Dale, A., & Halgren, E. (2010). Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclature. NeuroImage, 53(1), 1–15. https://doi.org/10.1016/j.neuroimage.2010.06.010

      Dormann, C. F., Elith, J., Bacher, S., Buchmann, C., Carl, G., Carré, G., Marquéz, J. R. G., Gruber, B., Lafourcade, B., Leitão, P. J., Münkemüller, T., McClean, C., Osborne, P. E., Reineking, B., Schröder, B., Skidmore, A. K., Zurell, D., & Lautenbach, S. (2013). Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography, 36(1), 27–46. https://doi.org/10.1111/j.1600-0587.2012.07348.x

      Dubois, J., Galdi, P., Paul, L. K., & Adolphs, R. (2018). A distributed brain network predicts general intelligence from resting-state human neuroimaging data. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1756), 20170284. https://doi.org/10.1098/rstb.2017.0284

      Elliott, M. L., Knodt, A. R., Cooke, M., Kim, M. J., Melzer, T. R., Keenan, R., Ireland, D., Ramrakha, S., Poulton, R., Caspi, A., Moffitt, T. E., & Hariri, A. R. (2019). General functional connectivity: Shared features of resting-state and task fMRI drive reliable and heritable individual differences in functional brain networks. NeuroImage, 189, 516–532. https://doi.org/10.1016/j.neuroimage.2019.01.068

      Fair, D. A., Schlaggar, B. L., Cohen, A. L., Miezin, F. M., Dosenbach, N. U. F., Wenger, K. K., Fox, M. D., Snyder, A. Z., Raichle, M. E., & Petersen, S. E. (2007). A method for using blocked and event-related fMRI data to study “resting state” functional connectivity. NeuroImage, 35(1), 396–405. https://doi.org/10.1016/j.neuroimage.2006.11.051

      Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2), 774–781. https://doi.org/10.1016/j.neuroimage.2012.01.021

      Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove, C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N., Rosen, B., & Dale, A. M. (2002). Whole Brain Segmentation. Neuron, 33(3), 341–355. https://doi.org/10.1016/S0896-6273(02)00569-X

      Glasser, M. F., Smith, S. M., Marcus, D. S., Andersson, J. L. R., Auerbach, E. J., Behrens, T. E. J., Coalson, T. S., Harms, M. P., Jenkinson, M., Moeller, S., Robinson, E. C., Sotiropoulos, S. N., Xu, J., Yacoub, E., Ugurbil, K., & Van Essen, D. C. (2016). The Human Connectome Project’s neuroimaging approach. Nature Neuroscience, 19(9), 1175–1187. https://doi.org/10.1038/nn.4361

      Glasser, M. F., Sotiropoulos, S. N., Wilson, J. A., Coalson, T. S., Fischl, B., Andersson, J. L., Xu, J., Jbabdi, S., Webster, M., Polimeni, J. R., Van Essen, D. C., & Jenkinson, M. (2013). The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage, 80, 105–124. https://doi.org/10.1016/j.neuroimage.2013.04.127

      Gordon, E. M., Laumann, T. O., Adeyemo, B., Huckins, J. F., Kelley, W. M., & Petersen, S. E. (2016). Generation and Evaluation of a Cortical Area Parcellation from Resting-State Correlations. Cerebral Cortex, 26(1), 288–303. https://doi.org/10.1093/cercor/bhu239

      Gratton, C., Laumann, T. O., Nielsen, A. N., Greene, D. J., Gordon, E. M., Gilmore, A. W., Nelson, S. M., Coalson, R. S., Snyder, A. Z., Schlaggar, B. L., Dosenbach, N. U. F., & Petersen, S. E. (2018). Functional Brain Networks Are Dominated by Stable Group and Individual Factors, Not Cognitive or Daily Variation. Neuron, 98(2), 439-452.e5. https://doi.org/10.1016/j.neuron.2018.03.035

      Hahn, T., Fisch, L., Ernsting, J., Winter, N. R., Leenings, R., Sarink, K., Emden, D., Kircher, T., Berger, K., & Dannlowski, U. (2021). From ‘loose fitting’ to high-performance, uncertainty-aware brain-age modelling. Brain, 144(3), e31–e31. https://doi.org/10.1093/brain/awaa454

      Harms, M. P., Somerville, L. H., Ances, B. M., Andersson, J., Barch, D. M., Bastiani, M., Bookheimer, S. Y., Brown, T. B., Buckner, R. L., Burgess, G. C., Coalson, T. S., Chappell, M. A., Dapretto, M., Douaud, G., Fischl, B., Glasser, M. F., Greve, D. N., Hodge, C., Jamison, K. W., … Yacoub, E. (2018). Extending the Human Connectome Project across ages: Imaging protocols for the Lifespan Development and Aging projects. NeuroImage, 183, 972–984. https://doi.org/10.1016/j.neuroimage.2018.09.060

      Insel, T., Cuthbert, B., Garvey, M., Heinssen, R., Pine, D. S., Quinn, K., Sanislow, C., & Wang, P. (2010). Research Domain Criteria (RDoC): Toward a New Classification Framework for Research on Mental Disorders. American Journal of Psychiatry, 167(7), 748–751. https://doi.org/10.1176/appi.ajp.2010.09091379

      Jirsaraie, R. J., Gorelik, A. J., Gatavins, M. M., Engemann, D. A., Bogdan, R., Barch, D. M., & Sotiras, A. (2023). A systematic review of multimodal brain age studies: Uncovering a divergence between model accuracy and utility. Patterns, 4(4), 100712. https://doi.org/10.1016/j.patter.2023.100712

      Jirsaraie, R. J., Kaufmann, T., Bashyam, V., Erus, G., Luby, J. L., Westlye, L. T., Davatzikos, C., Barch, D. M., & Sotiras, A. (2023). Benchmarking the generalizability of brain age models: Challenges posed by scanner variance and prediction bias. Human Brain Mapping, 44(3), 1118–1128. https://doi.org/10.1002/hbm.26144

      Marquand, A. F., Rezek, I., Buitelaar, J., & Beckmann, C. F. (2016). Understanding Heterogeneity in Clinical Cohorts Using Normative Models: Beyond Case-Control Studies. Biological Psychiatry, 80(7), 552–561. https://doi.org/10.1016/j.biopsych.2015.12.023

      Molnar, C. (2019). Interpretable Machine Learning. A Guide for Making Black Box Models Explainable. https://christophm.github.io/interpretable-ml-book/

      Nimon, K., Lewis, M., Kane, R., & Haynes, R. M. (2008). An R package to compute commonality coefficients in the multiple regression case: An introduction to the package and a practical example. Behavior Research Methods, 40(2), 457–466. https://doi.org/10.3758/BRM.40.2.457

      Pat, N., Wang, Y., Anney, R., Riglin, L., Thapar, A., & Stringaris, A. (2022). Longitudinally stable, brain-based predictive models mediate the relationships between childhood cognition and socio-demographic, psychological and genetic factors. Human Brain Mapping, hbm.26027. https://doi.org/10.1002/hbm.26027

      Pat, N., Wang, Y., Bartonicek, A., Candia, J., & Stringaris, A. (2022). Explainable machine learning approach to predict and explain the relationship between task-based fMRI and individual differences in cognition. Cerebral Cortex, bhac235. https://doi.org/10.1093/cercor/bhac235

      Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

      Poldrack, R. A., Huckins, G., & Varoquaux, G. (2020). Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry, 77(5), 534–540. https://doi.org/10.1001/jamapsychiatry.2019.3671

      Rasero, J., Sentis, A. I., Yeh, F.-C., & Verstynen, T. (2021). Integrating across neuroimaging modalities boosts prediction accuracy of cognitive ability. PLOS Computational Biology, 17(3), e1008347. https://doi.org/10.1371/journal.pcbi.1008347

      Robinson, E. C., Garcia, K., Glasser, M. F., Chen, Z., Coalson, T. S., Makropoulos, A., Bozek, J., Wright, R., Schuh, A., Webster, M., Hutter, J., Price, A., Cordero Grande, L., Hughes, E., Tusor, N., Bayly, P. V., Van Essen, D. C., Smith, S. M., Edwards, A. D., … Rueckert, D. (2018). Multimodal surface matching with higher-order smoothness constraints. NeuroImage, 167, 453–465. https://doi.org/10.1016/j.neuroimage.2017.10.037

      Rokicki, J., Wolfers, T., Nordhøy, W., Tesli, N., Quintana, D. S., Alnæs, D., Richard, G., de Lange, A.-M. G., Lund, M. J., Norbom, L., Agartz, I., Melle, I., Nærland, T., Selbæk, G., Persson, K., Nordvik, J. E., Schwarz, E., Andreassen, O. A., Kaufmann, T., & Westlye, L. T. (2021). Multimodal imaging improves brain age prediction and reveals distinct abnormalities in patients with psychiatric and neurological disorders. Human Brain Mapping, 42(6), 1714–1726. https://doi.org/10.1002/hbm.25323

      Somerville, L. H., Bookheimer, S. Y., Buckner, R. L., Burgess, G. C., Curtiss, S. W., Dapretto, M., Elam, J. S., Gaffrey, M. S., Harms, M. P., Hodge, C., Kandala, S., Kastman, E. K., Nichols, T. E., Schlaggar, B. L., Smith, S. M., Thomas, K. M., Yacoub, E., Van Essen, D. C., & Barch, D. M. (2018). The Lifespan Human Connectome Project in Development: A large-scale study of brain connectivity development in 5–21 year olds. NeuroImage, 183, 456–468. https://doi.org/10.1016/j.neuroimage.2018.08.050

      Sperling, R. A., Bates, J. F., Cocchiarella, A. J., Schacter, D. L., Rosen, B. R., & Albert, M. S. (2001). Encoding novel face-name associations: A functional MRI study. Human Brain Mapping, 14(3), 129–139. https://doi.org/10.1002/hbm.1047

      Sripada, C., Angstadt, M., Rutherford, S., Kessler, D., Kim, Y., Yee, M., & Levina, E. (2019). Basic Units of Inter-Individual Variation in Resting State Connectomes. Scientific Reports, 9(1), Article 1. https://doi.org/10.1038/s41598-018-38406-5

      Sripada, C., Angstadt, M., Rutherford, S., Taxali, A., & Shedden, K. (2020). Toward a “treadmill test” for cognition: Improved prediction of general cognitive ability from the task activated brain. Human Brain Mapping, 41(12), 3186–3197. https://doi.org/10.1002/hbm.25007

      Tetereva, A., Li, J., Deng, J. D., Stringaris, A., & Pat, N. (2022). Capturing brain-cognition relationship: Integrating task-based fMRI across tasks markedly boosts prediction and test-retest reliability. NeuroImage, 263, 119588. https://doi.org/10.1016/j.neuroimage.2022.119588

      Vieira, B. H., Pamplona, G. S. P., Fachinello, K., Silva, A. K., Foss, M. P., & Salmon, C. E. G. (2022). On the prediction of human intelligence from neuroimaging: A systematic review of methods and reporting. Intelligence, 93, 101654. https://doi.org/10.1016/j.intell.2022.101654

      Vos De Wael, R., Benkarim, O., Paquola, C., Lariviere, S., Royer, J., Tavakol, S., Xu, T., Hong, S.-J., Langs, G., Valk, S., Misic, B., Milham, M., Margulies, D., Smallwood, J., & Bernhardt, B. C. (2020). BrainSpace: A toolbox for the analysis of macroscale gradients in neuroimaging and connectomics datasets. Communications Biology, 3(1), 103. https://doi.org/10.1038/s42003-020-0794-7

      Woolrich, M. W., Ripley, B. D., Brady, M., & Smith, S. M. (2001). Temporal Autocorrelation in Univariate Linear Modeling of FMRI Data. NeuroImage, 14(6), 1370–1386. https://doi.org/10.1006/nimg.2001.0931

      Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment 

      This study explores the role of one of the most abundant circRNAs, circHIPK3, in bladder cancer cells, providing convincing data that circHIPK3 depletion affects thousands of genes and that those downregulated (including STAT3) share an 11-mer motif with circHIPK3, corresponding to a binding site for IGF2BP2. The experiments demonstrate that circHIPK3 can compete with the downregulated mRNA targets for IGF2BP2 binding and that IGF2BP2 depletion antagonizes the effect of circHIPK3 depletion by upregulating the genes containing the 11-mer motif. These valuable findings contribute to the growing recognition of the complexity of cancer signaling regulation and highlight the intricate interplay between circRNAs and protein-coding genes in tumorigenesis.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this work the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3. They demonstrate that circHIPK3 interacts with an RNA-binding protein (IGF2BP2), sequestering it away from its target mRNAs. This interaction is shown to regulate the expression of hundreds of genes that share a specific sequence motif (11-mer motif) in their untranslated regions (3'-UTRs), identical to one present in circHIPK3 where IGF2BP2 binds. The study further focuses on the specific case of the STAT3 gene, whose mRNA product is found to be downregulated upon circHIPK3 depletion. This suggests that circHIPK3 sequesters IGF2BP2, preventing it from binding to and destabilizing STAT3 mRNA. The study presents evidence supporting this mechanism and discusses its potential role in tumor cell progression. These findings contribute to the growing complexity of understanding cancer regulation and highlight the intricate interplay between circRNAs and protein-coding genes in tumorigenesis.

      Strengths:

      The authors provide mechanistic insight into a proposed novel "sponging" function of circHIPK3 that is mediated not by sequestering miRNAs but by sequestering a specific RNA-binding protein (IGF2BP2). They address the stoichiometry of the molecules involved in the interaction, a critical aspect that is frequently overlooked in studies of this type. They provide both genome-wide analysis and a specific case (STAT3) that is relevant for cancer progression. Overall, the authors have significantly improved their manuscript in their revised version.

      Weaknesses:

      While the authors have performed northern blots to measure circRNA levels, an estimation of the circRNA overexpression efficiency, namely the circular-to-linear expression ratio, would be desirable. The seemingly contradictory effects of circHIPK3 and STAT3 depletion on cancer progression are now addressed by the authors in their revised manuscript, which incorporates potential reasons that might explain such complexity.

      We have now included a full version of the northern blot, in which no discernible linear precursor can be detected, supporting efficient production of circHIPK3 WT and circHIPK3 MUT (please see the detailed description in the specific comments below). We agree that, as discussed, extrapolating from the observations on STAT3 homeostasis to cancer progression is not straightforward.

      Reviewer #2 (Public Review):

      Summary: 

      The authors have diligently addressed most of the points raised during the review process (except the important point of "additional in vitro experiments [...] needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype" for which no additional experiments were performed), resulting in an improvement in the study. The data are now described with clarity and conciseness, enhancing the overall quality of the manuscript. 

      Strengths: 

      New, well-defined molecular mechanism of circRNAs involvement in bladder cancer. 

      Weaknesses: 

      Lack of solid translational significance data. 

      The focus of this study has been to disclose molecular mechanisms of action by circHIPK3, with implications for cancer. We agree that further studies are needed to fully understand the impact of circHIPK3 in bladder cancer.  

      Reviewer #3 (Public Review):

      In Okholm et al., the authors evaluate the functional impact of circHIPK3 in bladder cancer cells. By knocking down circHIPK3 and performing an RNA-seq analysis, the authors found thousands of deregulated genes that appear unaffected by the miRNA-sponging function and that are, instead, enriched for an 11-mer motif. Further investigations showed that the 11-mer motif is shared with circHIPK3 and is able to bind the IGF2BP2 protein. The authors validated the binding of IGF2BP2 and demonstrated that IGF2BP2 KD antagonizes the effect of circHIPK3 KD and leads to the upregulation of genes containing the 11-mer. Among the genes affected by circHIPK3 KD and IGF2BP2 KD, resulting in downregulation and upregulation respectively, the authors found the STAT3 gene, with consistent, concomitant upregulation of one of its targets, TP53. The authors propose a mechanism of competition between circHIPK3 and IGF2BP2 triggered by IGF2BP2 nucleation, potentially via phase separation.

      Strengths: 

      Although the number of known circRNAs continues to grow, the field still has few detailed molecular investigations. The presented work critically addresses some of the major pitfalls in the field of circRNAs, with careful analysis of aspects that are frequently poorly investigated. The experiments involving time-point knockdown followed by RNA-seq, investigation of the miRNA-sponge function of circHIPK3, identification of the 11-mer motif, identification and validation of IGF2BP2, and analysis of the copy-number ratio between circHIPK3 and IGF2BP2 in assessing the potential ceRNA mode of action are thorough and convincing.

      Weaknesses: 

      It is unclear why the authors used certain bladder cancer cells versus non-bladder cells in some experiments. The efficacy of certain experiments (specifically rescue experiments) and some control conditions is still questionable. Overall, the presented study adds some further knowledge in describing circHIPK3 function, its capability to regulate some downstream genes, and its interaction and competition for IGF2BP2. 

      We have provided a discussion and argumentation of how certain bladder cancer cells (and non-bladder cancer cells) have been used in this study in our previous rebuttal letter and also clarified this further in the materials and methods section in the first revision. Regarding control conditions for experiments, we believe we have included all necessary controls and explanations for these in the revised version (please see the detailed description in the specific comments below). 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points about revised manuscript

      (1) In Supplementary Figure S5H, the membrane may have been trimmed too closely to the circRNA band, potentially resulting in the absence of the linear RNA band. Could the authors provide a full image of the membrane that includes the loading points? Having access to the complete image would allow for a more comprehensive evaluation of the results, including the presence or absence of expected linear and circular RNA bands.

      I have taken the liberty to move this “major point” from the public review section, as I believe it would be too detailed for this section. We have included the full section of the northern blot, according to the reviewer’s recommendations.

      As described in the previous rebuttal letter, our northern blots suffered from heavy background signal arising from the rRNA bands, which was the reason for cutting the northern blot in the previous version of Supplementary Figure S5H. We have now shown the entire blot as suggested by the reviewer, so that the reader can more clearly inspect any potential linear precursor band. We previously stated that we could not assess the circular-to-linear ratio due to background signal, since a potential linear HIPK3 precursor RNA could be masked by the rRNA signal. However, the theoretical size of a linear precursor is ~2.9 kb – a region where we do not detect any distinct bands (just above the 18S band), making rather efficient circularization very likely. In support of this claim, we are using the Laccase2 vector described in Kramer et al., 2015 (Genes Dev), which has been shown to produce high levels of circHIPK3 compared to negligible amounts of linear precursor (although in a different cell line). We have also included a 5.8S rRNA probe to control for loading and RNase R activity (which can also be ascertained by the disappearance of the 18S/28S bands). Since we do not have the option to use another probe (limited by the BSJ-specific probe) and it is not practical to deplete rRNA from 20 µg samples of total RNA prior to running the northern blot, we find that these data sufficiently demonstrate that our vector constructs produce a decent amount of RNase R-resistant circHIPK3, with no visible/discernible linear precursor.

      Minor points about the revised manuscript

      (1) In Supplementary Figure S3B, the authors offer no explanation for why genes that become upregulated upon circHIPK3 knockdown generally contain more binding sites for circHIPK3-RBPs other than IGF2BP2. A clarification would be helpful.

      Again, this issue has been addressed in the previous rebuttal letter. Our response is repeated below:

      We do not have any evidence to explain this observation. One possibility is that other RBPs elicit mRNA-stabilizing effects on average, whereas the abundant IGF2BP2 (~120,000 to 200,000 copies per cell) is now able to bind more target mRNAs and elicit destabilization. This remains highly speculative, though.

      (2) In Supplementary Figure S3D, the authors' claim that the 11-mer motif is more often bound by IGF2BP2 than by other circHIPK3-RBPs should cite the corresponding dataset or reference.

      Again, this issue has been addressed in the previous rebuttal letter. Our response is repeated below:

      This information is stated in the figure legend (K562) and we have now included it in the main text as well: “We evaluated how often binding sites of circHIPK3-RBPs overlap the 11-mer motif and found that this is more often the case for IGF2BP2 binding sites than binding sites of the other circHIPK3-RBPs when scrutinizing K562 datasets (Supplementary Figure S3D)”.

      (3) In the rescue experiment where both circHIPK3 and IGF2BP2 are downregulated, using the term "normalization" to mean reestablishing normal levels of gene expression can lead to confusion with the concept of normalization as it is commonly understood in the context of data processing (i.e. the mathematical process of adjusting data to account for various factors that might affect measurements). I would recommend that the authors use a term that more specifically describes the biological process they are referring to, such as "restoration of normal expression levels" or simply "return to normal levels".

      We agree that this term could be misunderstood. This has now been changed as recommended.

      (4) The figure legend of Supplementary Figure 5F is wrongly labeled. The legend for panel F actually corresponds to panel G and vice versa. 

      This has now been corrected.  

      Reviewer #2 (Recommendations For The Authors): 

      The authors have diligently addressed most of the points raised during the review process (except the important point of "additional in vitro experiments [...] needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype" for which no additional experiments were performed), resulting in an improvement in the study. The data are now described with clarity and conciseness, enhancing the overall quality of the manuscript. Therefore, I support the publication of this work. 

      We thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors): 

      Please ensure that when changes are made in response to the reviewers' comments (especially for major points), these are all appropriately incorporated in the text (for example, the use of Act B as a low-affinity positive control, now in Fig 4A, is explained neither in the text nor in the legends/methods).

      This has now been included.

      Please ensure that all the legends correspond to the right figures (e.g., the Supplementary Figure with the rescue experiment is 5F, but the corresponding legend in the manuscript is S5G).

      This has now been corrected.

      For future review rounds, please ensure that new parts are properly highlighted or coloured differently in the manuscript.

      This has now been done more thoroughly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Below, we provide a detailed account of the changes we made. For clarity and ease of review:

      •        Original reviewers' comments are included and highlighted in grey

      •        Our responses to each comment are written in black text

      •        Screenshots illustrating the specific changes made to the manuscript are enclosed within black boxes

      eLife assessment

      The authors aim to develop a CRISPR system that can be activated upon sensing an RNA. As an initial step to this goal, they describe RNA-sensing guide RNAs for controlled activation of CRISPR modification. Many of the data look convincing and while several steps remain to achieve the stated goal in an in vivo setting and for robust activation by endogenous RNAs, the current work will be important for many in the field.  

      The eLife assessment summarises our ambition to create a CRISPR system controlled by RNA sensing. The synopsis provided encapsulates the essence of our research, emphasising both the progress we have made and the challenges that lie ahead. This assessment fully resonates with our views.

      Public Reviews:

      Reviewer #1 (Public Review):

      This paper describes RNA-sensing guide RNAs for controlled activation of CRISPR modification. This works by having an extended guide RNA with a sequence that folds back onto the targeting sequence such that the guide RNA cannot hybridise to its genomic target. The CRISPR is "activated" by the introduction of another RNA, referred to as a trigger, that competes with this "back folding" to make the guide RNA available for genome targeting. The authors first confirm the efficacy of the approach using several RNA triggers and a GFP reporter that is activated by dCas9 fused to transcriptional activators. A major potential application of this technique is the activation of CRISPR in response to endogenous biomarkers. As these will typically be longer than the first-generation triggers employed by the authors, they test some extended triggers, which also work, though not always to the same extent. They then introduce MODesign, which may enable the design of bespoke or improved triggers. After that, they determine that the mode of activation by the RNA trigger involves cleavage of the RNA complexes. Finally, they test the potential for their system to work in a developmental setting, specifically zebrafish embryos. There is some encouraging evidence, though the effects appear more subtle than those originally obtained in cell culture.

      Overall, the potential of a CRISPR system that can be activated upon sensing an RNA is high and there are a myriad of opportunities and applications for it. This paper represents a reasonable starting point having developed such a system in principle. 

      The weakness of the study is that it does not demonstrate that the system can be used in a completely natural setting. This would require an endogenous transcript as the RNA trigger with a clear readout. Such an experiment would clearly strengthen the paper and provide strong confidence that the method could be employed for one of the major applications discussed by the authors. The zebrafish data relied on exogenous RNA triggers whereas the major applications (as I understood them) would use endogenous triggers. 

      Relatedly, most endogenous RNAs are longer than the various triggers tested and may require extensive modification of the system to be detected or used effectively.

      While additional data would clearly be beneficial, there should nevertheless be a more detailed discussion of these caveats and/or the strengths and applications of the system as it is presented (i.e. utility with synthetic triggers).  

      We agree with the observation regarding the subtler effects in the zebrafish embryos and the reliance on exogenous RNA triggers. Indeed, the use of endogenous transcripts as triggers in a natural setting is a logical next step. We further acknowledge the need to delve deeper into the complexities and challenges of our system, particularly concerning the detection of endogenous RNA, thus offering valuable insights for researchers looking to adapt our system for various applications. To clarify these limitations, we have made changes to the final version of our paper. The following paragraphs have therefore been included in the discussion of the manuscript:

      “In their current iteration, iSBH-sgRNAs show considerable promise for mammalian synthetic biology applications. Specifically, their ability to detect synthetic triggers could be pivotal in the development of complex synthetic RNA circuits and logic gates, thereby advancing the field of cellular reprogramming. However, further work is required to achieve better ON/OFF activation ratios in vivo and more homogeneous activity across tissues in the presence of RNA triggers. Additional chemical modifications could improve iSBH-sgRNA properties, and we believe that chemical modification strategies adopted for siRNA drugs or antisense oligos (Khvorova and Watts (2017)) could also be essential for further iSBH-sgRNA technology development. As iSBH-sgRNAs might be targeted by endogenous nucleases, leading to their degradation, a strategy for preventing this could involve additional chemical modifications. When inserted at certain key positions, such modifications could prevent interaction between iSBH-sgRNAs and cellular enzymes by introducing steric clashes or inhibiting RNA hydrolysis.

      Once superior dynamic ranges of iSBH-sgRNA activation are achieved in vivo, the next steps would involve understanding the classes of endogenous RNAs that could act as triggers. The chances that an iSBH-sgRNA encounters an endogenous RNA trigger inside a cell would depend on the relative concentrations of the two RNA species. Therefore, a first step towards determining potential endogenous RNA triggers will involve identifying RNA species with expression levels comparable to those of iSBH-sgRNAs. Then, iSBH-sgRNAs could be designed against these RNA species, followed by experimental validation. It is important to note that eukaryotic cells express a wide range of transcripts of varying sizes, expression levels, and subcellular localisations, all of which could greatly affect iSBH-sgRNA activation levels. Based on the data presented here, we speculate that highly expressed RNA species of up to 300 nt might act as good triggers. Furthermore, as sgRNAs are involved in targeting Cas9 to genomic DNA in the nucleus, attempting to detect transcripts that are sequestered in the nucleus might also provide additional benefit.”

      Reviewer #3 (Public Review):

      In this work, the authors describe engineering of sgRNAs that render Cas9 DNA binding controllable by a second RNA trigger. The authors introduce several iterations of their engineered sgRNAs, as well as a computational pipeline to identify designs for user-specified RNA triggers which offers a helpful alternative to purely rational design. Also included is an investigation of the fate of the engineered sgRNAs when introduced into cells, and the use of this information to inform installation of modified nucleotides to improve engineered sgRNA stability. Engineered sgRNAs are demonstrated to be activated by trigger RNAs in both cultured mammalian cells and zebrafish. 

      The conclusions made by the authors in this work are predominantly supported by the data provided. However, some claims are not consistent with the data shown and some of the figures would benefit from revision or further clarification. 

      Strengths: 

      - The sgRNA engineering in this paper is performed and presented in a systematic and logical fashion.

      - Inclusion of a computational method to predict iSBH-sgRNAs adds to the strength of the engineering. 

      - Investigation into the cellular fate of the engineered sgRNAs and the use of this information to guide inclusion of chemically modified nucleotides is also a strength. 

      - Demonstration of activity in both cultured mammalian cells and in zebrafish embryos increases the impact and utility of the technology reported in this work. 

      Weaknesses: 

      - While the methods here represent an important step forward in advancing the technology, they still fall short of the dynamic range and selectivity likely required for robust activation by endogenous RNA.

      - While the iSBH-sgRNAs where the RNA trigger overlaps with the spacer appear to function robustly, the modular iSBH-sgRNAs seem to perform noticeably less well. The authors state that modular iSBH-sgRNAs show better activity without increasing background when the SAM system is added, but this is not supported by the data shown in Figure 3D, where in 3 out of 4 cases CRISPR activation in the absence of the RNA trigger is substantially increased.

      - There is very little discussion of how the performance of the technology reported in this work compares to previous iterations of RNA-triggered CRISPR systems, of which there are many examples.  

      Concerning the methods falling short of the dynamic range and selectivity required for robust activation by endogenous RNA, we acknowledge this limitation and recognise the need for improvement in this area. In the resubmitted version of the manuscript, we provide a detailed discussion of how the selection of appropriate triggers might partially improve dynamic ranges and selectivity. This includes an exploration of various strategies and considerations that may enhance the robustness of our system (see the discussion excerpt quoted above, which also addresses Reviewer #1's comments).

      Regarding the inconsistent performance of the modular iSBH-sgRNAs, we acknowledge that modular iSBH-sgRNAs seem to perform slightly less well than first- and second-generation designs. To illustrate this, we modified the corresponding bar graphs to show the fold turn-on of iSBH-sgRNA activation in addition to significance (Figures 1, 2 and 3 of the manuscript). We also acknowledge this in the text, recognise the discrepancy in Figure 3.D, and provide further clarification. To help convey this message further, we introduced a new figure (Figure 3- figure supplement 2) to accompany the heat map shown in Figure 3.D with corresponding bar graphs. These changes are documented below:

      “…promoters. We ran 11 MODesign simulations for each trigger, incrementally extending the loop size while keeping the sgRNA 2 spacer input constant. HEK293T validation experiments showed that choosing modular iSBH-sgRNAs that detect the 4 U6-expressed triggers is possible (Figure 3.D, Figure 3- figure supplement 1.C). Despite not performing quite as well as second-generation designs (Figure 2.A, Figure 3.D), modular iSBH-sgRNAs still enable efficient RNA detection, especially for smaller RNAs such as triggers A and D. For highly efficient designs such as modular iSBH-sgRNA (D), addition of the SAM effector system (Konermann et al. (2015)) boosted ON-state activation with only a negligible increase in OFF-state non-specific activation. Orthogonality tests suggested that activation of modular iSBH-sgRNA designs was specifically conditioned by complementary RNA triggers (Figure 3.E, Figure 3 - figure supplement 2), showing the exquisite specificity of the system.”

      Author response image 1.

      This supplementary figure reinterprets the data presented in Figure 3.E. using bar plots for enhanced clarity and comparison. It depicts the results of cotransfecting HEK293T cells with four modular iSBH-sgRNAs (A, B, C, and D) and examines all combinations of iSBH-sgRNA: RNA trigger pairings. The bar plots provide a visual representation of mean values with error bars indicating the standard deviation, based on three biological replicates.

      Regarding the concern about the lack of comparison with previous iterations of RNA-triggered CRISPR systems, we also acknowledged other similar technologies within the discussion. We also point readers to a literature review we recently published (doi/full/10.1089/crispr.2022.0052) where we describe other similar technologies in more detail.

      “To date, a variety of RNA-inducible gRNA designs have been developed (Hanewich-Hollatz et al. (2019); Hochrein et al. (2021); Jakimo et al. (2018); Jiao et al. (2021); Jin et al. (2019); Li et al. (2019); Liu et al. (2022); Lin et al. (2020); Siu and Chen (2019); Galizi et al. (2020); Hunt and Chen (2022b,a); Ying et al. (2020); Choi et al. (2023)). Nevertheless, there is a lack of direct, head-to-head comparisons of these designs under standardised experimental conditions. Some designs were evaluated in vitro, others in bacterial systems, and some in mammalian cells. Consequently, it is challenging to conclusively determine which design exhibits superior properties (Pelea et al. (2022)). Notably, to the best of our knowledge, the iSBH-sgRNA system is the first RNA-inducible gRNA design tested in vivo and characterising the iSBH-sgRNA activation mechanism was essential for implementing iSBH-sgRNA technology in zebrafish embryos. In vivo, chemical modifications in the spacer sequence were vital for iSBH-sgRNA stability and function.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      We sincerely value the insightful and constructive feedback (italicized) provided by the reviewers, which has been instrumental in identifying areas of our manuscript that required further clarification or amendment. In response to these valuable comments, we have significantly revised the manuscript to enhance clarity and accuracy. Specifically, we have corrected an oversight related to the robot’s velocity and secondary antibody ratios, and addressed previously missing values in Figs. 3E and 4E. Importantly, these corrections did not alter the outcomes of our results. Additionally, we have enriched our manuscript with new data analyses, as reflected in Figures 1B, 1F, 2H-J, 4D, 4F-H, S1A, S1C-E, S3H, S5, and Table 1, ensuring a more comprehensive presentation of our findings. Below are our responses detailing each comment and explaining the modifications integrated into the revised manuscript.

      Reviewer 1:

      (1) To address the question of whether PAG photostimulation biases the cells that respond to the robot, a counterbalanced experiment, in which the BLA activity is initially recorded during the foraging vs. robot test and the PAG stimulation happens at the end of the session, should have been performed.

      In our study, we investigated fear behavior and BLA cell responses to intrinsic dPAG photostimulation (320 pulses) in naïve animals, followed by their reactions to an extrinsic predatory robot. We recognize the reviewer's concern regarding the potential influence of initial dPAG photostimulation on BLA neuron responses to the robot. We address this issue in our discussion (pg. 13) as follows: “However, it is crucial to consider the recent discovery that optogenetic stimulation of CA3 neurons (3000 pulses) leads to gain-of-function changes in CA3-CA3 recurrent (monosynaptic) excitatory synapses (Oishi et al., 2019). Although there is no direct connection between dPAG neurons and the BLA (Vianna and Brandao 2003, McNally, Johansen, and Blair 2011, Cameron et al. 1995), and no studies have yet demonstrated gain-of-function changes in polysynaptic pathways to our knowledge, the potential for our dPAG photostimulation (320 pulses) to induce similar changes in amygdalar neurons, thereby enhancing their sensitivity to predatory threats, cannot be dismissed.”

      (2) In Figure 3, it is unclear which criteria (e.g., response latency, minimum Z score, spike fidelity) were used to identify the BLA neurons that were indirectly activated by PAG stimulation. A graphic showing at least the distribution of response latencies of each BLA neuron after PAG laser activation is needed.

      We have specified the criteria for determining the responsiveness of BLA neurons to dPAG stimulation on page 22. This involves analyzing the first 500 ms post-stimulation, divided into five 0.1-s bins. Units were classified as ‘stim cells’ if they showed z-scores greater than 3 (z > 3) in any of these bins. Neurons activated by both pellet procurement and dPAG stimulation were not included in the 'stim cell' category. Additionally, we have included a graphic in the revised manuscript (Fig. S3C) that presents the distribution of response latencies of BLA neurons to dPAG stimulation.
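      For readers wishing to reproduce the classification rule, a minimal Python sketch follows. It is an illustration only, not the authors' analysis code. It assumes trial-averaged spike counts per 0.1-s bin, a 5-s (50-bin) pre-event baseline as stated in the Materials and Methods, and a population standard deviation for the z-score (the response does not say which SD was used). The function names, and the definition of latency as the onset of the first bin with z > 3, are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence

BIN_S = 0.1          # bin width in seconds
BASELINE_BINS = 50   # 5-s pre-event baseline at 0.1-s bins
RESPONSE_BINS = 5    # first 500 ms after dPAG stimulation
Z_THRESHOLD = 3.0    # 'stim cell' criterion: z > 3 in any response bin


def response_zscores(baseline: Sequence[float], post: Sequence[float]) -> List[float]:
    """Z-score the first five post-stimulation bins against the baseline bins."""
    if len(baseline) != BASELINE_BINS:
        raise ValueError(f"expected {BASELINE_BINS} baseline bins, got {len(baseline)}")
    mu = mean(baseline)
    sd = pstdev(baseline)  # assumption: population SD of the baseline bins
    if sd == 0:
        raise ValueError("baseline has zero variance; z-scores are undefined")
    return [(x - mu) / sd for x in post[:RESPONSE_BINS]]


def is_stim_cell(baseline: Sequence[float], post: Sequence[float],
                 pellet_responsive: bool = False) -> bool:
    """Hypothetical classifier: z > 3 in any response bin, excluding units
    that also respond to pellet procurement."""
    if pellet_responsive:
        return False
    return any(z > Z_THRESHOLD for z in response_zscores(baseline, post))


def response_latency(baseline: Sequence[float], post: Sequence[float]) -> Optional[float]:
    """Hypothetical latency: onset (s) of the first bin with z > 3, else None."""
    for i, z in enumerate(response_zscores(baseline, post)):
        if z > Z_THRESHOLD:
            return i * BIN_S
    return None


if __name__ == "__main__":
    baseline = [2, 3] * 25               # mean 2.5, SD 0.5
    post = [2.5, 2.5, 5.0, 2.5, 2.5]     # third bin reaches z = 5
    print(is_stim_cell(baseline, post))      # True
    print(response_latency(baseline, post))  # 0.2
```

      Restricting the test to the first five bins mirrors the 500-ms window described above, so a late increase in firing (for example, in the sixth bin) would not qualify a unit as a 'stim cell'.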

      (3) To strengthen the claim that it is a BLA-PVT-PAG circuit that carries information about predatory threat, a new experiment using CTB and cFos could be used to demonstrate that PAG neurons that project to PVT are recruited during the robot exposure.

      Our study primarily aimed to explore the transmission of threat signals between the dPAG and BLA. We acknowledge that our evidence for the PVT’s intermediary role, derived from CTB injections in the BLA and subsequent CTB+cFos co-labeling analysis in the PVT (Fig. 4G and 4H), is limited. Accordingly, we have moderated the emphasis on the PVT’s involvement in both the abstract and introduction. We now present the PVT’s role as a promising direction for future research in the discussion section of our revised manuscript.

      (4) In Fig 2, the authors' interpretation is that photostimulation of PAG neurons elicits fleeing responses in the rats. However, there is a vast literature demonstrating that the PAG is also involved in nociception. Although this is recognized by the authors in the first part of the introduction and briefly described in the discussion, the authors should more explicitly explain that PAG stimulation produces analgesia and thus is unlikely to underlie the escaping responses observed. This may not be intuitive for a broader audience.

      We appreciate the reviewer's insightful suggestion to elaborate on the PAG involvement in nociception and analgesia, as supported by the literature. While our initial manuscript acknowledged these functions, we have now expanded our discussion to address the PAG’s multifaceted roles (pg. 12): “As mentioned in the introduction, the dPAG is recognized as part of the ascending nociceptive pathway to the BLA (De Oca et al. 1998, Gross and Canteras 2012, Herry and Johansen 2014, Kim, Rison, and Fanselow 1993, Ressler and Maren 2019, Walker and Davis 1997). The dPAG is also implicated in non-opioid analgesia (e.g., Bagley and Ingram 2020, Cannon et al. 1982, Fields 2000). However, it is essential to emphasize that, despite its roles in pain modulation, the primary behavior observed in dPAG-stimulated, naive rats foraging for food in an open arena was goal-directed escape to the safe nest, underscoring the dPAG’s critical function in survival behaviors.” Note that this aligns with human studies on PAG stimulation (e.g., Carrive and Morgan 2012, Magierek et al. 2003), particularly those by Amano et al. (Amano et al. 1982), which reported patients feeling an urge to escape, similar to being chased, upon PAG stimulation.

      (5) To truly demonstrate the functional links between the PAG and BLA, more experiments are needed. For example, one could record from BLA neurons during the robot surge while performing optogenetic inhibition of the PAG neurons. There is also no evidence that activity in the indirect pathway that connects the PAG to the BLA is indispensable for the expression of defensive responses towards the robot (e.g., causality tests using chemogenetic or optogenetic inactivation).

      We agree that incorporating optogenetic inhibition of PAG neurons while simultaneously recording from BLA neurons during a robot surge would strengthen the evidence for the functional connectivity between the PAG and BLA. Such an experiment would necessitate the transfection and photoinhibition of a wide array of dPAG neurons responsive to predatory threats. This procedure is technically more viable in transgenic mouse models, given their suitability for genetic manipulation. In light of this, and in response to the suggestions in the Joint Public Review, we have revised the abstract, introduction, and discussion to offer a more cautious interpretation of our findings. This revision reflects a careful consideration of both the evidence and the limitations inherent in our study (pg. 13): “While our findings demonstrate that opto-stimulation of the dPAG is sufficient to trigger both fleeing behavior and increased BLA activity, we have not established that the dPAG is necessary for the BLA’s response to predatory threats. To establish causality, it is essential to conduct experiments such as optogenetic inhibition to determine whether the dPAG is indispensable for activating BLA neurons and initiating escape behavior in the face of threats. The complexity of targeting the dPAG, which includes its dorsomedial, dorsolateral, lateral, and ventrolateral subdivisions (e.g., Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993), suggests the need for future studies using transgenic mouse models. Should inactivation of the dPAG negate the BLA's response to predatory threats, it would underscore the dPAG's central role in this defensive mechanism. Conversely, if BLA responses remain unaffected by dPAG inactivation, this could indicate the existence of multiple pathways for antipredatory defense mechanisms.”

      (6) The manuscript lacks information about the number of rats and trials that were used across the experiments (e.g. Fig 2G-J). On some occasions, the authors start the experiments with a specific number of animals and then reduce the N by half without providing a rationale (e.g. Fig. 3). Equally confusing is the experimental timeline. For example: a) Were the pre-robot, robot, and post-robot sessions always performed within the same day? b) It was described that microdrivable arrays were used, but did the same rats experience the robot test more than once? c) How many bins were used for normalization during the Z-score calculation, and when were the data binned at 100 ms versus 1 s? d) How many trials were used for each analysis? For example, to identify robot cells, did the authors establish a minimum number of trials per animal to calculate the peristimulus time histograms? Having a sufficient number of trials is critical to make sure that the observed neuronal responses are replicable across trials. e) How was the neuronal activity related to "pellet retrieval" aligned during robot sessions? Was the activity aligned with the moment at which the rat touches the pellet or when the animal returns to the nest with the pellet? f) How did the authors control for trials in which the rat consumed the pellets at the same location vs. those in which it returned to the nest to eat them? All these points are extremely important for future replicability.

      We apologize for any confusion caused by the initial lack of detail in our experimental procedures. The revised manuscript has been updated with comprehensive methodological details:  

      (i) The study involved thirteen rats (ChR2, n = 9; EYFP, n = 4), subjected to dPAG stimulation using fixed light parameters (473 nm, 20 Hz, 10-ms pulse width, 2 s duration) during Long and Short pellet distance trials (refer to Fig. 2E-G). The stimulation intensity was adjusted to each animal's response (fleeing behavior), ranging from 1-3 mW. Additional testing occurred over multiple days, with incremental adjustments to stimulation parameters (intensity, frequency, duration) after confirming normal baseline foraging behavior (Fig. 2H-J, at x = 0). These details are now clearly depicted in the manuscript.

      (ii) The primary objective was to investigate BLA neuron responses to dPAG opto-stimulation. Six rats were initially tested, with three later assessed for their reactions to dPAG stimulation in the presence of an actual predator, to gauge behavioral effects.

      (iii) Regarding the experimental timeline:

      a) Pre-robot, robot, and post-robot sessions were conducted successively on the same day.

      b) Sessions with the robot predator were repeated until habituation occurred or until unit recordings were deemed invalid due to microdrive limitations or the absence of detectable units. Throughout these sessions, the success rate for pellet retrieval remained consistently low. Specifically, the mean success rate for the dPAG recordings was 2.803 ± 1.311%. For the BLA recordings, animals did not succeed in retrieving pellets during any of the robot trials. To provide a more detailed account of the methodology, the manuscript has been updated to include the number of recording days and the units recorded in the "Behavioral Procedures" section.

      c) As described in Materials and Methods, unit recording data were binned at 0.1-s intervals and normalized against a 5-s pre-event baseline (50 bins). For statistical analyses in Figure 1F’s rightmost column, 1-s bins were used to simplify post-hoc analysis corrections.

      d) Each recording session consisted of 5-15 trials. Trials were excluded if rats attempted to procure the pellet within 10 s post-dPAG stimulation or robot activation, ensuring accurate characterization of unit responsiveness. Consequently, the number of trials varied among subjects.

      e) Pellet retrieval (motivated by hunger) was indicated by the animal entering a designated zone 19 cm from the pellet.

      f) Animals were trained to retrieve pellets and return to their nest for consumption prior to robot testing sessions, as elaborated in the “Baseline foraging” section.

      (7) In the abstract, the authors mention that predictive cues are ambiguous during naturalistic predatory threats, but it is not clear what they mean by ambiguous. In addition, in the introduction section, the authors describe that the present study will investigate how the dPAG and BLA communicate threat signals. However, the authors should clarify at the outset that these two regions are not monosynaptically connected with each other and cite the proper references.

      The abstract’s original sentence, “…where predictive cues are ambiguous and do not afford reiterative trial-and-error learning…” has been refined to “…characterized by less explicit cues and the absence of reiterative trial-and-error learning events …” This adjustment more accurately reflects that cues in natural settings often lack the clear and consistent quality of those in controlled experimental settings, which is necessary for the straightforward process of trial-and-error learning.

      Regarding the dPAG and BLA connectivity, the revised introduction (pg. 5) now states: “Considering the lack of direct monosynaptic projections between dPAG and BLA neurons (Vianna and Brandao 2003, McNally, Johansen, and Blair 2011, Cameron et al. 1995), we utilized anterograde and retrograde tracers in the dPAG and BLA, respectively. This was complemented by c-Fos expression analysis following exposure to predatory threats. Our anatomical findings suggest that the paraventricular nucleus of the thalamus (PVT) may be part of a network that conveys predatory threat information from the dPAG to the BLA.”

      (8) In the introduction section, the authors should clarify that the US information is conveyed from the PAG to BLA via the lateral thalamus (posterior intralaminar nucleus, medial geniculate nucleus) or dorsal midline thalamus (paraventricular nucleus of the thalamus). The statement regarding how "the PAG functions as part of the ascending pain transmission pathway, providing footshock US information to the BLA" is misleading because the PAG does not send monosynaptic projections directly to the BLA.

      The revised text (pg. 3) now reads: “…suggest that the dPAG is part of the ascending US pain transmission pathway to the BLA, the presumed site for CS-US association formation (De Oca et al. 1998, Gross and Canteras 2012, Herry and Johansen 2014, Kim, Rison, and Fanselow 1993, Ressler and Maren 2019, Walker and Davis 1997). This pathway is thought to be mediated through the lateral and dorsal-midline thalamus regions, including the posterior intralaminar nucleus and paraventricular nucleus of the thalamus (Krout and Loewy, 2000; McNally, Johansen, and Blair, 2011; Yeh, Ozawa, and Johansen, 2021; but see Brunzell and Kim, 2001).”

      (9) The authors' assumption that threat information flows from the PAG to the BLA, rather than from the BLA to the PAG, based on electrical stimulation and lesion experiments performed in previous studies, is problematic for at least three reasons: a) Electrical stimulation can activate fibers of passage as well as presynaptic neurons antidromically. b) The lesion approach may not have targeted 100% of the neurons in the PAG, which extends anatomically along the antero-posterior axis of the midbrain for several millimeters in rats. This observation also disagrees with more recent studies using optogenetics and imaging tools demonstrating that the PAG is the downstream target of the BLA-CeA pathway. c) The authors cited prior reports describing the role of the amygdala-PAG pathway in dampening the US response and providing a negative signal to the PAG. However, a series of previous studies demonstrating that the PAG serves as the downstream target of the central nucleus of the amygdala for the expression of defensive responses is completely ignored by the authors (e.g., Massi et al. 2023, PMID: 36652513; Tovote et al. 2016, PMID: 27279213; Penzo et al. 2014, PMID: 24523533).

      We recognize the complexities in interpreting findings from electrical stimulation and lesion studies. Our prior work (Kim et al. 2013) supports the conclusion that predatory threat information flows directionally from the dPAG to the BLA, as evidenced by distinct behavioral outcomes from experimental manipulations of the dPAG and BLA. Specifically, dPAG stimulation-induced fleeing behavior was blocked by BLA lesions (as well as by muscimol inactivation), whereas BLA stimulation-induced fleeing was unaffected by dPAG or combined dPAG+vPAG lesions (refer to Fig. 5A), suggesting a flow from the dPAG to the BLA. Our manuscript further clarifies that the dPAG optostimulation results confirm that escape behavior in foraging rats, induced by dPAG electrical stimulation (Kim et al. 2013), was driven by intrinsic dPAG neurons rather than by fibers of passage or current spread to other brain regions.

      Furthermore, the PAG’s anatomical and functional diversity, with distinct segments along its longitudinal axis associated with different defensive behaviors, reinforces our conclusions. The dPAG is implicated in flight responses, while the vPAG is associated with freezing behavior (e.g., Bandler and Shipley 1994, Kim, Rison, and Fanselow 1993, Lefler, Campagner, and Branco 2020, Morgan, Whitney, and Gold 1998). The studies cited in the critique primarily focus on the role of the BLA-CeA-vPAG circuit in freezing during Pavlovian fear conditioning, whereas we focus on the dPAG-PVT-BLA circuit and its mediation of escape behavior in response to naturalistic predatory threats.

      We also note that different invasive procedures can yield varying behavioral outcomes. For example, both acute (e.g., optogenetic and muscimol inactivation) and chronic (e.g., surgical ablation) manipulations within the same brain circuit have shown diverse effects across species (Otchy et al. 2015). Moreover, optogenetics comes with its own set of conceptual and technical challenges (Adamantidis et al. 2015), including the difficulty of targeting, quantifying and photo-inhibiting 100% of PAG neurons. Despite the limitations of each technique, our collective evidence from lesions, inactivation, electrical stimulation (Kim et al. 2013), optostimulation, and single-unit recordings (the present study) supports the premise that the dPAG acts upstream of the BLA in processing predatory threat information.

      (10) In the discussion, the authors suggest that the PVT may be the interface between the PAG and the BLA for the expression of antipredatory defensive behavior during their foraging vs. robot test, but previous studies looking at the role of PVT in antipredator defensive behavior and/or approach-avoidance conflict tasks are not cited and discussed in the manuscript (Engelke et al, 2021, PMID: 33947849; Choi et al 2019, PMID: 30979815; Choi and McNally 2017, PMID: 28193686).

      We thank the reviewer for pointing out these pivotal studies, which we have carefully reviewed and integrated into the revised manuscript (pg. 14): “These results, in conjunction with previous research on the roles of the dPAG, PVT, and BLA in producing flight behaviors in naïve rats (Choi and Kim 2010, Daviu et al. 2020, Deng, Xiao, and Wang 2016, Kim et al. 2013, Kim et al. 2018, Kong et al. 2021, Ma et al. 2021, Reis et al. 2021), the anterior PVT’s involvement in cat odor-induced avoidance behavior (Engelke et al. 2021), and the PVT’s regulation of behaviors motivated by both appetitive and aversive stimuli (Choi and McNally 2017, Choi et al. 2019), suggest the involvement of the dPAG→PVT→BLA pathways in antipredatory defensive mechanisms, particularly as rats leave the safety of the nest to forage in an open arena (Figure 4I) (Reis et al. 2023).”

      (11) The authors use the expression "looming robot predator" in many cases throughout the manuscript. However, it is unclear whether the defensive responses observed in the rats are elicited by the looming stimulus produced by the movement of the robot towards the rats. The authors describe that rats do not respond to a stationary robot, but would the sound produced by the movement of the robot elicit defensive responses? Would non-approaching lateral or dorsoventral movements (not associated with looming) be sufficient to induce defensive behavior in the rats? There is a vast literature in the field about defensive behaviors induced by looming stimuli. The authors should empirically demonstrate that the escape responses induced by the robot are mediated by looming, or refrain from using the looming terminology to avoid confusion.

      Our use of "looming robot predator" is based on empirical evidence from a prior parametric study, which identified the forward, or 'looming,' motion of the Robogator as the key stimulus eliciting a flight response in rats (Kim, Choi, and Lee 2016). This reaction significantly decreased when the robot moved backward from the same starting position, producing a similar sound, and was absent when the robot remained stationary. This suggests that neither sound alone nor the mere presence of a novel object provokes goal-directed escape behavior (Kong et al. 2021). This aligns with studies indicating that simulated looming stimuli, like an expanding disk, induce flight or freezing responses in mice (De Franceschi et al. 2016, Yilmaz and Meister 2013).

      It should be noted that the study by Yilmaz and Meister (Yilmaz and Meister 2013) on the looming disk paradigm showed that not all mice responded to the stimuli (e.g., Figs. 2A and 3A), and those that did exhibited rapid habituation by the second exposure. This contrasts with our predatory robot paradigm (Choi and Kim 2010), in which all rats consistently fled from the looming robotic predator across multiple trials, underscoring the critical role of looming motion in simulating predator attacks that trigger flight behavior in rats.

      Thus, the term "looming" accurately captures the nature of the robot's movement and its effect on eliciting defensive responses in rats. Nonetheless, should the editors agree with the reviewer's suggestion to minimize potential confusion, we are willing to substitute "looming" with "approaching," although we consider the terms to be synonymous in the context of our study.

      (12) If the authors are citing the Rescorla-Wagner model, they should include at least one additional sentence to explain it, as many people in the field are not familiar with this model.

      In response to the request for clarification on the Rescorla-Wagner model, we have added an explanatory sentence (pg. 4): “Fundamentally, the negative feedback circuit between the amygdala and the dPAG serves as a biological implementation of the Rescorla–Wagner (1972) model, a foundational theory of associative learning that emphasizes the importance of prediction errors in reinforcement (i.e., US), as applied to FC (Fanselow 1998).”
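      For readers unfamiliar with the model, its standard textbook form is the trial-by-trial update below. The symbols follow common convention and are not taken from the manuscript: V_X is the associative strength of conditioned stimulus X, α_X is its salience, β is a learning-rate parameter tied to the US, λ is the maximum associative strength the US can support, and the sum runs over all CSs present on the trial.

```latex
\Delta V_X = \alpha_X \, \beta \left( \lambda - \sum_{i} V_i \right)
```

      The bracketed term is the prediction error: learning is large when the US is unexpected and shrinks toward zero as the CSs come to predict it. In the negative-feedback account described above, amygdala output that dampens US-evoked dPAG activity would correspond to the growing sum of V_i reducing this error as conditioning proceeds.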

      (13) The authors need to include the normality test used to determine whether a parametric or non-parametric statistical analysis was the most appropriate test for each experiment.

      We have included the outcomes of the normality tests, detailed in Table S1.

      (14) In Fig. 1F, the authors show a representative PAG neuron with peristimulus-time histogram and rasters reaching frequencies higher than 100 Hz and sustained firing rates of >50 Hz following robot activation. The authors should include a firing rate analysis (e.g., average firing rate and maximum firing rate before and after robot activation) of the 22 robot-responsive PAG neurons recorded during the session to clarify whether this high firing rate, which is atypical in other brain regions, is commonly observed in the PAG. Showing the isolated waveforms of some representative neurons would help to clarify whether the activity is being recorded from a single-isolated unit instead of multiple units within the same channel.

      In response to the critique, we have expanded our analysis to include both average and maximum firing rates before and after robot activation for the 22 robot-responsive PAG neurons. This detailed firing rate analysis, illustrating their distribution, has been incorporated into the revised manuscript (refer to Figure S1C and S1D). Furthermore, to alleviate concerns regarding the identification of single-unit activity versus potential multi-unit recordings, we have included peri-event raster plots and waveforms for two additional representative neurons in Figure 1F.
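      As an illustration of the kind of summary described above, the following minimal Python sketch computes a unit's mean firing rate and its peak binned rate within a time window. The 100-ms bin width, the window boundaries, the example spike times, and the function name rate_summary are hypothetical choices for illustration, not the parameters used for Figure S1C and S1D.

```python
def rate_summary(spike_times, start_s, end_s, bin_s=0.1):
    """Return (mean_rate_hz, max_binned_rate_hz) for spikes in [start_s, end_s).

    Hypothetical helper: spike_times are in seconds; bin_s is an assumed bin width.
    """
    duration = end_s - start_s
    if duration <= 0 or bin_s <= 0:
        raise ValueError("window and bin width must be positive")
    in_window = [t for t in spike_times if start_s <= t < end_s]
    mean_rate = len(in_window) / duration

    # Bin the spikes; clamping the index guards against floating-point overshoot.
    n_bins = max(1, round(duration / bin_s))
    counts = [0] * n_bins
    for t in in_window:
        idx = min(int((t - start_s) / bin_s), n_bins - 1)
        counts[idx] += 1
    max_rate = max(counts) / bin_s
    return mean_rate, max_rate


# Usage: compare a hypothetical 1-s pre-robot window with a 1-s post-robot window.
spikes = [0.21, 0.55, 0.80, 1.02, 1.03, 1.05, 1.06, 1.40, 1.75]
pre = rate_summary(spikes, 0.0, 1.0)
post = rate_summary(spikes, 1.0, 2.0)
```

      The peak binned rate depends strongly on the bin width, so any such comparison should report the bin size alongside the rates.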

      (15) In Figure 2, the authors should indicate when the recordings are performed on anesthetized vs. freely-moving awake animals.

      In the original manuscript, we specified that the optrode recordings depicted in Figure 2B were conducted on anesthetized rats. To enhance clarity and directly address the critique, we have now clearly indicated this condition in Figure 2A as well.

      (16) The optogenetic stimulation parameters used in Fig 2H indicate that 0.5 mW was sufficient to induce behavioral changes. This is surprising because most optogenetic experiments in the field use much higher intensities (> 5 mW). If much lower intensities are sufficient to drive PAG-mediated behaviors, this may be a very important observation that should be conveyed to the field. I recommend that the authors clarify whether they in fact used 0.5 mW and then discuss that the laser intensity used in the experiments was 10X lower than that required for other brain regions.

      In our study, we indeed observed that 0.5 mW of dPAG stimulation increased the latency to procure the pellet without completely preventing the action. Notably, at 1 mW more than half of the animals (n = 5/9 rats; Fig. 2H), and at 3 mW all rats (9/9), failed to procure the pellet and fled from the foraging area to the nest (Fig. 2G). These results indicate that relatively low intensities were sufficient to elicit behavioral changes through dPAG stimulation in a large foraging arena, highlighting the dPAG's sensitivity to optogenetic manipulation. This finding is consistent with our earlier research on electrical stimulation, in which substantially lower intensities were needed to provoke defensive behaviors from the dPAG (65.0 ± 6.85 µA) than from the BLA (275.0 ± 24.44 µA) (Kim et al. 2013). Furthermore, Deng et al. (Deng, Xiao, and Wang 2016) showed that 1 mW of blue light could elicit a 60% freezing response, with 2 mW triggering flight behavior within a latency of 0.6 seconds.

      (17) In Fig 2 G-J, how many animals are being used per group and how was the sequence of the experiments performed? This is very important for replicability.

      A total of three rats were utilized for the robot testing experiments depicted in Fig. 2 G-J. The experimental sequence for these animals consisted of successive pre-stimulation, stimulation, post-stimulation, and robot sessions. We have updated the manuscript to include this information.

      (18) For the photostimulation of PAG neurons in Figs. 2 and 3, the authors need to clarify if the same parameters of laser stimulation used during the anesthetized recordings were also used during the behavioral tests. Also, the wavelength corresponding to the blue laser should be 473 nm instead of 437 nm.

      We thank the reviewer for identifying the error. We confirm that the opto-stimulation parameters (473 nm, 10-ms pulse width, 2 s duration) were consistently applied across both anesthetized recordings and behavioral tests. This consistency has been explicitly stated in the revised manuscript to ensure clarity regarding our experimental approach.

      (19) In Fig. 3I, how were the representative trials selected? Instead of picking the most representative trials, the authors should demonstrate the response of the cell during the entire session.

      In response to the critique, we clarify that the color-coded PETH shown in Fig. 3I represents averaged BLA activity across a comprehensive set of trials. This includes 8 pre-stimulation, 10 stimulation, and 8 post-stimulation trials for the robot-activated sessions, with a similar distribution for non-stimulated sessions. This approach was chosen to provide a representative overview of the cell's response throughout the entire session. To address the request for more detailed data, we have added traditional PETHs to the revised manuscript (see Fig. S3H), which depict the cell's response across all trials.

      (20) Fig. 4D should demonstrate colabeling between the anterograde PAG fibers in the PVT and the retrogradely labeled neurons from the BLA, instead of PAG fibers only.

      We wish to clarify that Fig. 4D is intended to show the distribution of dPAG terminals within the midline thalamic nuclei, as noted in prior research (Krout and Loewy 2000). Although dPAG terminals are distributed throughout the midline thalamus, our observations have specifically highlighted a notable increase in c-Fos expression within the paraventricular nucleus of the thalamus (PVT) in rats subjected to the robotic predator stimulus, in contrast to those in the foraging-only control condition (Fig. 4E). Addressing the reviewer's point, we direct attention to Fig. 4G, which includes images labeled "Robot-experienced" and "Merge." This figure demonstrates a subset of PVT neurons that were retrogradely labeled with CTB injected into the BLA, anterogradely labeled with AAV injected into the dPAG, and activated (as indicated by c-Fos expression) in response to the robotic predator. This provides specific colabeling evidence between anterograde PAG fibers in the PVT and retrogradely labeled neurons from the BLA, directly addressing the critique.

      (21) The resolution of the cFos images is very low, which makes them hard to appreciate.

      We have updated Figs. 4F and 4G with high-resolution versions to ensure the details are more clearly visible. Furthermore, should there be a need for even greater clarity, we are prepared to supply the images as TIFF files, which are known for preserving high image quality.

      Reviewer 2:

      (1) The text is clearly written, and I appreciated the inclusion of interesting citations, such as the one about paintings by cavemen. The authors also do a good job of discussing the underlying theoretical framework and the figures are easy to understand. Although the topic is very interesting, the amount of novel work is somewhat low. Figure 1 shows that dPAG cells are activated by the predator, and this has been shown by many prior reports. Similarly, Figure 2 shows that dPAG activation creates defensive responses, and this too has been shown by many prior reports.

      We appreciate the reviewer’s positive remarks. We acknowledge the rich body of research documenting dPAG neuronal activation by various predator cues such as odors (e.g., fox urine) (Lu et al. 2023), and scenarios involving anesthetized or spontaneously moving rat/cat predators, either physically partitioned or harness-restrained (Bindi et al. 2022, Deng, Xiao, and Wang 2016, Esteban Masferrer et al. 2020). Nevertheless, our study distinguishes itself by examining dPAG neuronal responses to a robotic predator, uniquely designed to replicate consistent looming motions across multiple trials and subjects within an environment that simulates natural foraging conditions, inclusive of a safe nest (cf. Choi and Kim, 2010). This approach allowed us to not only reveal the immediate activation of dPAG neurons in response to a rapidly approaching predator but also to explore the consequent fleeing behavior towards safety, thereby providing new insights into the dPAG's role in mediating goal-directed defensive responses in a more ecologically-relevant setting. Furthermore, our investigation extends beyond these findings to assess the impact of dPAG activation on BLA neuronal responses and their functional connectivity during predator-prey interactions, offering a fresh perspective on the neural circuits that support survival behaviors in animals when confronted with naturalistic threats.

      (2) The results in Figure 3 are novel and interesting, but the characterization of BLA activity is incomplete. For example, what are the percentages of BLA cells that are inhibited or activated by all major behaviors observed? These behaviors include approach to pellet, escape from robot, freezing, stretch-attend postures, etc. These same analyses should also be added to dPAG activity in Figure 1. How does BLA single cell encoding of these behaviors relate to their responsivity to dPAG stimulation? And, finally, it is unclear what is the significance of BLA correlated synchronous firing. Is the animal more or less likely to be performing certain behaviors when correlated BLA firing occurs?

      Our analysis, as presented in Figs. 3I, 3K, and S3D-F, selectively focused on BLA cell responses during distinct behaviors such as approaching a pellet and escaping from the robot. These behaviors were selected because their precise temporal markers allow for accurate correlation with BLA cell activity, building on the findings of our previous research (Kim et al. 2018, Kong et al. 2021).

      The robot's motion, programmed to advance a fixed distance before retreating to its starting position, is designed to repeatedly elicit foraging, thus facilitating analysis of neural changes during conflict situations involving food approach and predator avoidance. However, this also leads to the rapid diminution of freezing and stretch-attend postures inside the nest as animals quickly adapt to the robot's movement pattern, rendering a time-stamped analysis of these behaviors unfeasible under our experimental conditions. While the inclusion of these behaviors in our analysis would be insightful, especially in extended interaction scenarios where the robot advances to the nest opening and remains before returning in a less predictable manner, such conditions would likely reduce foraging behavior due to increased fear, deviating from our study's primary objective of elucidating the interactions between the dorsal periaqueductal gray (dPAG) and the basolateral amygdala (BLA) functions.

      Regarding the significance of BLA correlated synchronous firing, our findings, particularly in Figures 3M-O and S4, demonstrate significant synchronous activity among BLA neuronal pairs during encounters with the robot, as opposed to pre-stim, stim, and post-stim sessions. This synchrony is notably prominent among neurons responsive to dPAG stimulation, indicating that BLA neurons involved in processing dPAG signals may play a crucial role in enhancing BLA network coherence to effectively manage predatory threat information (pg. 13).
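      One common way to quantify pairwise synchrony is to count near-coincident spikes between two units. The minimal Python sketch below illustrates that idea under stated assumptions: a ±5 ms coincidence window, spike times in seconds, and the hypothetical helper names coincident_spike_count and coincidence_fraction. It is not the analysis used for Figures 3M-O and S4.

```python
from bisect import bisect_left, bisect_right


def coincident_spike_count(train_a, train_b_sorted, window_s=0.005):
    """Count spikes in train_a with at least one train_b spike within +/- window_s.

    Hypothetical helper: spike times are in seconds; train_b_sorted must be
    sorted ascending so that binary search can be used.
    """
    count = 0
    for t in train_a:
        lo = bisect_left(train_b_sorted, t - window_s)
        hi = bisect_right(train_b_sorted, t + window_s)
        if hi > lo:  # at least one train_b spike falls inside the window
            count += 1
    return count


def coincidence_fraction(train_a, train_b, window_s=0.005):
    """Fraction of train_a spikes that are near-coincident with train_b."""
    if not train_a:
        return 0.0
    return coincident_spike_count(train_a, sorted(train_b), window_s) / len(train_a)


# Usage: two short hypothetical spike trains (seconds).
unit_a = [0.100, 0.200, 0.300]
unit_b = [0.102, 0.250, 0.299]
frac = coincidence_fraction(unit_a, unit_b)  # 2 of 3 spikes in unit_a have a partner
```

      Raw coincidence counts grow with firing rate, so comparing sessions (e.g., robot versus stimulation sessions) would normally require a chance-level correction, such as comparison against jittered spike trains, which this sketch omits.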

      (3) In Figure 4, the authors identify the PVT as a potential region that can mediate dPAG to BLA communication via anatomical tracing. However, functional assays are missing. For example, if the PVT is inhibited chemogenetically, does this result in a smaller number of BLA cells that are activated by dPAG stimulation? Does activation of the dPAG-PVT or the PVT-BLA projections cause defensive behaviors? Functionally showing that the dPAG-PVT-BLA circuit controls defensive actions would be a major advance in the field and would greatly enhance the significance of this paper. It would also provide an anatomical substrate to support the view that the BLA is downstream of the dPAG, which was first demonstrated by the authors in their elegant 2013 PNAS paper.

      We appreciate the reviewer’s constructive critique and valuable suggestions on the necessity for functional validation of the dPAG-PVT-BLA circuit's involvement in mediating defensive behaviors. In light of these comments, we have carefully considered and included a discussion on the importance of these proposed experiments as a direction for future research in our manuscript revision (also see response to Reviewer 1’s critique #5).

      Our initial work in 2013 (Kim et al. 2013) laid the groundwork for identifying BLA neurons responsive to dPAG stimulation and suggested the PVT as a potential relay in this neural circuit. Recognizing the limitations of our current study, which does not include direct functional assays, we have adjusted our manuscript to convey the speculative aspect of the dPAG-PVT-BLA circuit’s role more accurately. Moreover, we have enriched our discussion by citing relevant studies that lend support to our proposed circuit mechanism. These references serve to place our findings within the broader context of existing research and highlight the imperative for subsequent studies to empirically confirm the functional significance of the dPAG-PVT-BLA pathway in driving defensive behaviors.

      Reviewer 3:

      (1) The Introduction refers to a negative feedback amygdala-dPAG from a study of the Johansen group, but in this case, the authors were referring to the ventrolateral and not the dorsal PAG.

      We thank the reviewer for pointing out the need to distinguish between the dPAG and vPAG regions in our introduction. While Johansen et al. (2010) investigated the roles of PAG (including both dPAG and vPAG regions; see their Supplementary Figs. 4, 5, and 10), the differentiation between their specific contributions to the amygdala's negative feedback mechanism was not explicitly detailed in their initial publication. This distinction was further elaborated upon in later work by the same group (Yeh, Ozawa, and Johansen 2021), which specifically illuminated the dPAG's role in conditioned fear memory formation and its neural pathways to the PVT that influence fear learning. To reflect this nuanced understanding, we have revised our introduction (pg. 3): “In parallel, Johansen et al. (2010) found that pharmacological inhibition of the PAG, encompassing both dPAG and vPAG regions, diminishes the behavioral and neural responses in the amygdala elicited by periorbital shock US, thereby impairing the acquisition of auditory FC.”

      (2) In the experiments recording dPAG in response to the predator threat, the authors mentioned cells activated by the predator threat, referred to as "robot cells." Were these cells inhibited in response to threat?

      In the Results and Materials and Methods sections, we report that 23.4% (22 out of 94) of dPAG neurons, termed “robot cells,” showed a significant increase in firing rates (z > 3) within a latency of less than 500 ms during exposure to the looming robot threat, but not during the pre- and post-robot sessions. These cells are highlighted in Figures 1E-G. In contrast, we identified only a single unit exhibiting a decrease in activity (z-score < -3) in response to the robot threat. Given the overwhelming prevalence of cells with excitatory responses to the threat, our discussions and analyses have primarily centered on these excited cells. Nevertheless, to ensure a full depiction of our observations, we have included data on the inhibited unit in the revised manuscript, specifically in Figure S1E.
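      The response criterion described above (z > 3 within 500 ms for excitation, z < -3 for inhibition) can be sketched as follows. This is a minimal illustration that assumes per-bin spike counts averaged across trials, a 50-ms bin width, and a baseline drawn from pre-onset bins; these parameters and the function name classify_unit are hypothetical, and the additional requirement that robot cells not respond during the pre- and post-robot sessions is not modeled.

```python
from statistics import mean, stdev


def classify_unit(baseline_counts, post_onset_counts, bin_ms=50,
                  max_latency_ms=500, z_thresh=3.0):
    """Label a unit 'excited', 'inhibited', or 'unresponsive'.

    Hypothetical helper: counts are per-bin spike counts (e.g., averaged over
    trials); only post-onset bins starting before max_latency_ms are tested.
    """
    mu = mean(baseline_counts)
    sd = stdev(baseline_counts)  # sample SD; needs at least 2 baseline bins
    if sd == 0:
        raise ValueError("baseline has zero variance; z-scores are undefined")

    n_bins = max_latency_ms // bin_ms
    z_scores = [(c - mu) / sd for c in post_onset_counts[:n_bins]]

    # If bins cross both thresholds, this sketch arbitrarily reports 'excited'.
    if any(z > z_thresh for z in z_scores):
        return "excited"
    if any(z < -z_thresh for z in z_scores):
        return "inhibited"
    return "unresponsive"


baseline = [2, 3, 2, 4, 3, 2, 3, 3, 2, 4]
print(classify_unit(baseline, [3, 9, 12, 8, 4, 3, 3, 2, 3, 3]))  # excited
```

      Note that for units with low baseline rates, a z-score below -3 may be unattainable even when firing drops to zero, which is one reason inhibitory responses can be under-detected by a symmetric threshold.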

      (3) The authors claim that tetrodes were implanted in the dorsal PAG; however, the electrodes' tips shown in the figures are positioned more ventrally in the lateral PAG (see Figures 1B, S5A).

      The PAG is anatomically organized into dorsomedial (dmPAG), dorsolateral (dlPAG), lateral (lPAG), and ventrolateral (vlPAG) columns along the rostro-caudal axis of the aqueduct. The designation "dorsal PAG" (dPAG) traditionally encompasses the dmPAG, dlPAG, and lPAG regions, a classification supported by extensive tract-tracing, neurochemical, and immunohistochemical evidence (e.g., Bandler, Carrive, and Zhang 1991, Bandler and Keay 1996, Carrive 1993). As Bandler and Shipley (Bandler and Shipley 1994) summarized, “These findings suggest that what has been traditionally called the 'dorsal PAG' (a collective term for regions dorsal and lateral to the aqueduct), consists of three anatomically distinct longitudinal columns: dorsomedial and lateral columns…and a dorsolateral column…” Similarly, Schenberg et al. (Schenberg et al. 2005) clarified in their review that, “According to this parcellation...the defensive behaviors (freezing, flight or fight) and aversion-related responses (switch-off behavior) were ascribed to the DMPAG, DLPAG, and LPAG (usually named the ‘dorsal’ PAG).” In our study, electrode placements were strictly within these specified dPAG regions. The electrode tip locations depicted in Figures 1B and S5A correspond with the -6.04 mm template (left panel below) from Paxinos & Watson’s atlas (Paxinos and Watson 1998), situated anteriorly to the emergence of the vlPAG (right panel below). To enhance clarity in our manuscript, we provide a detailed definition of the dPAG that includes the dmPAG, dlPAG, and lPAG, and support our electrode placement rationale with references to established literature (pg. 5).

      Author response image 1.

      (4) It would be nice to include a series of observations applying inhibitory tools (i.e., optogenetic photo inhibition) in the dPAG and BLA and see how they affect the behavioral responses in the 'approach food-avoid predator' paradigm. Moreover, it would be interesting to explore how inhibiting the dPAG to PVT pathway influences the flee response during the robot surge.

      We appreciate the suggestion to explore the effects of optogenetic inhibition in the dPAG and BLA on behavioral responses within the 'approach food-avoid predator' paradigm, as well as the potential impact of inhibiting the dPAG to PVT pathway on flee responses during robot surge incidents. As mentioned in our response to Reviewer 1’s critique #5, the application of optogenetic inhibition necessitates transfecting, quantifying, and photoinhibiting a comprehensive set of dPAG neurons activated by predatory threats. This approach is more viable in future studies that can leverage transgenic mouse models for their genetic tractability. Following the Joint Public Review’s recommendations, we have revised our manuscript to ensure a more measured interpretation of our data, carefully balancing the evidence from tracer studies against the limitations of our current methodology.

      Furthermore, referencing Reviewer 1’s critique #9, it is important to consider that various invasive techniques can yield different behavioral outcomes. For instance, research by Olveczky and colleagues (Otchy et al. 2015) demonstrated that acute manipulations (i.e., optogenetic and muscimol inactivation) and chronic surgical ablation of the same brain circuit can produce distinct effects in rats and finches. Despite these methodological constraints, our collective results from lesion, inactivation, electrical stimulation (Kim et al. 2013), optostimulation, and single-unit recording (present) studies cohesively suggest that the dPAG functions upstream of the BLA in processing predatory threat signals.

      (5) The authors should also examine whether 'synaptic' appositions exist between the anterogradely labeled terminals from the dPAG and the double labeled CTB and cFOS neurons in the PVT.

      We appreciate the suggestion to investigate the presence of synaptic appositions, which could potentially offer valuable insights into the synaptic connections and functional interactions within this neural circuit. However, due to the specialized nature of electron microscopy required for these examinations and the extensive resources it entails, this line of inquiry falls beyond the scope of our current study. We hope to address this aspect in future studies, where we can dedicate the necessary resources and expertise to conducting these intricate analyses.

      (6) It is odd to see the projection fields shown in Fig. 4D, where the projection to the PVT looks much sparser compared to other targets in the thalamus and hypothalamus. If the projection to the PVT has such an important function, why does it seem so weak? This should be discussed. Also, because the projection to the PVT seems sparse, the authors should consider alternative paths like the one involving the cuneiform nucleus. The cuneiform nucleus is an important region responding to looming shadows with strong bidirectional links to the dorsolateral periaqueductal gray, providing strong projections to the rostral PVT.

      The perceived scarcity of the dPAG-PVT pathway might not reflect its functional significance accurately. The PVT's small size could make its projections appear less dense in broad anatomical studies. To address this, we have updated Figure 4D with a high-resolution image that offers a detailed view of the PVT region. This enhancement (refer to the updated Fig. 4, bottom) more accurately depicts the projection density within the PVT. It is also critical to consider that the functional impact of neural pathways is not solely dependent on the quantity of projecting neurons. For instance, work by Deisseroth and colleagues (Rajasethupathy et al. 2015) has shown that even relatively sparse monosynaptic projections from the anterior cingulate cortex to the hippocampus can exert significant effects on neural circuit dynamics. Additionally, we have expanded our discussion to consider the potential roles of other circuits, such as the cuneiform nucleus, in driving the behavioral responses observed in our study (pg. 15): “Given the recent significance attributed to the superior colliculus in detecting innate visual threats (Lischinsky and Lin 2019, Wei et al. 2015, Zhou et al. 2019) and the cuneiform nucleus in the directed flight behavior of mice (Bindi et al. 2023, Tsang et al. 2023), further exploration into the communication between these structures and the dPAG-BLA circuitry is warranted.”

      (7) Finally, in the Discussion, it would be nice to comment on how the BLA mediates flee responses. Which pathways are likely involved?

      This excellent suggestion has been incorporated in the discussion (pg. 15): “Future studies will also need to delineate the downstream pathways emanating from the BLA that orchestrate goal-directed flight responses to external predatory threats as well as internal stimulations from the dPAG/BLA circuit. Potential key structures include the dorsal/posterior striatum, which has been associated with avoidance behaviors in response to airpuff in head-fixed mice (Menegas et al. 2018) and flight reactions triggered by auditory looming cues (Li et al. 2021). Additionally, the ventromedial hypothalamus (VMH) has been implicated in flight behaviors in mice, evidenced by responses to the presence of a rat predator (Silva et al. 2013) and upon optogenetic activation of VMH Steroidogenic factor 1 (Kunwar et al. 2015) or the VMH-anterior hypothalamic nucleus pathway (Wang, Chen, and Lin 2015). Investigating the indispensable role of these structures in flight behavior could involve lesion or inactivation studies. Such interventions are anticipated to inhibit flight behaviors elicited by amygdala stimulation and predatory threats, confirming their critical involvement. Conversely, activating these structures in subjects with an inactivated or lesioned amygdala, which would typically inhibit fear responses to external threats (Choi and Kim 2010), is expected to induce fleeing behavior, further elucidating their functional significance.”

      Adamantidis, A., S. Arber, J. S. Bains, E. Bamberg, A. Bonci, G. Buzsaki, J. A. Cardin, R. M. Costa, Y. Dan, Y. Goda, A. M. Graybiel, M. Hausser, P. Hegemann, J. R. Huguenard, T. R. Insel, P. H. Janak, D. Johnston, S. A. Josselyn, C. Koch, A. C. Kreitzer, C. Luscher, R. C. Malenka, G. Miesenbock, G. Nagel, B. Roska, M. J. Schnitzer, K. V. Shenoy, I. Soltesz, S. M. Sternson, R. W. Tsien, R. Y. Tsien, G. G. Turrigiano, K. M. Tye, and R. I. Wilson. 2015. "Optogenetics: 10 years after ChR2 in neurons--views from the community."  Nat Neurosci 18 (9):1202-12. doi: 10.1038/nn.4106.

      Amano, K., T. Tanikawa, H. Kawamura, H. Iseki, M. Notani, H. Kawabatake, T. Shiwaku, T. Suda, H. Demura, and K. Kitamura. 1982. "Endorphins and pain relief. Further observations on electrical stimulation of the lateral part of the periaqueductal gray matter during rostral mesencephalic reticulotomy for pain relief."  Appl Neurophysiol 45 (1-2):123-35.

      Bagley, E. E., and S. L. Ingram. 2020. "Endogenous opioid peptides in the descending pain modulatory circuit."  Neuropharmacology 173:108131. doi: 10.1016/j.neuropharm.2020.108131.

      Bandler, R., P. Carrive, and S. P. Zhang. 1991. "Integration of somatic and autonomic reactions within the midbrain periaqueductal grey: viscerotopic, somatotopic and functional organization."  Prog Brain Res 87:269-305. doi: 10.1016/s0079-6123(08)63056-3.

      Bandler, R., and K. A. Keay. 1996. "Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression."  Prog Brain Res 107:285-300. doi: 10.1016/s0079-6123(08)61871-3.

      Bandler, R., and M. T. Shipley. 1994. "Columnar organization in the midbrain periaqueductal gray: modules for emotional expression?"  Trends Neurosci 17 (9):379-89. doi: 10.1016/0166-2236(94)90047-7.

      Bindi, R. P., C. C. Guimaraes, A. R. de Oliveira, F. F. Melleu, M. A. X. de Lima, M. V. C. Baldo, S. C. Motta, and N. S. Canteras. 2023. "Anatomical and functional study of the cuneiform nucleus: A critical site to organize innate defensive behaviors."  Ann N Y Acad Sci 1521 (1):79-95. doi: 10.1111/nyas.14954.

      Bindi, R. P., R. G. O. Maia, F. Pibiri, M. V. C. Baldo, S. L. Poulter, C. Lever, and N. S. Canteras. 2022. "Neural correlates of distinct levels of predatory threat in dorsal periaqueductal grey neurons."  Eur J Neurosci 55 (6):1504-1518. doi: 10.1111/ejn.15633.

      Cameron, A. A., I. A. Khan, K. N. Westlund, and W. D. Willis. 1995. "The efferent projections of the periaqueductal gray in the rat: a Phaseolus vulgaris-leucoagglutinin study. II. Descending projections."  J Comp Neurol 351 (4):585-601. doi: 10.1002/cne.903510408.

      Cannon, J. T., G. J. Prieto, A. Lee, and J. C. Liebeskind. 1982. "Evidence for opioid and non-opioid forms of stimulation-produced analgesia in the rat."  Brain Res 243 (2):315-21. doi: 10.1016/0006-8993(82)90255-4.

      Carrive, P., and M. M. Morgan. 2012. "Periaqueductal Gray." In The Human Nervous System, edited by J. K. Mai and G. Paxinos, 367-400. London: Academic Press.

      Carrive, P. 1993. "The periaqueductal gray and defensive behavior: functional representation and neuronal organization."  Behav Brain Res 58 (1-2):27-47. doi: 10.1016/0166-4328(93)90088-8.

      Choi, E. A., P. Jean-Richard-Dit-Bressel, C. W. G. Clifford, and G. P. McNally. 2019. "Paraventricular Thalamus Controls Behavior during Motivational Conflict."  J Neurosci 39 (25):4945-4958. doi: 10.1523/JNEUROSCI.2480-18.2019.

      Choi, E. A., and G. P. McNally. 2017. "Paraventricular Thalamus Balances Danger and Reward."  J Neurosci 37 (11):3018-3029. doi: 10.1523/JNEUROSCI.3320-16.2017.

      Choi, J. S., and J. J. Kim. 2010. "Amygdala regulates risk of predation in rats foraging in a dynamic fear environment."  Proc Natl Acad Sci U S A 107 (50):21773-7. doi: 10.1073/pnas.1010079108.

      De Franceschi, G., T. Vivattanasarn, A. B. Saleem, and S. G. Solomon. 2016. "Vision Guides Selection of Freeze or Flight Defense Strategies in Mice."  Curr Biol 26 (16):2150-4. doi: 10.1016/j.cub.2016.06.006.

      De Oca, B. M., J. P. DeCola, S. Maren, and M. S. Fanselow. 1998. "Distinct regions of the periaqueductal gray are involved in the acquisition and expression of defensive responses."  J Neurosci 18 (9):3426-32. doi: 10.1523/JNEUROSCI.18-09-03426.1998.

      Deng, H., X. Xiao, and Z. Wang. 2016. "Periaqueductal Gray Neuronal Activities Underlie Different Aspects of Defensive Behaviors."  J Neurosci 36 (29):7580-8. doi: 10.1523/JNEUROSCI.4425-15.2016.

      Engelke, D. S., X. O. Zhang, J. J. O'Malley, J. A. Fernandez-Leon, S. Li, G. J. Kirouac, M. Beierlein, and F. H. Do-Monte. 2021. "A hypothalamic-thalamostriatal circuit that controls approach-avoidance conflict in rats."  Nat Commun 12 (1):2517. doi: 10.1038/s41467-021-22730-y.

      Esteban Masferrer, M., B. A. Silva, K. Nomoto, S. Q. Lima, and C. T. Gross. 2020. "Differential Encoding of Predator Fear in the Ventromedial Hypothalamus and Periaqueductal Grey."  J Neurosci 40 (48):9283-9292. doi: 10.1523/JNEUROSCI.0761-18.2020.

      Fanselow, M. S. 1998. "Pavlovian conditioning, negative feedback, and blocking: mechanisms that regulate association formation."  Neuron 20 (4):625-7. doi: 10.1016/s0896-6273(00)81002-8.

      Fields, H. L. 2000. "Pain modulation: expectation, opioid analgesia and virtual pain."  Prog Brain Res 122:245-53. doi: 10.1016/s0079-6123(08)62143-3.

      Gross, C. T., and N. S. Canteras. 2012. "The many paths to fear."  Nat Rev Neurosci 13 (9):651-8. doi: 10.1038/nrn3301.

      Herry, C., and J. P. Johansen. 2014. "Encoding of fear learning and memory in distributed neuronal circuits."  Nat Neurosci 17 (12):1644-54. doi: 10.1038/nn.3869.

      Kim, E. J., O. Horovitz, B. A. Pellman, L. M. Tan, Q. Li, G. Richter-Levin, and J. J. Kim. 2013. "Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats."  Proc Natl Acad Sci U S A 110 (36):14795-800. doi: 10.1073/pnas.1310845110.

      Kim, E. J., M. S. Kong, S. G. Park, S. J. Y. Mizumori, J. Cho, and J. J. Kim. 2018. "Dynamic coding of predatory information between the prelimbic cortex and lateral amygdala in foraging rats."  Sci Adv 4 (4):eaar7328. doi: 10.1126/sciadv.aar7328.

      Kim, J. J., J. S. Choi, and H. J. Lee. 2016. "Foraging in the face of fear: Novel strategies for evaluating amygdala functions in rats." In Living without an amygdala, edited by D. G. Amaral and R. Adolphs, 129-148. The Guilford Press.

      Kim, J. J., R. A. Rison, and M. S. Fanselow. 1993. "Effects of amygdala, hippocampus, and periaqueductal gray lesions on short- and long-term contextual fear."  Behav Neurosci 107 (6):1093-8. doi: 10.1037//0735-7044.107.6.1093.

      Kong, M. S., E. J. Kim, S. Park, L. S. Zweifel, Y. Huh, J. Cho, and J. J. Kim. 2021. "'Fearful-place' coding in the amygdala-hippocampal network."  Elife 10. doi: 10.7554/eLife.72040.

      Krout, K. E., and A. D. Loewy. 2000. "Periaqueductal gray matter projections to midline and intralaminar thalamic nuclei of the rat."  J Comp Neurol 424 (1):111-41. doi: 10.1002/1096-9861(20000814)424:1<111::aid-cne9>3.0.co;2-3.

      Kunwar, P. S., M. Zelikowsky, R. Remedios, H. Cai, M. Yilmaz, M. Meister, and D. J. Anderson. 2015. "Ventromedial hypothalamic neurons control a defensive emotion state."  Elife 4. doi: 10.7554/eLife.06633.

      Lefler, Y., D. Campagner, and T. Branco. 2020. "The role of the periaqueductal gray in escape behavior."  Curr Opin Neurobiol 60:115-121. doi: 10.1016/j.conb.2019.11.014.

      Li, Z., J. X. Wei, G. W. Zhang, J. J. Huang, B. Zingg, X. Wang, H. W. Tao, and L. I. Zhang. 2021. "Corticostriatal control of defense behavior in mice induced by auditory looming cues."  Nat Commun 12 (1):1040. doi: 10.1038/s41467-021-21248-7.

      Lischinsky, J. E., and D. Lin. 2019. "Looming Danger: Unraveling the Circuitry for Predator Threats."  Trends Neurosci 42 (12):841-842. doi: 10.1016/j.tins.2019.10.004.

      Lu, B., P. Fan, M. Li, Y. Wang, W. Liang, G. Yang, F. Mo, Z. Xu, J. Shan, Y. Song, J. Liu, Y. Wu, and X. Cai. 2023. "Detection of neuronal defensive discharge information transmission and characteristics in periaqueductal gray double-subregions using PtNP/PEDOT:PSS modified microelectrode arrays."  Microsyst Nanoeng 9:70. doi: 10.1038/s41378-023-00546-8.

      Magierek, V., P. L. Ramos, N. G. da Silveira-Filho, R. L. Nogueira, and J. Landeira-Fernandez. 2003. "Context fear conditioning inhibits panic-like behavior elicited by electrical stimulation of dorsal periaqueductal gray."  Neuroreport 14 (12):1641-4. doi: 10.1097/00001756-200308260-00020.

      McNally, G. P., J. P. Johansen, and H. T. Blair. 2011. "Placing prediction into the fear circuit."  Trends Neurosci 34 (6):283-92. doi: 10.1016/j.tins.2011.03.005.

      Menegas, W., K. Akiti, R. Amo, N. Uchida, and M. Watabe-Uchida. 2018. "Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli."  Nat Neurosci 21 (10):1421-1430. doi: 10.1038/s41593-018-0222-1.

      Morgan, M. M., P. K. Whitney, and M. S. Gold. 1998. "Immobility and flight associated with antinociception produced by activation of the ventral and lateral/dorsal regions of the rat periaqueductal gray."  Brain Res 804 (1):159-66. doi: 10.1016/s0006-8993(98)00669-6.

      Otchy, T. M., S. B. Wolff, J. Y. Rhee, C. Pehlevan, R. Kawai, A. Kempf, S. M. Gobes, and B. P. Olveczky. 2015. "Acute off-target effects of neural circuit manipulations."  Nature 528 (7582):358-63. doi: 10.1038/nature16442.

      Paxinos, G., and C. Watson. 1998. The Rat Brain in Stereotaxic Coordinates. San Diego: Academic Press.

      Rajasethupathy, P., S. Sankaran, J. H. Marshel, C. K. Kim, E. Ferenczi, S. Y. Lee, A. Berndt, C. Ramakrishnan, A. Jaffe, M. Lo, C. Liston, and K. Deisseroth. 2015. "Projections from neocortex mediate top-down control of memory retrieval."  Nature 526 (7575):653-9. doi: 10.1038/nature15389.

      Ressler, R. L., and S. Maren. 2019. "Synaptic encoding of fear memories in the amygdala."  Curr Opin Neurobiol 54:54-59. doi: 10.1016/j.conb.2018.08.012.

      Schenberg, L. C., R. M. Povoa, A. L. Costa, A. V. Caldellas, S. Tufik, and A. S. Bittencourt. 2005. "Functional specializations within the tectum defense systems of the rat."  Neurosci Biobehav Rev 29 (8):1279-98. doi: 10.1016/j.neubiorev.2005.05.006.

      Silva, B. A., C. Mattucci, P. Krzywkowski, E. Murana, A. Illarionova, V. Grinevich, N. S. Canteras, D. Ragozzino, and C. T. Gross. 2013. "Independent hypothalamic circuits for social and predator fear."  Nat Neurosci 16 (12):1731-3. doi: 10.1038/nn.3573.

      Tsang, E., C. Orlandini, R. Sureka, A. H. Crevenna, E. Perlas, I. Prankerd, M. E. Masferrer, and C. T. Gross. 2023. "Induction of flight via midbrain projections to the cuneiform nucleus."  PLoS One 18 (2):e0281464. doi: 10.1371/journal.pone.0281464.

      Vianna, D. M., and M. L. Brandao. 2003. "Anatomical connections of the periaqueductal gray: specific neural substrates for different kinds of fear."  Braz J Med Biol Res 36 (5):557-66. doi: 10.1590/s0100-879x2003000500002.

      Walker, D. L., and M. Davis. 1997. "Involvement of the dorsal periaqueductal gray in the loss of fear-potentiated startle accompanying high footshock training."  Behav Neurosci 111 (4):692-702. doi: 10.1037//0735-7044.111.4.692.

      Wang, L., I. Z. Chen, and D. Lin. 2015. "Collateral pathways from the ventromedial hypothalamus mediate defensive behaviors."  Neuron 85 (6):1344-58. doi: 10.1016/j.neuron.2014.12.025.

      Wei, P., N. Liu, Z. Zhang, X. Liu, Y. Tang, X. He, B. Wu, Z. Zhou, Y. Liu, J. Li, Y. Zhang, X. Zhou, L. Xu, L. Chen, G. Bi, X. Hu, F. Xu, and L. Wang. 2015. "Processing of visually evoked innate fear by a non-canonical thalamic pathway."  Nat Commun 6:6756. doi: 10.1038/ncomms7756.

      Yeh, L. F., T. Ozawa, and J. P. Johansen. 2021. "Functional organization of the midbrain periaqueductal gray for regulating aversive memory formation."  Mol Brain 14 (1):136. doi: 10.1186/s13041-021-00844-0.

      Yilmaz, M., and M. Meister. 2013. "Rapid innate defensive responses of mice to looming visual stimuli."  Curr Biol 23 (20):2011-5. doi: 10.1016/j.cub.2013.08.015.

      Zhou, Z., X. Liu, S. Chen, Z. Zhang, Y. Liu, Q. Montardy, Y. Tang, P. Wei, N. Liu, L. Li, R. Song, J. Lai, X. He, C. Chen, G. Bi, G. Feng, F. Xu, and L. Wang. 2019. "A VTA GABAergic Neural Circuit Mediates Visually Evoked Innate Defensive Responses."  Neuron 103 (3):473-488 e6. doi: 10.1016/j.neuron.2019.05.027.

    1. Author response

      Reviewer #1 (Public Review):

      The authors aimed to investigate if 2-hydroxybutyrate (2HB), a metabolite induced by exercise, influences physiological changes, particularly metabolic alterations post-exercise training. They treated young mice and cultured myoblasts with 2HB, conducted exercise tests, metabolomic profiling, gene expression analysis, and knockdown experiments to understand 2HB's mechanisms. Their findings indicate that 2HB enhances exercise tolerance, boosts branch chain amino acid (BCAA) enzyme gene expression in skeletal muscles, and increases oxidative capacity. They also highlight the role of SIRT4 in these effects. This study establishes 2HB, once considered a waste product, as a regulator of exercise-induced metabolic processes. The study's strength lies in its consistent results across in vitro, in vivo, and ex vivo analyses.

      The authors propose a mechanism in which 2HB inhibits BCAA breakdown, raises NAD+/NADH ratio, activates SIRT4, increases ADP ribosylation, and controls gene expression.

      However, some questions remain unresolved by these findings:

      This study focused on the effects of short-term exercise (1 or 5 bouts of treadmill running) and short-term 2HB treatment (1 or 4 days of treatment). Adaptations to exercise training typically occur progressively over an extended period. It's important to investigate the effects of long-term 2HB treatment and whether extended combined 2HB treatment and exercise training have independent, synergistic, or antagonistic effects.

      We agree with the reviewer that investigating longer-term 2HB treatment may yield interesting findings with broader implications for exercise physiology. Investigating the effects of 2HB treatment compared with, or in combination with, a progressive exercise training protocol would require an experiment lasting 4 to 12 weeks, based on previous studies (Systematic Review by Massett et al., Frontiers in Physiology, 2021, 10.3389/fphys.2021.782695). However, our experience with these types of experiments is that such a pursuit would require a breadth of work beyond the scope of the current study. For instance, if there were evidence of a weakened effect of 2HB over time, one may be compelled to investigate other organs, such as the liver, for signs of metabolic adaptation to the exogenous metabolite. If there were additive or synergistic effects on exercise performance, one may be compelled to investigate changes to the cardiovascular system in addition to the skeletal muscle. Additional questions would be raised about the skeletal muscle as well, including assessment of structural and fibre-type changes. Further, these additional mechanisms would need to be characterized in a time-course fashion. Rather, we view the scope of the current study to be the acute response to 2HB, as an initial report on the mechanistic effects of 2HB.

      Exercise training leads to significant mitochondrial changes, including increased mitochondrial biogenesis in skeletal muscle. It would be valuable to compare the impact of 2HB treatment on mitochondrial content and oxidative capacity in treated mice to that in exercised mice.

      We agree with the reviewer that it is of interest to investigate how 2HB may affect mitochondrial biogenesis. However, our preliminary findings were that 2HB-treated MEFs, C2C12s, and mouse soleus muscles showed no change in PGC1α gene expression after four days of treatment (data not shown). As a follow-up assessment of mitochondrial protein expression, although not specific to mtDNA-encoded genes, we quantified the expression of the respiratory chain proteins in cells and soleus muscle and found no effect of 2HB treatment (SFig. 5,6). At this stage we conclude that there is no evidence of 2HB modifying mitochondrial biogenesis in this time frame, and that further investigation would be best suited to a follow-up study, such as one focused on long-term exercise training.

      The authors demonstrate that 2-ketobutyrate (2KB) can serve as an oxidative fuel, suggesting a role for the intact BCAA catabolic pathway. However, it's puzzling that the knockout of BCKDHA, a subunit crucial for the second step of BCAA catabolism, did not result in changes in oxidative capacity in cultured myoblasts.

      While we report the BCKDH complex to be dispensable for 2KB oxidation, it is important to note that previous studies have reported the following: (1) that 2KB is a viable substrate for BCKDH, (2) that 2KB is a viable substrate for pyruvate dehydrogenase, and (3) that pyruvate dehydrogenase is also dispensable for 2KB oxidation (see Steele et al., J Nutr 114:701-710; Paxton et al., Biochem J 234:295-303). Collectively, these data have led previous studies to conclude that BCKDH and pyruvate dehydrogenase are redundant for the first step of 2KB oxidation, with a preference for BCKDH. The flux through either enzyme may depend upon the metabolic environment. The aim of Figure 3C was to determine whether the BCAA degradation pathway was required for 2KB oxidation. We conclude that this pathway is required, with the first essential step being propionyl-CoA carboxylase (PCC).

      While these past studies were mentioned in paragraph 2 of the discussion, in light of the reviewer’s comment we have expanded this paragraph. We have added language to explain that future research interested in the presented 2HB mechanism should carefully consider BCKDH and PDH expression in the cell or tissue of interest, as the metabolism of 2KB is quite central to the presented mechanism.

      Nevertheless, this innovative model of metabolic signaling during exercise will serve as a valuable reference for informing future studies.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "A 2-HB-mediated feedback loop regulates muscular fatigue" by the Johnson group reports interesting findings with implications for the health benefits of exercise. The authors use a combination of metabolic/biochemical in vivo and in vitro assays to delineate a metabolic route triggered by 2-HB (a relatively stable metabolite induced by exercise in humans and mice) that controls branched-chain aminotransferase enzymes and mitochondrial oxidative capacity. Mechanistically, the authors show that 2-HB is a direct inhibitor of BCAT enzymes, which in turn control SIRT4 activity and nuclear ADP-ribosylation targeting the C/EBP transcription factor, affecting BCAA oxidation genes (see Fig 4i in the paper). Overall, these are interesting and novel findings with relevance to human exercise, with the potential implication of using these metabolites to mimic the benefits of exercise, or to counter the muscular fatigue that occurs in different human chronic diseases, including rheumatic diseases or long COVID.

      Weaknesses:

      There are several experiments/comments that would strengthen the manuscript:

      (1) A final model in Figure 6 integrating the exercise and mechanistic findings (expanding on Fig 4i) would clarify the findings.

      We appreciate the reviewer’s suggestion to incorporate the exercise findings into a summary figure. However, upon internal review we find that such a figure is too similar to Fig 4i to warrant a new diagram.

      (2) In some of the graphs, statistics are missing (e.g Fig 6G).

      Some figures are included primarily for the reader to visualize the data while statistical comparison is conducted in a separate figure, for example Fig 2D-G. However, we have revised the figure legends to ensure that statistical comparisons are described for all appropriate figures, including Fig 6G identified by the reviewer.

      (3) The conclusions on SIRT4 dependency should be carefully worded, as this is likely only one potential mechanism; further validation with mouse models would be necessary.

      We appreciate the reviewer’s feedback and take the point that an NAD+-dependent mechanism will likely stimulate other sirtuins, which are often expressed at greater levels than SIRT4. To reflect this comment, we have altered paragraph 5 of the Discussion to focus on sirtuins more broadly. We briefly discuss SIRT4 and highlight the need for future consideration of other sirtuins, particularly the mitochondrial sirtuins.

      (4) One experiment needed to support the oxidative capacity effects, which could be done in cultured cells, is the use of radioisotope-labeled metabolites, including BCAAs, to determine the ability to produce CO2. Alternatively, or in combination, isotope-based metabolic flux analysis would strengthen the current results.

      We appreciate the suggestion from the reviewer and we will look to conduct such an experiment in our follow-up work.

      We sincerely thank the reviewers for their input on this study as their suggestions have led to an improved manuscript for the version of record. The reviewer comments are well taken and we are glad that they will be present alongside the final manuscript to provide an important perspective on the work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Using a cross-modal sensory selection task in head-fixed mice, the authors attempted to characterize how different rules reconfigured representations of sensory stimuli and behavioral reports in sensory (S1, S2) and premotor cortical areas (medial motor cortex or MM, and ALM). They used silicon probe recordings during behavior, a combination of single-cell and population-level analyses of neural data, and optogenetic inhibition during the task.

      Strengths:

      A major strength of the manuscript was the clarity of the writing and motivation for experiments and analyses. The behavioral paradigm is somewhat simple but well designed and well controlled. The neural analyses were sophisticated, clearly presented, and generally supported the authors' interpretations. The statistics are clearly reported and easy to interpret. In general, my view is that the authors achieved their aims. They found that different rules affected preparatory activity in premotor areas, but not sensory areas, consistent with dynamical systems perspectives in the field that hold that initial conditions are important for determining trial-based dynamics.

      Weaknesses:

      The manuscript was generally strong. The main weakness in my view was in interpreting the optogenetic results. While the simplicity of the task was helpful for analyzing the neural data, I think it limited the informativeness of the perturbation experiments. The behavioral read-out was low dimensional (a change in hit rate or false alarm rate), but it was unclear what perceptual or cognitive process was disrupted that led to changes in these read-outs. This is a challenge for the field, and not just this paper, but it was the main weakness in my view. I have some minor technical comments in the recommendations for authors that might address other minor weaknesses.

      I think this is a well-performed, well-written, and interesting study that shows differences in rule representations in sensory and premotor areas and finds that rules reconfigure preparatory activity in the motor cortex to support flexible behavior.

      Reviewer #2 (Public Review):

      Summary:

      Chang et al. investigate neuronal activity firing patterns across various cortical regions in an interesting context-dependent tactile vs visual detection task, developed previously by the authors (Chevee et al., 2021; doi: 10.1016/j.neuron.2021.11.013). The authors report the important involvement of a medial frontal cortical region (MM, probably a similar location to wM2 as described in Esmaeili et al., 2021 & 2022; doi: 10.1016/j.neuron.2021.05.005; doi: 10.1371/journal.pbio.3001667) in mice for determining task rules.

      Strengths:

      The experiments appear to have been well carried out and the data well analysed. The manuscript clearly describes the motivation for the analyses and reaches clear and well-justified conclusions. I find the manuscript interesting and exciting!

      Weaknesses:

      I did not find any major weaknesses.

      Reviewer #3 (Public Review):

      This study examines context-dependent stimulus selection by recording neural activity from several sensory and motor cortical areas along a sensorimotor pathway, including S1, S2, MM, and ALM. Mice are trained to either withhold licking or perform directional licking in response to visual or tactile stimuli. Depending on the task rule, the mice have to respond to one stimulus modality while ignoring the other. Neural activity to the same tactile stimulus is modulated by task in all the areas recorded, with significant activity changes in a subset of neurons and population activity occupying distinct activity subspaces. Recordings further reveal a contextual signal in the pre-stimulus baseline activity that differentiates task context. This signal is correlated with subsequent task modulation of stimulus activity. Comparison across brain areas shows that this contextual signal is stronger in frontal cortical regions than in sensory regions. Analyses link this signal to behavior by showing that it tracks the behavioral performance switch during task rule transitions. Silencing activity in frontal cortical regions during the baseline period impairs behavioral performance.

      Overall, this is a superb study with solid results and thorough controls. The results are relevant for context-specific neural computation and provide a neural substrate that will surely inspire follow-up mechanistic investigations. We only have a couple of suggestions to help the authors further improve the paper.

      (1) We have a comment regarding the calculation of the choice CD in Fig S3. The text on page 7 concludes that "Choice coding dimensions change with task rule". However, the motor choice response is different across blocks, i.e. lick right vs. no lick for one task and lick left vs. no lick for the other task. Therefore, the differences in the choice CD may be simply due to the motor response being different across the tasks and not due to the task rule per se. The authors may consider adding this caveat in their interpretation. This should not affect their main conclusion.

      We thank the Reviewer for the suggestion. We have discussed this caveat and performed a new analysis, described on page 8, in which choice coding dimensions are calculated using right-lick and left-lick trials (Fig. S4h).

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we repeated this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in a respond-to-touch block was still not well aligned with that in a respond-to-light block (Fig. S4h; magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”
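      For readers less familiar with this measure, the alignment statistic quoted above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it assumes that a choice coding dimension is the normalized difference between the trial-averaged population vectors of two trial types, and that the shuffled control permutes trial labels within each task rule. The function names (coding_dimension, alignment, shuffled_alignment) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Trials = List[List[float]]  # one population vector (firing rate per neuron) per trial


def coding_dimension(trials_a: Trials, trials_b: Trials) -> List[float]:
    """Unit vector along mean(trials_a) - mean(trials_b) (assumed CD definition)."""
    n_neurons = len(trials_a[0])
    mean_a = [sum(t[i] for t in trials_a) / len(trials_a) for i in range(n_neurons)]
    mean_b = [sum(t[i] for t in trials_b) / len(trials_b) for i in range(n_neurons)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("trial-type means are identical; coding dimension undefined")
    return [d / norm for d in diff]


def alignment(cd1: Sequence[float], cd2: Sequence[float]) -> float:
    """|dot product| of two unit vectors, i.e. |cos(angle)|, between 0 and 1."""
    return abs(sum(x * y for x, y in zip(cd1, cd2)))


def shuffled_alignment(rule1: Tuple[Trials, Trials], rule2: Tuple[Trials, Trials],
                       seed: int = 0) -> float:
    """Alignment after permuting trial labels within each rule (hypothetical control)."""
    rng = random.Random(seed)
    cds = []
    for trials_a, trials_b in (rule1, rule2):
        pooled = trials_a + trials_b  # new list, so the inputs are not reordered
        rng.shuffle(pooled)
        cds.append(coding_dimension(pooled[:len(trials_a)], pooled[len(trials_a):]))
    return alignment(cds[0], cds[1])
```

      Because both vectors have unit length, the statistic is the absolute cosine of the angle between them: 1 means the two rules share the same choice axis and 0 means the axes are orthogonal.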

      We also have included the caveats for using right-lick and left-lick trials to calculate choice coding dimensions on page 13.

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (number of licks to the unrewarded port / total number of licks; respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      (2) We have a couple of questions about the effect size on single neurons vs. population dynamics. From Fig 1, about 20% of neurons in frontal cortical regions show task rule modulation in their stimulus activity. This seems like a small effect in terms of population dynamics. There is somewhat of a disconnect between this and Figs 4 and S3 (for stimulus CD), which show remarkably low subspace overlap in population activity across tasks. Can the authors help bridge this disconnect? Is this because the neurons showing a difference in Fig 1 are disproportionately stimulus-selective neurons?

      We thank the Reviewer for the insightful comment and agree that it is important to link the single-unit and population results. We have addressed these questions by (1) improving our analysis of task modulation of single neurons (tHit-tCR selectivity) and (2) examining the relationship between tHit-tCR selective neurons and tHit-tCR subspace overlaps.

      Previously, we averaged the AUC values of time bins within the stimulus window (0-150 ms, 10-ms bins). If the 95% CI on this averaged AUC value did not include 0.5, the unit was considered to show significant selectivity. This approach was highly conservative and may have underestimated the percentage of units showing significant selectivity, particularly units showing transient selectivity. In the revised manuscript, we now define a unit as showing significant tHit-tCR selectivity when its AUC values were significant for at least three consecutive time bins (≥30 ms; 10-ms bins). Using this new criterion, the percentage of tHit-tCR selective neurons increased compared with the previous analysis. We have updated Figure 1h and the results on page 4:

      “We found that 18-33% of neurons in these cortical areas had area under the receiver-operating curve (AUC) values significantly different from 0.5, and therefore discriminated between tHit and tCR trials (Fig. 1h; S1: 28.8%, 177 neurons; S2: 17.9%, 162 neurons; MM: 32.9%, 140 neurons; ALM: 23.4%, 256 neurons; criterion to be considered significant: Bonferroni corrected 95% CI on AUC did not include 0.5 for at least 3 consecutive 10-ms time bins).”
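      The revised criterion (at least three consecutive 10-ms bins whose Bonferroni-corrected CI on the AUC excludes 0.5) can likewise be sketched as follows. Again this is illustrative only: the percentile bootstrap used for the CI, applying the Bonferroni correction across time bins, and all function names (auc, bootstrap_ci, is_selective) are assumptions, since the response does not specify these details.

```python
import random
from typing import List, Sequence, Tuple


def auc(a: Sequence[float], b: Sequence[float]) -> float:
    """Area under the ROC curve, computed as P(a > b) + 0.5 * P(a == b)."""
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))


def bootstrap_ci(a: Sequence[float], b: Sequence[float], alpha: float,
                 n_boot: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI on the AUC at level (1 - alpha); an assumed method."""
    rng = random.Random(seed)
    stats = sorted(
        auc([rng.choice(a) for _ in a], [rng.choice(b) for _ in b])
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


def is_selective(hit_bins: List[List[float]], cr_bins: List[List[float]],
                 alpha: float = 0.05, min_run: int = 3, n_boot: int = 1000) -> bool:
    """True if at least min_run consecutive bins have an AUC CI excluding 0.5.

    hit_bins[t] / cr_bins[t] hold one spike count per tHit / tCR trial in bin t.
    """
    corrected_alpha = alpha / len(hit_bins)  # Bonferroni across bins (assumption)
    run = 0
    for t, (hit, cr) in enumerate(zip(hit_bins, cr_bins)):
        lo, hi = bootstrap_ci(hit, cr, corrected_alpha, n_boot, seed=t)
        run = run + 1 if not (lo <= 0.5 <= hi) else 0  # reset on a non-significant bin
        if run >= min_run:
            return True
    return False
```

      Requiring a run of significant bins, rather than averaging the AUC over the whole 0-150 ms window, is what allows transiently selective units to pass the criterion.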

      Next, we checked how tHit-tCR selective neurons were distributed across sessions. We found that the percentage of tHit-tCR selective neurons in each session varied (S1: 9-46%, S2: 0-36%, MM: 25-55%, ALM: 0-50%). We examined the relationship between the number of tHit-tCR selective neurons and tHit-tCR subspace overlap. Sessions with more neurons showing task rule modulation tended to show lower subspace overlap, but this correlation was modest and only marginally significant (r = -0.32, p = 0.08, Pearson correlation, n = 31 sessions). While we report the percentage of neurons showing significant selectivity as a simple way to summarize single-neuron effects, this neglects the magnitude of task rule modulation of individual neurons, which may also be relevant.

      In summary, the apparent disconnect between the effect sizes of task modulation in single neurons and in population dynamics could be explained by three factors: (1) our previous analysis underestimated the percentages of tHit-tCR selective neurons, (2) tHit-tCR selective neurons were not uniformly distributed across sessions, and (3) the percentages of tHit-tCR selective neurons were only weakly correlated with tHit-tCR subspace overlaps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      For the analysis of choice coding dimensions, it seems that the authors are somewhat data limited in that they cannot compare lick-right/lick-left within a block. So instead, they compare lick/no lick trials. But given that the mice are unable to initiate trials, the interpretation of the no lick trials is a bit complicated. It is not clear that the no lick trials reflect a perceptual judgment about the stimulus (i.e., a choice), or that the mice are just zoning out and not paying attention. If it's the latter case, what the authors are calling choice coding is more of an attentional or task engagement signal, which may still be interesting, but has a somewhat different interpretation than a choice coding dimension. It might be worth clarifying this point somewhere, or if I'm totally off-base, then being more clear about why lick/no lick is more consistent with choice than task engagement.

      We thank the Reviewer for raising this point. We have added a new paragraph on page 13 to clarify why we used lick/no-lick trials to calculate choice coding dimensions, and we now discuss the caveat regarding task engagement.  

      “No-lick trials included misses, which could be caused by mice not being engaged in the task. While the majority of no-lick trials were correct rejections (respond-to-touch: 75%; respond-to-light: 76%), we treated no-licks as one of the available choices in our task and included them to calculate choice coding dimensions (Fig. S4c,d,f). To ensure stable and balanced task engagement across task rules, we removed the last 20 trials of each session and used stimulus parameters that achieved similar behavioral performance for both task rules (Fig. 1d; ~75% correct for both rules).”

      In addition, to address both this point and a related point made by Reviewer 3, we performed a new analysis in which choice coding dimensions were calculated from right-lick vs left-lick trials. We report this new analysis on page 8:

      “Choice coding dimensions were obtained from left-lick and no-lick trials in respond-to-touch blocks and right-lick and no-lick trials in respond-to-light blocks. Because the required lick directions differed between the block types, the difference in choice CDs across task rules (Fig. S4f) could have been affected by the different motor responses. To rule out this possibility, we repeated this analysis using right-lick and left-lick trials to calculate the choice coding dimensions for both task rules. We found that the orientation of the choice coding dimension in respond-to-touch blocks was still not well aligned with that in respond-to-light blocks (Fig. S4h; magnitude of dot product between the respond-to-touch choice CD and the respond-to-light choice CD, mean ± 95% CI for true vs shuffled data: S1: 0.39 ± [0.23, 0.55] vs 0.2 ± [0.1, 0.31], 10 sessions; S2: 0.32 ± [0.18, 0.46] vs 0.2 ± [0.11, 0.3], 8 sessions; MM: 0.35 ± [0.21, 0.48] vs 0.18 ± [0.11, 0.26], 9 sessions; ALM: 0.28 ± [0.17, 0.39] vs 0.21 ± [0.12, 0.31], 13 sessions).”
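      For readers less familiar with coding dimensions, a common definition is the normalized difference between the trial-averaged population vectors of two trial types, with the alignment between two CDs given by the magnitude of their dot product. The sketch below assumes that definition (it is not necessarily the exact procedure in the Methods) and builds one shuffled-null sample by randomly reassigning trials between the two labels. Inputs are lists of per-trial population vectors (one firing rate per neuron); all function names are hypothetical.

```python
import math
import random


def mean_vector(trials):
    """Trial-averaged population vector."""
    n = len(trials)
    return [sum(col) / n for col in zip(*trials)]


def coding_dimension(trials_a, trials_b):
    """Unit vector along mean(trials_a) - mean(trials_b)."""
    diff = [a - b for a, b in zip(mean_vector(trials_a), mean_vector(trials_b))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        return [0.0] * len(diff)  # no separation between the two trial types
    return [d / norm for d in diff]


def alignment(cd1, cd2):
    """|cd1 . cd2|: 1 for parallel unit vectors, 0 for orthogonal ones."""
    return abs(sum(a * b for a, b in zip(cd1, cd2)))


def shuffled_alignment(trials_a, trials_b, reference_cd, seed=0):
    """Alignment of reference_cd with a CD computed after shuffling trial labels."""
    rng = random.Random(seed)
    pooled = list(trials_a) + list(trials_b)
    rng.shuffle(pooled)
    cut = len(trials_a)
    return alignment(reference_cd, coding_dimension(pooled[:cut], pooled[cut:]))
```

      Taking the absolute value makes the measure insensitive to the arbitrary sign of each CD, so only the orientation of the two axes is compared.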

      We added discussion of the limitations of this new analysis on page 13:

      “However, we also calculated choice coding dimensions using only right- and left-lick trials. In S1, S2, MM and ALM, the choice CDs calculated this way were also not aligned well across task rules (Fig. S4h), consistent with the results calculated from lick and no-lick trials (Fig. S4f). Data were limited for this analysis, however, because mice rarely licked to the unrewarded water port (number of licks to the unrewarded port / total number of licks, respond-to-touch: 0.13, respond-to-light: 0.11). These trials usually came from rule transitions (Fig. 5a) and, in some cases, were potentially caused by exploratory behaviors. These factors could affect choice CDs.”

      The authors find that the stimulus coding direction in most areas (S1, S2, and MM) was significantly aligned between the block types. How do the authors interpret that finding? That there is no major change in stimulus coding dimension, despite the change in subspace? I think I'm missing the big picture interpretation of this result.

      The absence of a significant change in stimulus coding dimensions, despite a change in subspace, suggests that the subspace change largely reflects a change in the choice coding dimensions.

      As I mentioned in the public review, I thought there was a weakness with interpretation of the optogenetic experiments, which the authors generally interpret as reflecting rule sensitivity. However, given that they are inhibiting premotor areas including ALM, one might imagine that there might also be an effect on lick production or kinematics. To rule this out, the authors compare the change in lick rate relative to licks during the ITI. What is the ITI lick rate? I assume pretty low, once the animal is well-trained, in which case there may be a floor effect that could obscure meaningful effects on lick production. In addition, based on the reported CI on delta p(lick), it looks like MM and AM did suppress lick rate. I think in the future, a task with richer behavioral read-outs (or including other measurements of behavior like video), or perhaps something like a psychological process model with parameters that reflect different perceptual or cognitive processes could help resolve the effects of perturbations more precisely.

      In respond-to-touch and respond-to-light blocks, 18% and 10% of trials, respectively, had at least one lick during the ITI. These relatively low rates of ITI licking could indeed make an effect of optogenetic inhibition on lick production harder to observe. We agree that future work would benefit from more complex tasks and richer behavioral measurements, and have added the following to make this point (page 14):

      “To more precisely dissect the effects of perturbations on different cognitive processes in rule-dependent sensory detection, more complex behavioral tasks and richer behavioral measurements are needed in the future.”

      Reviewer #2 (Recommendations For The Authors):

      I have the following minor suggestions that the authors might consider in revising this already excellent manuscript:

      (1) In addition to showing normalised z-score firing rates (e.g. Fig 1g), I think it is important to show the grand-average mean firing rates in Hz.

      We thank the Reviewer for the suggestion and have added the grand-average mean firing rates as a new supplementary figure (Fig. S2a). To provide more details about the firing rates of individual neurons, we have also added to this new figure the distribution of peak responses during the tactile stimulus period (Fig. S2b).

      (2) I think the authors could report more quantitative data in the main text. As a very basic example, I could not easily find how many neurons, sessions, and mice were used in various analyses.

      We have added relevant numbers at various points throughout the Results, including within the following examples:

      Page 3: “To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM and ALM (Fig. 1e-g, Fig. S1a-h, and Fig. S2a; S1: 6 mice, 10 sessions, 177 neurons, S2: 5 mice, 8 sessions, 162 neurons, MM: 7 mice, 9 sessions, 140 neurons, ALM: 8 mice, 13 sessions, 256 neurons).”

      Page 5: “As expected, single-unit activity before stimulus onset did not discriminate between tactile and visual trials (Fig. 2d; S1: 0%, 177 neurons; S2: 0%, 162 neurons; MM: 0%, 140 neurons; ALM: 0.8%, 256 neurons). After stimulus onset, more than 35% of neurons in the sensory cortical areas and approximately 15% of neurons in the motor cortical areas showed significant stimulus discriminability (Fig. 2e; S1: 37.3%, 177 neurons; S2: 35.2%, 162 neurons; MM: 15%, 140 neurons; ALM: 14.1%, 256 neurons).”

      Page 6: “Support vector machine (SVM) and Random Forest classifiers showed similar decoding abilities (Fig. S3a,b; medians of classification accuracy [true vs shuffled]; SVM: S1 [0.6 vs 0.53], 10 sessions, S2 [0.61 vs 0.51], 8 sessions, MM [0.71 vs 0.51], 9 sessions, ALM [0.65 vs 0.52], 13 sessions; Random Forests: S1 [0.59 vs 0.52], 10 sessions, S2 [0.6 vs 0.52], 8 sessions, MM [0.65 vs 0.49], 9 sessions, ALM [0.7 vs 0.5], 13 sessions).”

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).” 
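      The divergence measure described in this excerpt can be sketched as follows, assuming each trial's trajectory has already been projected onto the top three PCs and binned in 10-ms steps. For every tHit-tCR trial pair, the sketch takes the Euclidean distance at each matching time bin and then averages over pairs, giving one distance time course per session. The name `trajectory_distance` and the nested-list input layout are hypothetical.

```python
import math
from itertools import product


def trajectory_distance(hit_trajs, cr_trajs):
    """Session-averaged tHit-tCR distance at each time bin.

    hit_trajs, cr_trajs: lists of trajectories; each trajectory is a list of
    time bins, and each bin is a list of PC scores (e.g. the top 3 PCs).
    """
    n_bins = len(hit_trajs[0])
    totals = [0.0] * n_bins
    pairs = 0
    for hit, cr in product(hit_trajs, cr_trajs):  # all tHit-tCR trial pairs
        for t in range(n_bins):
            totals[t] += math.dist(hit[t], cr[t])  # Euclidean distance
        pairs += 1
    return [total / pairs for total in totals]
```

      Averaging over all pairs, rather than comparing the two mean trajectories, keeps trial-to-trial variability in the distance estimate.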

      Page 8: “In contrast, we found that S1, S2 and MM had stimulus CDs that were significantly aligned between the two block types (Fig. S4e; magnitude of dot product between the respond-to-touch stimulus CDs and the respond-to-light stimulus CDs, mean ± 95% CI for true vs shuffled data: S1: 0.5 ± [0.34, 0.66] vs 0.21 ± [0.12, 0.34], 10 sessions; S2: 0.62 ± [0.43, 0.78] vs 0.22 ± [0.13, 0.31], 8 sessions; MM: 0.48 ± [0.38, 0.59] vs 0.24 ± [0.16, 0.33], 9 sessions; ALM: 0.33 ± [0.2, 0.47] vs 0.21 ± [0.13, 0.31], 13 sessions).”

      Page 9: “For respond-to-touch to respond-to-light block transitions, the fractions of trials classified as respond-to-touch for MM and ALM decreased progressively over the course of the transition (Fig. 5d; rank correlation of the fractions calculated for each of the separate periods spanning the transition, Kendall’s tau, mean ± 95% CI: MM: -0.39 ± [-0.67, -0.11], 9 sessions, ALM: -0.29 ± [-0.54, -0.04], 13 sessions; criterion to be considered significant: 95% CI on Kendall’s tau did not include 0).”
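      The trend across transition periods is summarized with Kendall’s tau between the period index and the fraction of trials classified as respond-to-touch. A minimal tau-b implementation (the tie-corrected variant that SciPy’s `kendalltau` computes by default) is sketched below; it is an illustration, not the study's code, and the name `kendall_tau_b` is hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, with the standard correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom
```

      For instance, fractions that fall monotonically across periods 1-4 give tau = -1, the direction expected for respond-to-touch → respond-to-light transitions. Because tau depends only on ranks, it captures a progressive change without assuming the change is linear.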

      Page 11: “Lick probability was unaffected during S1, S2, MM and ALM experiments for both tasks, indicating that the behavioral effects were not due to an inability to lick (Fig. 6i, j; 95% CI on Δ lick probability for cross-modal selection task: S1/S2 [-0.18, 0.24], 4 mice, 10 sessions; MM [-0.31, 0.03], 4 mice, 11 sessions; ALM [-0.24, 0.16], 4 mice, 10 sessions; Δ lick probability for simple tactile detection task: S1/S2 [-0.13, 0.31], 3 mice, 3 sessions; MM [-0.06, 0.45], 3 mice, 5 sessions; ALM [-0.18, 0.34], 3 mice, 4 sessions).”

      (3) Please include a clearer description of trial timing. Perhaps a schematic timeline of when stimuli are delivered and when licking would be rewarded. I may have missed it, but I did not find explicit mention of the timing of the reward window or if there was any delay period.

      We have added the following (page 3): 

      “For each trial, the stimulus duration was 0.15 s and an answer period extended from 0.1 to 2 s from stimulus onset.”

      (4) Please include a clear description of statistical tests in each figure legend as needed (for example please check Fig 4e legend).

      We have added details about statistical tests in the figure legends:

      Fig. 2f: “Relationship between block-type discriminability before stimulus onset and tHit-tCR discriminability after stimulus onset for units showing significant block-type discriminability prior to the stimulus. Pearson correlation: S1: r = 0.69, p = 0.056, 8 neurons; S2: r = 0.91, p = 0.093, 4 neurons; MM: r = 0.93, p < 0.001, 30 neurons; ALM: r = 0.83, p < 0.001, 26 neurons.” 

      Fig. 4e: “Subspace overlap for control tHit (gray) and tCR (purple) trials in the somatosensory and motor cortical areas. Each circle is a subspace overlap of a session. Paired t-test, tCR – control tHit: S1: -0.23, 8 sessions, p = 0.0016; S2: -0.23, 7 sessions, p = 0.0086; MM: -0.36, 5 sessions, p < 0.001; ALM: -0.35, 11 sessions, p < 0.001; significance: ** for p<0.01, *** for p<0.001.”

      Fig. 5d,e: “Fraction of trials classified as coming from a respond-to-touch block based on the pre-stimulus population state, for trials occurring in different periods (see c) relative to respond-to-touch → respond-to-light transitions. For MM (top row) and ALM (bottom row), progressively fewer trials were classified as coming from the respond-to-touch block as analysis windows shifted later relative to the rule transition. Kendall’s tau (rank correlation): MM: -0.39, 9 sessions; ALM: -0.29, 13 sessions. Left panels: individual sessions, right panels: mean ± 95% CI. Dashed lines are chance levels (0.5). e, Same as d but for respond-to-light → respond-to-touch transitions. Kendall’s tau: MM: 0.37, 9 sessions; ALM: 0.27, 13 sessions.”

      Fig. 6: “Error bars show bootstrap 95% CI. Criterion to be considered significant: 95% CI did not include 0.”
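      The significance criterion for the optogenetic effects (bootstrap 95% CI excluding 0) can be sketched as below. This assumes Δ lick probability is the lick probability on light-on trials minus that on light-off trials, and that trials are resampled with replacement within each condition. The actual resampling unit (trials, sessions, or mice) is not stated here, so treat it as an assumption. Inputs are lists of 0/1 lick outcomes, and the function names are hypothetical.

```python
import random


def delta_lick_ci(light_on, light_off, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for P(lick | light on) - P(lick | light off)."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        on = [rng.choice(light_on) for _ in light_on]      # resample trials
        off = [rng.choice(light_off) for _ in light_off]
        deltas.append(sum(on) / len(on) - sum(off) / len(off))
    deltas.sort()
    tail = (1 - level) / 2
    lo = deltas[int(tail * n_boot)]
    hi = deltas[int((1 - tail) * n_boot) - 1]
    return lo, hi


def is_significant(ci):
    """Criterion from the legend: the CI does not include 0."""
    lo, hi = ci
    return lo > 0 or hi < 0
```

      If trials are nested within sessions and mice, a hierarchical bootstrap (resampling mice, then sessions, then trials) would give more conservative intervals than this flat version.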

      (5) P. 3 - "To examine how the task rules influenced the sensorimotor transformation occurring in the tactile processing stream, we performed single-unit recordings from sensory and motor cortical areas including S1, S2, MM, and ALM using 64-channel silicon probes (Fig. 1e-g and Fig. S1a-h)." Please specify if these areas were recorded simultaneously or not.

      We have added the following sentence on page 3: “We recorded from one of these cortical areas per session, using 64-channel silicon probes.”

      (6) Figure 4b - Please describe what gray and black lines show.

      The gray traces are the distance between tHit and tCR trajectories in individual sessions and the black traces are the averages across sessions in different cortical areas. We have added this information on page 6 and in the Figure 4b legend. 

      Page 6: “To assess this for the four cortical areas, we quantified how the tHit and tCR trajectories diverged from each other by calculating the Euclidean distance between matching time points for all possible pairs of tHit and tCR trajectories for a given session and then averaging these for the session (Fig. 4a,b; S1: 10 sessions, S2: 8 sessions, MM: 9 sessions, ALM: 13 sessions, individual sessions in gray and averages across sessions in black; window of analysis: -100 to 150 ms relative to stimulus onset; 10 ms bins; using the top 3 PCs; Methods).”

      Fig. 4b: “Distance between tHit and tCR trajectories in S1, S2, MM and ALM. Gray traces show the time varying tHit-tCR distance in individual sessions and black traces are session-averaged tHit-tCR distance (S1:10 sessions; S2: 8 sessions; MM: 9 sessions; ALM: 13 sessions).”

      (7) In addition to the analyses shown in Figure 5a, when investigating the timing of the rule switch, I think the authors should plot the left and right lick probabilities aligned to the timing of the rule switch time on a trial-by-trial basis averaged across mice.

      We thank the Reviewer for suggesting this addition. We have added a new figure panel to show the probabilities of right- and left-licks during rule transitions (Fig. 5a).

      Page 8: “The probabilities of right-licks and left-licks showed that the mice switched their motor responses during block transitions depending on task rules (Fig. 5a, mean ± 95% CI across 12 mice).” 

      (8) P. 12 - "Moreover, in a separate study using the same task (Finkel et al., unpublished), high-speed video analysis demonstrated no significant differences in whisker motion between respond-to-touch and respond-to-light blocks in most (12 of 14) behavioral sessions.". Such behavioral data is important and ideally would be included in the current analysis. Was high-speed videography carried out during electrophysiology in the current study?

      Finkel et al. has been accepted in principle for publication and will be available online shortly. Unfortunately we have not yet carried out simultaneous high-speed whisker video and electrophysiology in our cross-modal sensory selection task.

      Reviewer #3 (Recommendations For The Authors):

      (1) Minor point. For subspace overlap calculation of pre-stimulus activity in Fig 4e (light purple datapoints), please clarify whether the PCs for that condition were constructed in matched time windows. If the PCs are calculated from the stimulus period 0-150ms, the poor alignment could be due to mismatched time windows.

      We thank the Reviewer for the comment and clarify our analysis here. We previously used time-matched windows to calculate subspace overlaps. However, the pre-stimulus activity was much weaker than the activity during the stimulus period, so the subspaces of the reference tHit trials were subject to noise and we were not able to obtain reliable PCs. This caused the subspace overlap values between the reference tHit and control tHit trials to be low and variable (mean ± SD, S1: 0.46 ± 0.26, n = 8 sessions; S2: 0.46 ± 0.18, n = 7 sessions; MM: 0.44 ± 0.16, n = 5 sessions; ALM: 0.38 ± 0.22, n = 11 sessions). Therefore, we used the tHit activity during the stimulus window to obtain PCs and projected pre-stimulus and stimulus activity in tCR trials onto these PCs. We have now added a more detailed description of this analysis in the Methods (page 32).

      “To calculate the separation of subspaces prior to stimulus delivery, pre-stimulus activity in tCR trials (-100 to 0 ms from stimulus onset) was projected to the PC space of the tHit reference group and the subspace overlap was calculated. In this analysis, we used tHit activity during stimulus delivery (0 to 150 ms from stimulus onset) to obtain reliable PCs.”

      We acknowledge this time-alignment issue and have now removed the reported subspace overlap between tHit and tCR trials during the pre-stimulus period from Figure 4e (light purple). However, we expect the correlation between pre- and post-stimulus-onset subspace overlaps to be similar regardless of the time window used to calculate the PCs. For PCs calculated from the pre-stimulus period (-100 to 0 ms), the correlation coefficient was 0.55 (Pearson correlation, p < 0.01, n = 31 sessions). For PCs calculated from the stimulus period (0-150 ms), the correlation coefficient was 0.68 (Figure 4f, Pearson correlation, p < 0.001, n = 31 sessions). Therefore, we have retained Figure 4f.
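      For readers less familiar with the metric, one common definition of subspace overlap is the variance of a test condition captured by a reference condition's top-k PCs, divided by the variance captured by the test condition's own top-k PCs. The sketch below assumes that definition; the exact normalization used in the Methods may differ. In the analysis described above, `reference` would correspond to tHit activity in the stimulus window and `test` to the activity being projected (for example, tCR activity). Rows are samples (such as time bins) and columns are neurons. PCs are found with plain power iteration and deflation to stay dependency-free, and all function names are hypothetical.

```python
import math


def covariance(data):
    """Sample covariance matrix; rows of `data` are samples, columns are neurons."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def quad_form(m, u):
    """u^T M u: variance of the data along unit vector u when M is a covariance."""
    return sum(a * b for a, b in zip(u, mat_vec(m, u)))


def top_pcs(cov, k, n_iter=500):
    """Top-k eigenvectors of a covariance matrix via power iteration and deflation.

    Assumes the fixed start vector is not orthogonal to the leading eigenvectors.
    """
    d = len(cov)
    work = [row[:] for row in cov]
    pcs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]
        for _ in range(n_iter):
            w = mat_vec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        lam = quad_form(work, v)
        pcs.append(v)
        # Deflation: subtract the found component so the next pass finds the next PC.
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pcs


def subspace_overlap(reference, test, k):
    """Variance of `test` captured by the reference top-k PCs, relative to the
    variance captured by the test data's own top-k PCs (1 = identical subspaces)."""
    cov_test = covariance(test)
    ref_var = sum(quad_form(cov_test, u) for u in top_pcs(covariance(reference), k))
    own_var = sum(quad_form(cov_test, u) for u in top_pcs(cov_test, k))
    return ref_var / own_var
```

      Because the denominator uses the test data's own PCs, the overlap is bounded by 1 and reflects only how well the reference axes capture the test activity, not the overall activity level.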

      (2) Minor point. To help the readers follow the logic of the experiments, please explain why PPC and AMM were added in the later optogenetic experiment since these are not part of the electrophysiology experiment.

      We have added the following rationale on page 9.

      “We recorded from AMM in our cross-modal sensory selection task and observed visually-evoked activity (Fig. S1i-k), suggesting that AMM may play an important role in rule-dependent visual processing. PPC contributes to multisensory processing [51–53] and sensory-motor integration [50,54–58]. Therefore, we wanted to test the roles of these areas in our cross-modal sensory selection task.”

      (3) Minor point. We are somewhat confused about the timing of some of the example neurons shown in figure S1. For example, many neurons show visually evoked signals only after stimulus offset, unlike tactile evoked signals (e.g. Fig S1b and f). In addition, the reaction time for visual stimulus is systematically slower than tactile stimuli for many example neurons (e.g. Fig S1b) but somehow not other neurons (e.g. Fig S1g). Are these observations correct?

      These observations are all correct. We have a manuscript from a separate study using this same behavioral task (Finkel et al., accepted in principle) that examines and compares (1) the onsets of tactile- and visually-evoked activity and (2) the reaction times to tactile and visual stimuli. Reaction times to tactile stimuli were slightly but significantly shorter than those to visual stimuli (tactile vs visual, 397 ± 145 vs 521 ± 163 ms, median ± interquartile range [IQR], Tukey HSD test, p = 0.001, n = 155 sessions). We examined how well the activity of individual S1 neurons could discriminate the presence of the stimulus or the response of the mouse. S1 neurons could signal the presence of the tactile stimulus but not the visual stimulus. For discriminating the response of the mouse, the onset of significant discriminability occurred earlier in tactile trials than in visual trials (two-sided Kolmogorov-Smirnov test, p = 1 × 10^-16, n = 865 neurons with DP onset in tactile trials, n = 719 neurons with DP onset in visual trials).

    1. Author response:

      [The following is the authors’ response to the current reviews.]

      In response to Reviewer #2, we agree with the reviewer that it needs to be noted that not all forms of recognition are the same and have added the following: "However, we note that not all forms of recognition are the same; researchers may prefer to have their work featured instead of personal stories or critiques of the scientific environment."


      [The following is the authors’ response to the previous reviews.]

      We thank both reviewers for their detailed comments and insightful suggestions. Below we summarize our responses to each concern in addition to the edits within the manuscript.

      We would also like to clarify a point in the eLife assessment, which states: “This important bibliometric analysis shows that authors of scientific papers whose names suggest they are female or East Asian get quoted less often in news stories about their work.” We show that individuals with names predicted to be from women or East Asian name origins are less likely to be quoted or mentioned in Nature’s scientific news stories than expected by publication demographics. In this study, we did not compare the level of coverage of a scientific article by the demographics of the authors of the article.

      Reviewer #1

      The article is not so clearly structured, which makes it hard to follow. A better framing, contextualization, and conceptualization of their analysis would help the readers to better understand the results. There are some unclear definitions and wrong wording of key concepts.

      We have adapted our wording in the text and added a more detailed discussion, which we hope makes the paper easier to follow. These changes are described below in response to the reviewer's specific suggestions.

      Language use: Male/Female refers to sex, not to gender.

      We have now updated the language throughout the text. Thank you for pointing this out.

      Regional disparities are not the same as names' origin. While the first might relate to the academic origin of authors, inferred from their institutional belonging, the latter reflects the authors' inferred identity. Ethnic identities and the construction of prejudice against specific populations need proper contextualization.

      We have added better contextualization in the manuscript and reworded the section in our results and discussion to clarify that we are analyzing disparities related to perceived ethnicity and not regions. We also added the following text to the results section “In our analysis, we use name origin as an estimate for the perceived ethnicity of a primary source by a journalist. Our prediction is not intended to assign ethnicity to an individual, but to be used broadly as a tool to quantify representational differences in a journalist's sociologically constructed perception of a primary source's ethnicity.” We also added the following text to our Discussion: “Our use of name origins is a proxy for a journalist's or referring scholarly peer’s potential perceptions of the ethnicity of a primary source as signaled by an individual's name. We do not intend to assign an identity to an individual, but to generate a broad metric to measure possible bias for particular ethnicities during journalists' primary source gathering.”

      It would be helpful to have a clear definition of what are quotes, mentions, and citations. For me, it was not so clear and made understanding the results more difficult.

      We added the following text to the results section Extracted Data Used for Analysis: “Quoted names are any names that were attached to a quote within the article. Mentioned names are any names that were stated within the article. Cited names are all author names of a scientific paper that was cited in the news article.”

      The comparison against Nature published research articles is not perfect because journalists will also cover articles not published in Nature. If for example, the gender representation in the quoted articles is not the same between Nature journals and other journals, then this source of inequality would be missing (e.g. if the journalists are biased against women, but not as much when they published in Nature, because they are also biased towards Nature articles). Also, the gender representation among Nature authors could not be the same as in general. Nevertheless, this seems to be a fair benchmark, especially if the authors did not have access to other more comprehensive databases. But a statement of limitations including these potential issues would be good to have.

      To add better context to the generalizability of our work, we added the following text to our discussion: “Furthermore, the news articles present on "www.nature.com" are intended for a very specific readership that may not be reflective of broader scientific news outlets. In a separate analysis, we took a cursory look at a comparison with The Guardian and found similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and it is difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to compare multiple science news outlets against one another. Our existing pipeline could be easily applied to other science news outlets to identify whether there is a consistent pattern of disparity regardless of the intended readership.”

      "we select the highest probability origin for each name as the resultant assignment". Threshold based approaches for race/ethnicity name-based inference have been criticized by the literature as they might reproduce biases (see Kozlowski, D., Murray, D. S., Bell, A., Hulsey, W., Larivière, V., Monroe-White, T., & Sugimoto, C. R. (2022). Avoiding bias when inferring race using name-based approaches. Plos one, 17(3), e0264270.). The authors could use the full distribution of probabilities over names instead of selecting one. The formulae proposed (3-5) could be easily adapted to this change.

      We thank the reviewer for pointing this out. We have updated our analysis to use the probabilities instead of hard assignments. Figure 3 and formulae 3-5 have been updated. While we observe a slight shift in the calculated values, the overall trends are unchanged.
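      The change from hard to probabilistic assignment can be illustrated as below. Formulae 3-5 are not reproduced in this response, so this is only a sketch of the general idea under stated assumptions: each name has a predicted probability distribution over origin categories, the hard approach adds 1 to the most probable category, and the soft approach adds every category's probability, so that expected counts carry the predictor's uncertainty. Category labels and function names are hypothetical.

```python
from collections import Counter


def hard_counts(predictions):
    """Assign each name to its single most probable origin."""
    return Counter(max(p, key=p.get) for p in predictions)


def soft_counts(predictions):
    """Add each origin's probability: expected counts under the predictor."""
    totals = Counter()
    for p in predictions:
        for origin, prob in p.items():
            totals[origin] += prob
    return totals


def proportions(counts):
    """Normalize counts to proportions that sum to 1."""
    total = sum(counts.values())
    return {origin: c / total for origin, c in counts.items()}
```

      For example, two names each predicted as 0.6 East Asian and 0.4 European give hard counts of 2 and 0 but soft counts of 1.2 and 0.8; the hard rule discards the 40% minority probability entirely, which is the thresholding bias the reviewer describes.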

      Is it possible to make an analysis that intersects both name origin and gender? I am not sure if the sample size would allow for this, but if some other dimensions were collapsed, it would be very important to show what happens at the intersection of these two dimensions of discrimination.

      We agree that identifying any differences in quotation patterns at the intersection of gender and name origin would be very useful to identify. To address this, we added supplemental table 5. This table identifies the number of quotes per predicted name origin and gender over all years and article types. In this table, we don’t see a significant difference in gender distribution across predicted name origins.

      Given a larger sample size, we would be able to identify more subtle differences, but with the current sample size, we cannot make more detailed inferences. Additionally, this analysis addresses a quality-control (QC) issue: the accuracy of predicted gender varies by name origin, particularly for East Asian name origins. In our data, we do not see a large difference in proportions across name origins. We added the following text to the results section to incorporate this analysis:

      “However, it should be noted that the error rate varies by name origin, with the largest decrease in performance on names with an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. In our analysis, we did not observe a large difference in names predicted to come from a man or woman between predicted East Asian and other name origins (Table 5).”

      The use of vocabulary should be more homogeneous. For example, in page 13 the authors start to use the concepts of over/under enrichment, which appeared before in a title but was not used.

      The text has been updated to replace all mentions of “over/under enrichment” with “over/under representation”.

      In the discussions section, it would be important to see as a statement of limitations the problems that automatic origin and gender inference have.

      We thank the reviewer for this suggestion. We have added the following paragraph to our discussion.

      “Computational tools enabled us to automatically analyze thousands of articles to identify existing disparities by gender and name origin, but these tools are not without limitations. Our tools are unable to identify non-binary people and rely on gender predictors that are known to have region-specific biases, with the largest decrease in performance on names of an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. Furthermore, name origin is only a proxy for the externally perceived racial or ethnic origin of a source or author and is not as accurate as self-identified race or ethnicity. Self-identification captures aspects of an individual's lived experience that computational estimates from a name cannot. This is highlighted by our inability to distinguish between Black and White people from the US by their names. As the collection of demographic data by publication outlets grows, we believe this will enable a more fine-grained and accurate analysis of disparities in scientific journalism.”

      Figures 2a and 3a show that the affiliations of authors and their countries was going to be used in this analysis. Yet, this section is not present in the article. I would encourage the authors to add this to the analysis as it would show important patterns, and to intersect the dimensions of gender, name origin and country.

      We were interested in including this analysis in our work, but unfortunately the sample size of cited works in each country was too small to make inferences. If this work were extended to outlets with larger corpora, such as The Guardian or The New York Times, we think more robust inferences could be made. Since our work focuses only on Nature, we decided not to include this analysis. However, we do include a section on it as future work in our discussion.

      “As a proxy for measuring possible geographical bias of a journalist, we attempted to identify if there was any geographical bias of cited authors. To do this, we identified the affiliation of each cited author and identified their affiliated country. Unfortunately, we could not robustly extract a large enough number of cited authors from different countries to make any conclusive statements. Expanding our work to other science journalism outlets could help identify possible ways in which geographic region, genders, and perceived ethnicity interact and affect scientific visibility of specific groups. While we are unable to identify that journalists have a specific geographical bias, having reporters explicitly focused on specific regional sources will broaden coverage of international opinions in science.”

      It is not clear at that point what column dependence means.

      The abstract has been updated to state, “Gender disparity in Nature quotes was dependent on the article type.”

      Reviewer #2

      We thank the reviewer for their very detailed and insightful suggestions regarding our analysis and the key caveats that needed better contextualization in our analysis. We went through each major point the reviewer brought up below and included any additional text that was needed.

      In some cases, the manuscript lacks consistency in terminology and uses word choices that are strange (e.g., "enrichment" and "depletion" when discussing representation).

      We thank the reviewer for pointing this out; we have replaced all instances of enrichment/depletion with over-/under-representation.

      Caveats to Claim 1. So while Claim 1 holds, it does not hold for all comparator sets and for all years. I don't think this is critical of the paper (the authors do discuss the trend in Claim 2), but interpretation of this claim should take these caveats into account, and readers should consider the important differences between first and last authorship.

      We thank the reviewer for their detailed feedback on this section. We have added the missing contextualization of our results. In the Results section, we changed the figure caption to: “Speakers predicted to be men are sometimes overrepresented in quotes, but this depends on the year and article type.” We also added the following paragraph: “When considering the relative proportion of authors and speakers predicted to be men, we find only a slight over-representation of men. This over-representation depends on the authorship position and the year. Before 2010, quotes predicted to be from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 they are overrepresented only in comparison to first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, although women remain severely under-represented relative to the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early-career scientists and last authors are more senior scientists. It has also been shown that early-career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authors may better reveal how gender bias currently manifests, whereas comparison with early-career scientists may reveal bias relative to a future, possibly more diverse, academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      Generalizability to other contexts of science journalism:

      We thank the reviewer for their feedback on the generalizability of our work. We have now added the following text to our Discussion to give the reader better context for our results: “The articles presented on "www.nature.com" are intended for a very specific readership that may not be reflective of broader scientific news outlets. In a separate analysis, we took a cursory look at a comparison with The Guardian and found very similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and it is difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to make a broad comparison across multiple science news outlets. Our existing pipeline could easily be applied to other science news outlets to identify whether a consistent pattern of disparity exists regardless of the intended readership.”

      Shallow discussion:

      The authors highlight gender parity in career features, but why exactly is there gender parity in this format?

      We thank the reviewer for encouraging us to better contextualize our findings in the broader discourse. We have now added several sections to our Discussion. To address gender parity, we have added the following text: “This finding, coupled with the near equal number of articles written by journalists predicted to be men or women, argues for more diversity in topical coverage. "Career Feature" articles highlight current topics relevant to working scientists and frequently highlight systemic issues with the scientific environment. This column allows space for marginalized people to critique the current state of affairs in science or share their personal stories. This type of content encourages the journalist to seek out a diverse set of primary sources. Including more content that is not primarily focused on recent publications, but all topics surrounding the practice of science, can serve as an additional tool to rapidly achieve gender parity in journalistic recognition.”

      Representation in quotations varies by first and last author, most certainly as a result of the academic division of labor in the life sciences. However, what does it say about scientific quotation that first authors appear to be quoted more often? Does this mean that the division of labor is changing such that first authors are the lead scientists? Or does it imply that senior authors are being skipped over, or are giving away their chance to comment on a study to the first author?

      We thank the reviewer for bringing up these important questions. We have added better context to our first-author analysis in our Discussion, including the following two sections. We also want to clarify that we find last authors to be slightly more quoted than first authors, as depicted in Fig. 2d, with the first-author quotation percentage largely appearing below the red line. We included the first section in a response above and include it again here for convenience.

      “Before 2010, quotes predicted to be from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 they are overrepresented only in comparison to first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, although women remain severely under-represented relative to the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early-career scientists and last authors are more senior scientists. It has also been shown that early-career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authors may better reveal how gender bias currently manifests, whereas comparison with early-career scientists may reveal bias relative to a future, possibly more diverse, academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      “In our analysis, we also find that there are more first authors than last authors with predicted East Asian name origins. This is in contrast to predicted Celtic/English and European name origins.

      Furthermore, we see that the number of first authors with predicted East Asian name origins is increasing at a much faster rate than the number of quotes attributed to them. If this mismatched rate of representation continues, it could lead to an increasingly large erasure of early-career scientists with East Asian name origins. As noted before, increasing engagement with early-career scientists can help reduce the growing disparity in the public visibility of scientists with East Asian name origins.”

      What might be the downstream impacts on the public stemming from the under-representation of scientists with East Asian names? According to Figure 3d, not only are East Asian names under-represented in quotations, but they are becoming more under-represented over time as they appear as authors in a greater number of Nature publications; those with European names are proportionately represented in quotations given their share of authors in Nature. Why might this be, especially seeing as Anglo names are heavily over-represented?

      To address this point, we have added the following text to our Discussion: “In our analysis, we also find that there are more first authors than last authors with predicted East Asian name origins. This is in contrast to predicted Celtic/English and European name origins. Furthermore, the number of first authors with predicted East Asian name origins is increasing at a much faster rate than the number of quotes attributed to them. If this mismatched rate of representation continues, it could lead to an increasingly large erasure of early-career scientists with East Asian name origins. As noted before, increasing engagement with early-career scientists can help reduce the growing disparity in the public visibility of scientists with East Asian name origins.”

      I am very confused by Figure 1B. It mixes the counts of News-related items with (non-Springer) research articles in a single stacked bar plot, which makes determining the quantity of either difficult. I would advise splitting them out.

      Figure 1B has been updated, and the News and Research articles have been separated.

      When querying the first 2000 or so results from the SpringerNature API, are the authors certain that they are getting a random sample of papers?

      These papers were the first 200 English-language "Journal" papers returned by the Springer Nature API for each month, resulting in 2400 papers per year from 2005 through 2020. Because these are the first 200 papers published each month by Springer Nature journals, they may not be a completely random sample, but we believe them to be reasonably representative. Furthermore, the Springer Nature set is used as an additional comparator to the complete set of all Nature research papers used in our analyses.

      In all figures: the authors use capital letters to indicate panels in the caption, but lowercase letters in the figure itself and in the main text. This should be made consistent.

      This has been updated.

      In all figures: the authors should make the caption letter bold in the figure captions, which makes it much easier to find descriptions of specific panels.

      This has been updated.

      In the section "coreNLP", the authors mention "co-reference resolution" without really remarking on why it is being used. This is an issue throughout the methods: the authors describe what method they are using, but either they don't mention why until later, or they don't mention it at all.

      We have added better reasoning behind our selected coreNLP methods: “We used the standard set of annotators: tokenize, ssplit, pos, lemma, ner, parse, coref, and additionally the quote annotator. These perform text tokenization, sentence splitting, part-of-speech tagging, lemmatization, named entity recognition, division of sentences into constituent phrases, coreference resolution, and identification of quoted entities, respectively. We used the "statistical" algorithm to perform coreference resolution for speed. Each of these steps is required to identify the names of quoted or mentioned speakers and any of their associated pronouns. All results were output in JSON format for further downstream processing.”
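      The response does not state how CoreNLP was invoked. As a sketch only, assuming the CoreNLP server on its default port 9000, the annotator configuration described above could be passed as request properties as below; the host and the helper name `corenlp_request_url` are hypothetical, and no request is actually sent.

```python
import json
import urllib.parse

# Hypothetical request properties mirroring the annotators listed above;
# "coref.algorithm" selects CoreNLP's statistical coreference system.
CORENLP_PROPERTIES = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,parse,coref,quote",
    "coref.algorithm": "statistical",
    "outputFormat": "json",
}


def corenlp_request_url(host="http://localhost:9000"):
    """Build a CoreNLP server URL; the article text would be POSTed as the request body."""
    query = urllib.parse.urlencode({"properties": json.dumps(CORENLP_PROPERTIES)})
    return f"{host}/?{query}"


print(corenlp_request_url())
```

      Passing the properties per request keeps the pipeline choice (including the faster statistical coreference) alongside the analysis code rather than in the server's startup configuration.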

      We included a better description of Scrapy: “Scrapy is a tool that applies user-defined rules to follow hyperlinks on webpages and return the information contained on each webpage. We used Scrapy to collect all web pages containing news articles and extract their text.”

      We also included our motivation for bootstrapping: “We used the bootstrap method to construct confidence intervals for each of our calculated statistics.”
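      The response does not specify which bootstrap variant was used. As a minimal sketch, assuming a percentile bootstrap over per-quote indicators (1 if the speaker is predicted to be a man, 0 otherwise), a 95% interval could be computed as below; the function name, the number of resamples and the seed are illustrative assumptions.

```python
import random


def bootstrap_ci(values, stat=lambda xs: sum(xs) / len(xs), n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: return (point estimate, lower bound, upper bound) for `stat`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    # Resample with replacement n_boot times and recompute the statistic each time.
    boots = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lower = boots[int((alpha / 2) * n_boot)]
    upper = boots[int((1 - alpha / 2) * n_boot) - 1]
    return stat(values), lower, upper


# Hypothetical data: 60 of 100 quotes predicted to be from men.
print(bootstrap_ci([1] * 60 + [0] * 40))
```

      The percentile method makes no normality assumption about the proportion, which is convenient when some yearly subsets are small; other variants (e.g. BCa) would be equally reasonable.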

      In the section "Name Formatting for Gender Prediction in Quotes or Mentions", genderizeR is mentioned before an introduction to the tool

      We added the following text to provide context: “Even though genderizeR, the computational method used to predict the gender associated with a name, uses only the first name to make the prediction, identifying the full name gives us greater confidence that we correctly identified the first name.”

      In the section "Name Formatting for Gender Prediction of Authors", you state that you exclude papers with only one author. How many papers is this? I assume few, in Nature, but if not I can imagine gender differences based on who writes first-authored papers.

      We find that the number excluded is roughly 7% of all papers, which is consistent across Nature and Springer Nature (1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, and 955/12459 for Nature research articles). We have added the following text to the manuscript for better context: “Roughly 7% of all papers were estimated to be by a single author and were removed from this analysis: 1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, and 955/12459 for Nature research articles.”
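      The per-corpus percentages behind the "roughly 7%" figure can be recomputed directly from the counts quoted above; the following is a minimal check (the helper name is hypothetical).

```python
# Counts of (single-author papers, all papers) as reported in the response above.
SINGLE_AUTHOR_COUNTS = {
    "cited Springer articles": (1113, 15013),
    "random Springer articles": (2899, 42155),
    "Nature research articles": (955, 12459),
}


def single_author_share(counts):
    """Return the percentage of single-author papers per corpus, plus the pooled value."""
    shares = {name: 100 * single / total for name, (single, total) in counts.items()}
    pooled_single = sum(single for single, _ in counts.values())
    pooled_total = sum(total for _, total in counts.values())
    shares["pooled"] = 100 * pooled_single / pooled_total
    return shares


for name, pct in single_author_share(SINGLE_AUTHOR_COUNTS).items():
    print(f"{name}: {pct:.1f}%")
```

      This gives 7.4%, 6.9% and 7.7% for the three corpora (7.1% pooled), consistent with the stated figure.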

      In "Name Origin Analysis", for the in-text reference to Equation 3, include the prefix "Eq." or similar to mark this as referencing the equation and not something else.

      This has been updated.

      The use of the word "enrichment" in reference to the representation of East Asian authors is strange and does not fit the colloquial definition of the term. I suggest just using a simpler term like "representation" instead.

      Similarly, the authors use the word "depletion" to reflect the lower rate of quotes to scientists with East-Asian names, but I feel a simpler word would be more appropriate.

      We thank the reviewer for this suggestion; all instances of “enrichment/depletion” have been replaced with “over-/under-representation”.

      The authors claim in Figure 2d that there is a steady increase in the rate of first-author citations; however, this graph is not convincing. It appears to show much more noise than anything resembling a steady change.

      We have reworded our figure description to state that there is a consistent bias towards quoting last authors. Our figure description now states: “Panel d shows a consistent but slight bias over time towards quoting the last author of a cited article rather than the first author.”

      Supplemental Figures 1b and 1c do not seem to be mentioned in the main text, and I struggle to see their relevance.

      We thank the reviewer for identifying this error; these subpanels have been removed.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Point-by-point response to concerns raised by reviewer #3:

      The manuscript has improved very substantially in revision. The authors have clearly taken the comments on board in good faith. Yet, some small concerns remain around the behavioural analysis.

      In Fig. 8H and H' average sleep/day is ~100. Is this minutes of sleep? 100 min/day is far too low, is it a typo?

      The numbers of sleep bouts also seem too low to me, e.g. in Fig. 9 the number of sleep bouts averages around 4.

      In their response to reviewers the authors say these errors were fixed, yet the figures appear not to have been changed. Perhaps the old figures were left in inadvertently?

      Indeed, this correction was somehow missed, and we thank the reviewer for noticing it. We have now corrected Fig. 8H-H’ and Fig. 9D.

      The circadian anticipatory activity analyses could also be improved. The standard in the field is to perform eduction analyses and quantify anticipatory activity, e.g. using the method of Harrisingh et al. (PMID: 18003827). This is typically computed as the ratio of activity in the 3 hrs preceding the light transition to activity in the 6 hrs preceding the light transition.

      In their response to reviewers, the authors have revised their anticipation analyses by quantifying the mean activity in the 6 hrs preceding light transition. However, in the method of Harrisingh et al., anticipation is the ratio of activity in the 3hrs preceding light transition to activity in the 6hrs preceding light transition. Simply computing the activity in the 6hrs preceding light transition does not give a measure of anticipation, determining the ratio is key.

      We acknowledge the importance of obtaining accurate results in our analysis; therefore, we have re-evaluated anticipatory activity by measuring the ratio of the mean activity in the 3 h preceding the light transition to the activity in the 6 h preceding the light transition. We have reported the data as percentages in Fig. 8F-G and modified the figure legends accordingly.
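      To make the distinction raised by the reviewer concrete, an index in the style of Harrisingh et al. (activity in the last 3 h before a light transition as a percentage of activity in the last 6 h) can be sketched as follows. The function name, the bin width and the input layout (activity counts in equal bins, in chronological order, ending at the transition) are illustrative assumptions, not the authors' analysis code.

```python
def anticipation_index(bins_before_transition, bin_minutes=30):
    """Percentage of the activity in the 6 h before a light transition that falls
    in the final 3 h. Bins are in chronological order, the last one ending at the
    transition; bin_minutes is assumed to divide 60 evenly."""
    per_hour = 60 // bin_minutes
    if len(bins_before_transition) < 6 * per_hour:
        raise ValueError("need at least 6 h of bins before the transition")
    last_6h = bins_before_transition[-6 * per_hour:]
    last_3h = last_6h[-3 * per_hour:]
    total = sum(last_6h)
    if total == 0:
        raise ValueError("no activity in the 6 h window")
    return 100 * sum(last_3h) / total


# Hypothetical hourly counts with activity rising towards lights-on.
print(anticipation_index([1, 1, 1, 3, 3, 3], bin_minutes=60))
```

      With flat activity the index is 50%, so values above 50% indicate a build-up of activity before the transition, which is why the ratio, rather than the 6 h total alone, captures anticipation.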

    1. Author response:

      eLife assessment 

      This important study provides evidence that combining the latest generation of Oxford Nanopore Technologies long reads with state-of-the-art variant callers enables bacterial variant discovery at an accuracy that matches or exceeds the current "gold standard" of short reads. The evidence supporting the claims of the authors is convincing, although the inclusion of a larger number of reference genomes would further strengthen the study. The work will be of interest to anyone performing sequencing for outbreak investigations, bacterial epidemiology, or similar studies.

      We thank the editor and reviewers for the accurate summary and positive assessment. We address the comment about increasing the number of reference genomes in the response to reviewer 2.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors assess the accuracy of short variant calling (SNPs and indels) in bacterial genomes using Oxford Nanopore reads generated on R10.4 flow cells from a very similar genome (99.5% ANI), examining the impact of variant caller choice (three traditional variant callers: bcftools, FreeBayes, and Longshot, and three deep learning-based variant callers: Clair3, DeepVariant, and NanoCaller), basecalling model (fast, hac and sup) and read depth (using both simplex and duplex reads).

      Strengths: 

      Given the stated goal (analysis of variant calling for reads drawn from genomes very similar to the reference), the analysis is largely complete and results are compelling. The authors make the code and data used in their analysis available for re-use using current best practices (a computational workflow and data archived in INSDC databases or Zenodo as appropriate). 

      Weaknesses: 

      While the medaka variant caller is now deprecated for diploid calling, it is still widely used for haploid variant calling and should at least be mentioned (even if the mention is only to explain its exclusion from the analysis). 

      We agree that this would be an informative addition to the study and will add it to the benchmarking.

      Appraisal: 

      The experiments the authors engaged in are well structured and the results are convincing. I expect that these results will be incorporated into "best practice" bacterial variant calling workflows in the future. 

      Thank you for the positive appraisal.

      Reviewer #2 (Public Review): 

      Summary: 

      Hall et al. describe the superiority of ONT sequencing and deep learning-based variant callers in delivering higher SNP and indel accuracy compared to the previous gold standard, Illumina short-read sequencing. Furthermore, they provide recommendations for read sequencing depth and computational requirements when performing variant calling.

      Strengths: 

      The study describes compelling data showing ONT superiority when using deep learning-based variant callers, such as Clair3, compared to Illumina sequencing. This challenges the paradigm that Illumina sequencing is the gold standard for variant calling in bacterial genomes. The authors provide evidence that homopolymeric regions, a systematic and problematic issue with ONT data, are no longer a concern in ONT sequencing. 

      Weaknesses: 

      (1) The inclusion of a larger number of reference genomes would have strengthened the study to accommodate larger variability (a limitation mentioned by the authors). 

      Our strategic selection of 14 genomes, spanning a variety of bacterial genera and species, diverse GC content, and both Gram-negative and Gram-positive species (including M. tuberculosis, which is neither), was designed to robustly address potential variability in our results. Moreover, all our genome assemblies underwent rigorous manual inspection, as the quality of the true genome sequences is the foundation on which this research is built. Given this, the fundamental conclusions regarding the accuracy of variant calls would likely remain unchanged with the addition of more genomes. However, we do acknowledge that a substantially larger sample size, which is beyond the scope of this study, would enable a more fine-grained analysis of species differences in error rates.

      (2) In Figure 2, there are clearly one or two samples that perform worse than others in all combinations (are always below the box plots). No information about species-specific variant calls is provided by the authors but one would like to know if those are recurrently associated with one or two species. Species-specific recommendations could also help the scientific community to choose the best sequencing/variant calling approaches.

      Thank you for highlighting this observation. The precision, recall, and F1 scores for each sample and condition can be found in Supplementary Table S4. We will investigate the samples that consistently perform below expectation to determine if this is associated with specific species, which may necessitate tailored recommendations for those species. Additionally, we will produce a species-segregated version of Figure 2 for a clearer interpretation and will place it in the supplementary materials.

      (3) The authors support that a read depth of 10x is sufficient to achieve variant calls that match or exceed Illumina sequencing. However, the standard here should be the optimal discriminatory power for clinical and public health utility (namely outbreak analysis). In such scenarios, the highest discriminatory power is always desirable and as such an F1 score, Recall and Precision that is as close to 100% as possible should be maintained (which changes the minimum read sequencing depth to at least 25x, which is the inflection point).

      We agree that the highest discriminatory power is always desirable for clinical or public health applications. In that case, 25x is probably a better minimum recommendation. However, we are also aware that there are resource-limited settings where parity with Illumina is sufficient. In these cases, 10x depth from ONT would provide sufficient data.

      The manuscript currently emphasises the latter scenario, but we will revise the text to clearly recommend 25x depth as a conservative aim in settings where resources are not a constraint, ensuring the highest possible discriminatory power for applications like outbreak analysis.
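      For readers weighing the 10x versus 25x recommendation, the precision, recall and F1 scores discussed in point (3) follow the standard definitions, where TP, FP and FN denote true-positive, false-positive and false-negative variant calls. Because F1 is the harmonic mean of precision and recall, it approaches 100% only when both do:

$$
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
$$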

      (4) The sequencing of the samples was not performed with the same Illumina and ONT method/equipment, which could have introduced specific equipment/preparation artefacts that were not considered in the study. See for example https://academic.oup.com/nargab/article/3/1/lqab019/6193612

      To our knowledge, there is no evidence that sequencing on different ONT machines or barcoding kits leads to a difference in read characteristics or accuracy. To ensure consistency and minimise potential variability, we used the same ONT flowcells for all samples and performed basecalling on the same Nvidia A100 GPU. We will update the methods to emphasise this.

      For Illumina and ONT, a supplementary table listing the exact machine used for each sample will be added. We will also add a comment about possible differences in Illumina error rates to the ‘Limitations’ section of the Discussion.

      In summary, while there may be specific equipment or preparation artifacts to consider, we took steps to minimise these effects and maintain consistency across our sequencing methods.

      Reviewer #3 (Public Review): 

      Hall et al. benchmarked different variant calling methods on Nanopore reads of bacterial samples and compared the performance of Nanopore to short reads produced with Illumina sequencing. To establish a common ground for comparison, the authors first generated a variant truth set for each sample and then projected this set to the reference sequence of the sample to obtain a mutated reference. Subsequently, Hall et al. called SNPs and small indels using commonly used deep learning and conventional variant callers and compared the precision and accuracy from reads produced with simplex and duplex Nanopore sequencing to Illumina data. The authors did not investigate large structural variation, which is a major limitation of the current manuscript. It will be very interesting to see a follow-up study covering this much more challenging type of variation. 

      We fully agree that investigating structural variations (SVs) would be a very interesting and important follow-up. Identifying and generating ground truth SVs is a nontrivial task and we feel it deserves its own space and study. We hope to explore this in the future.

      In their comprehensive comparison of SNPs and small indels, the authors observed superior performance of deep learning over conventional variant callers when Nanopore reads were basecalled with the most accurate (but also computationally very expensive) model, even exceeding Illumina in some cases. Not surprisingly, Nanopore underperformed compared to Illumina when basecalled with the fastest (but computationally much less demanding) method with the lowest accuracy. The authors then investigated the surprisingly higher performance of Nanopore data in some cases and identified lower recall with Illumina short read data, particularly from repetitive regions and regions with high variant density, as the driver. Combining the most accurate Nanopore basecalling method with a deep learning variant caller resulted in low error rates in homopolymer regions, similar to Illumina data. This is remarkable, as homopolymer regions are (or, were) traditionally challenging for Nanopore sequencing. 

      Lastly, Hall et al. provided useful information on the required Nanopore read depth, which is surprisingly low, and the computational resources for variant calling with deep learning callers. With that, the authors established a new state of the art for Nanopore-only variant calling on bacterial sequencing data. Most likely, these findings will transfer to other organisms as well, or at least provide a proof of concept that can be built upon.

      As the authors mention multiple times throughout the manuscript, Nanopore can provide sequencing data in nearly real-time and in remote regions, therefore opening up a ton of new possibilities, for example for infectious disease surveillance. 

      However, the high-performing variant calling method as established in this study requires the computationally very expensive sup and/or duplex Nanopore basecalling, whereas the least computationally demanding method underperforms. Here, the manuscript would greatly benefit from extending the last section on computational requirements, as the authors determine the resources for the variant calling but do not cover the entire picture. This could even be misleading for less experienced researchers who want to perform bacterial sequencing at high performance but with low resources. The authors mention it in the discussion but do not make clear enough that the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required. 

      We have provided runtime benchmarks for basecalling in Supplementary Figure S16 and detailed these times in Supplementary Table S7. In addition, we state in the Results section (P10 L228-230) “Though we do note that if the person performing the variant calling has received the raw (pod5) ONT data, basecalling also needs to be accounted for, as depending on how much sequencing was done, this step can also be resource-intensive.”

      Even with super-accuracy basecalling considered, our analysis shows that variant calling remains the most resource-intensive step for Clair3, DeepVariant, FreeBayes, and NanoCaller. Therefore, the statement “the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required” is incorrect. However, we will endeavour to make the basecalling component and considerations more prominent in the Results and Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable study that describes the effects of T. pallidum on neural development by applying single-cell RNA sequencing to an iPSC-derived brain organoid model. The evidence supporting the claims of the authors is solid, although further evidence to understand the differences in infection rates would strengthen the conclusions of the study. In particular, the conclusions would be strengthened by validating infection efficiency, as this can impact the interpretation of the single-cell sequencing results, by examining how these metrics affect organoid size, and by comparison with additional infectious agents. Furthermore, the validation of downstream effectors is inadequate and could be improved.

      Thank you very much for your valuable comments. Since this is the first time we have used the organoid model to investigate the effects of T. pallidum on brain development, the study design is not perfect. As you have accurately mentioned, the paper lacks more in-depth results, especially verification of the T. pallidum infection rate. Your valuable comments will be very useful for our further research. In addition, as the validation of downstream effectors was inadequate, we performed an additional analysis of the single-cell sequencing data to strengthen our conclusions in the revised manuscript (see Figure 5F in the current manuscript).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study by Xu et al. showing the effects of infection with Treponema pallidum (the bacterium that causes syphilis) on neuronal development, using iPSC-derived human brain organoids as a model and single-cell RNA sequencing. This work provides important insight into the impact of the pathogen on human development, bridging the gap between the phenomena observed in studies using animal models and non-invasive human studies showing developmental abnormalities in fetuses infected in utero through maternal vertical transmission.

      Using single-cell RNAseq in combination with qPCR and immunofluorescence techniques, the authors show that T. pallidum-infected organoids are smaller in size, in particular during later growth stages, contain a larger number of undifferentiated neuronal lineage cells, and exhibit decreased numbers of a specific neuronal subcluster, which the authors have identified as undifferentiated hindbrain neurons.

      The study is an important first step in understanding how T. pallidum affects human neuronal development and provides important insight into the potential mechanisms that underlie the neurodevelopmental abnormalities observed in infected human fetuses. Several important weaknesses have also been noted, which need to be addressed to strengthen the study's conclusions.

      Strengths:

      (1) The study is well written, and the data quality is good for the most part.

      (2) The study provides an important first step in utilizing human brain organoids to study the impact of T. pallidum infection on neuronal development.

      (3) The study's conclusions may provide important insight for other researchers studying how infections impact neuronal development. 

      Thank you very much for your positive feedback. Below, we address your concerns point by point. We once again sincerely appreciate your time and effort in reviewing our manuscript.

      Weaknesses:

      (1) It is unclear how T. pallidum infection was validated in the organoids. If not all cells are infected, this could have important implications for the study's conclusions, in particular the single-cell RNAseq experiments. Were only cells showing the presence of the bacterium selected for sequencing? A detailed description of how infection was validated and the process of selection of cells for RNAseq would strongly support the study's conclusions. 

      Thank you for your valuable comment. We completely agree that the infection rate of T. pallidum in brain organoids is a key factor that must be considered. We selected pluripotent stem cell-derived brain organoids to simulate foetal brain neurodevelopment and co-cultured them with T. pallidum to mimic its invasion of brain tissue. Because brain organoids are three-dimensional aggregates of neural cells, T. pallidum gradually invades them from the periphery toward the center. Longer exposure to T. pallidum increases the infection rate; however, the pathogen is selective in invading human cells. Selecting only cells containing T. pallidum for sequencing would weaken how faithfully the model simulates "real-world" infection. We therefore sequenced cells from intact organoids, without excluding uninfected cells, which better simulates the effect of T. pallidum infection on the nervous system. A blank (uninfected) control group is, of course, also required.

      (2) The authors show that T. pallidum infection results in impaired development of hindbrain neurons. How does this finding compare to what has already been shown in animal studies? Is a similar deficit in this brain region observed with this specific pathogen? It would be useful to strengthen the study's conclusions if the authors added a discussion about the observed deficits in hindbrain neuronal development, and prior literature on similar studies conducted in animal models or human patients. Does T. pallidum preferentially target these neurons, or is this a limitation of the current organoid model system? 

      Thank you for your valuable comments. The finding that T. pallidum infection impairs the development of hindbrain neurons has not been verified in animal experiments. Ideally, the organoid findings would be further validated in animals. Unfortunately, owing to technical challenges, no mature animal model has yet been developed for the study of congenital syphilis. Although our team has been working to develop animal models of congenital neurosyphilis, progress is still not satisfactory. After many years of effort in this field, we decided to use human brain organoids instead of animal models to study the impact of T. pallidum infection on neuronal development.

      We also reviewed prior literature on related findings in human patients. Dan Doherty et al. reported that patients with pontocerebellar hypoplasia develop microcephaly at birth or over time after birth (PMID: 23518331). Based on your constructive suggestions, we have added content related to the hindbrain to the “Discussion” section.

      Our study found that T. pallidum could inhibit the differentiation of subNPC1B in brain organoids, thereby reducing differentiation from subNPC1B to hindbrain neurons and ultimately affecting the development and maturation of hindbrain neurons during pregnancy. Based on our results, T. pallidum does not preferentially target hindbrain neurons. Nevertheless, the current organoid model system has limitations (see the "Limitations" section).

      PMID: 23518331. Dan Doherty et al., Midbrain and hindbrain malformations: advances in clinical diagnosis, imaging, and genetics.

      Revision in the “Discussion” section, line 343-352:

      “The vertebrate hindbrain contains a complex network of dedicated neural circuits that play an essential role in controlling many physiological processes and behaviors, including those related to the cerebellum, pons, and medulla oblongata (Shoja et al., 2018). Some patients with pontocerebellar hypoplasia fall at the less severe end of the spectrum, with early hyperreflexia, developmental delay, and feeding problems, eventually developing spasticity and involuntary movements in childhood, whereas others fall at the severe end, characterised by polyhydramnios, severe hyperreflexia, contractures, and early death from central respiratory failure. Patients with pontocerebellar hypoplasia develop microcephaly at birth or over time after birth (Doherty et al., 2013).”

      (3) The authors show that T. pallidum-infected organoids are smaller in size by measuring organoid diameter during later stages of organoid growth, with no change during early stages. Does that represent insufficient infection at the early stages? Is this due to increased cell death or lack of cell division in the infected organoids? Experiments using IHC to quantify levels of cleaved caspase and/or protein markers for cell proliferation would be able to address these questions. 

      Thank you for your valuable suggestion. The concentration of T. pallidum in patients with syphilis is generally very low (PMID: 21752804, 35315702, 33099614). In this study, a low concentration of T. pallidum was applied to brain organoids to simulate early foetal transmission of syphilis. Because neural cells form brain organoids mainly through adhesive intercellular connections, treatment with a high concentration of T. pallidum can easily cause organoids to break apart and die. Furthermore, based on your suggestions, we performed additional immunostaining to assess apoptosis in brain organoids infected with T. pallidum. Cleaved caspase 3 (clCASP3) staining showed that the number of apoptotic cells increased following T. pallidum infection; however, the proportion of apoptotic cells in both groups of brain organoids was very low (Figure supplement 2) (N=12 organoids, each group from three independent bioreactors), which would not be enough to affect the results of the experiment. This suggests that T. pallidum infection mainly inhibits the neural differentiation and development of brain organoids rather than promoting organoid apoptosis.

      PMID: 21752804. Craig Tipple et al., Getting the measure of syphilis: qPCR to better understand early infection.

      PMID: 35315702. Cuini Wang et al., Quantified Detection of Treponema pallidum DNA by PCR Assays in Urine and Plasma of Syphilis Patients.

      PMID: 33099614. Cuini Wang et al., A New Specimen for Syphilis Diagnosis: Evidence by High Loads of Treponema pallidum DNA in Saliva.

      Revision in the “Results” section, line 105-108:

      “… cleaved caspase 3 (clCASP3) staining showed that the number of apoptotic cells increased significantly following T. pallidum infection, but the proportion of apoptotic cells in both groups of brain organoids was very low (Figure supplement 2) (N=12 organoids, each group from three independent bioreactors) …”

      Revision in the “Materials and methods” section, line 446-447:

      “…anti-cleaved caspase 3 (rabbit, 1:100, Cell Signaling Technology, 9661S),”

      Revision in the “Supplementary File” section, line 78-81:

      Author response image 1.

      The number of clCASP3+ cells in the microscopic field of brain organoids. A nonparametric t-test was used to evaluate the statistical differences between the two groups. (**: P < 0.01).

      (4) In Figure 1D authors show differences in rosette-like structure in the infected organoids. The representative images do not appear to be different in any of the discussed components (e.g., the sox2 signal looks fairly similar between the two conditions). No quantification of these structures was presented. Authors should provide quantification or a more representative image to support their statement. 

      Thank you for your valuable suggestion. We have quantified the neural rosette-like structures and compared the number of intact rosette-like structures between the two groups (see Figure 1D in the current manuscript).

      (5) The IHC images shown in Figures 3E, G, and Figure 4E look very similar between the two conditions despite the discussed decrease in the text. A more suitable representative image should be presented, or the analysis should be amended to reflect the observed results. 

      Thank you for your valuable suggestion. We have replaced the images in Figures 3E, 3G, and 4E with more representative ones.

      Reviewer #2 (Public Review):

      Summary:

      This study provides an important overview of infectious etiology for neurodevelopment delay.

      Strengths:

      Strong RNA evaluation.

      Weaknesses:

      The study lacks an overview of other infectious agents. The study should address the epigenetic contributors (PMID: 36507115) and the role of supplements in improving outcomes (PMID: 27705610). 

      Addressing the above - with references included - is recommended. 

      Thank you for your valuable comment. Our research was mainly inspired by work on other infectious agents, such as Zika virus, which is discussed extensively in the “Discussion” section of the manuscript to support our point of view (see pages 12–13). We were unable to retrieve the article with PMID 36507115; could you kindly confirm the PMID? We would be very grateful if you could provide the full text. Second, we have carefully read the article with PMID 27705610, a rich and comprehensive review, and have summarised and cited it in appropriate places in our manuscript.

      Revision in the “Discussion- limitation” section, line 375-379:

      “First, although several recent protocols have made use of growth factors to promote further neuronal maturation and survival (Lucke-Wold et al., 2018), the organoid culture scheme needs to be further improved owing to the low percentage of mature neurons and the challenge of cell necrosis within day 55 organoids.”

      Reviewer #3 (Public Review): 

      This article is the first report to study the effects of T. pallidum on the neural development of an iPSC-derived brain organoid model. The study indicates that T. pallidum inhibits the differentiation of subNPC1B neurons into hindbrain neurons, hence affecting brain organoid neurodevelopment. Additionally, the TCF3 and notch signaling pathways may be involved in the inhibition of the subNPC1B-hindbrain neuron differentiation axis. While the majority of the data in this study support the conclusions, there are still some questions that need to be addressed and data quality needs to be improved. The study provides valuable insights for future investigations into the mechanisms underlying congenital neurodevelopment disability. 

      We sincerely appreciate your comments on our paper, which have helped us greatly improve its quality. Thank you for your time and constructive critique.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Paired t-test analysis is not appropriate if two distinct groups are compared. 

      We apologize for the unclear presentation. We used a nonparametric t-test to compare the two groups, and we have checked and corrected the description of the statistical method throughout the manuscript (revisions in the “Materials and methods” section, lines 553-555, and the “Figure legends” section, lines 789-790, 817-818, and 829-830, of the current manuscript).
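      The response does not name the specific nonparametric test. For two independent groups (for example, clCASP3+ cell counts in control versus T. pallidum-infected organoids), the Mann–Whitney U test is one standard choice; the sketch below assumes that test and uses the normal approximation without tie or continuity correction, so it is illustrative only. The function names are hypothetical, and in practice a library routine such as scipy.stats.mannwhitneyu would normally be used.

```python
import math
from itertools import product


def mann_whitney_u(x, y):
    """U statistic for sample x versus sample y (hypothetical helper).

    Each pair (xi, yj) with xi > yj counts 1 and each tie counts 0.5.
    No pairing between x and y is assumed, unlike a paired t-test.
    """
    u = 0.0
    for xi, yj in product(x, y):
        if xi > yj:
            u += 1.0
        elif xi == yj:
            u += 0.5
    return u


def mann_whitney_p_normal(x, y):
    """Two-sided p-value from the normal approximation to U.

    Assumption: no tie correction and no continuity correction, which is
    only reasonable for moderate sample sizes without many ties.
    """
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))
```

      With hypothetical per-organoid counts, calling mann_whitney_p_normal(control_counts, infected_counts) returns a two-sided p-value. Because every control value is compared with every infected value, the two groups may differ in size, which is what an unpaired comparison of distinct groups requires.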

      Reviewer #3 (Recommendations For The Authors): 

      (1) Can the authors explain why the mean size of organoids infected with T. pallidum is smaller?

      Thank you for your valuable comment. In our study, T. pallidum infection altered the organisation of neural rosette-like structures, which resemble the proliferative regions of the human ventricular zone, resulting in fewer and incomplete rosette-like structures. The ventricular zone is also the main area where neural progenitor cells (NPCs) reside (PMID: 33838105), and our results showed that the proportion of NPC1 cells was reduced after T. pallidum infection. NPC depletion changes the size of the rosette-like structures. Therefore, the mean size of organoids infected with T. pallidum is smaller.

      Revision in the “Results” section, line 101-104:

      “T. pallidum infection resulted in brain organisational changes in neural rosette-like structures resembling the proliferative regions of the human ventricular zone where NPCs reside (Krenn et al., 2021), and caused fewer and incomplete rosette-like structures (P < 0.01) (Figure 1D)”

      (2) Why were HOXA5, HOXC5, and HOXA4 selected as the target genes for qRT-PCR validation?

      Thank you for your valuable comment. The qRT-PCR experiment was chosen here to verify the results of the scRNA-seq analysis. HOX family genes are key factors controlling early hindbrain development: they are expressed in the hindbrain region from the gastrulation stage of early embryonic development, persist into the neural cell stage, and are essential for the correct induction of hindbrain development and segmentation (PMID: 2571936, 1983472, 1673098, 15930115). Therefore, we selected HOX family genes for verification.

      PMID: 2571936. DG Wilkinson et al., Segmental expression of Hox-2 homoeobox-containing genes in the developing mouse hindbrain.

      PMID: 1983472. MA Frohman et al., Isolation of the mouse Hox-2.9 gene; analysis of embryonic expression suggests that positional information along the anterior-posterior axis is specified by mesoderm.

      PMID: 1673098. P Murphy et al., Expression of the mouse labial-like homeobox-containing genes, Hox 2.9 and Hox 1.6, during segmentation of the hindbrain.

      PMID: 15930115. CL McNulty et al., Knockdown of the complete Hox paralogous group 1 leads to dramatic hindbrain and neural crest defects.

      (3) Why was qRT-PCR not employed in other experimental validations, but solely to validate early neural-specific transcription factor changes?

      Thank you for your valuable comment. qRT-PCR was used to validate changes in early neural-specific transcription factors, which confirmed the reliability of the scRNA-seq data. The validated scRNA-seq data were then used to analyze other neural-specific gene differences, such as the violin plots and heatmap of differentially expressed genes (Figure 4D and Figure 5B, C). We also used other experimental approaches, such as immunohistochemistry and flow cytometric screening.

      (4) The authors found that T. pallidum might reduce the differentiation from subNPC1B to hindbrain neurons by inhibiting subNPC1B differentiation in brain organoids. Why were the subNPC1B-specific markers declining?

      Thank you for your valuable comment. scRNA-seq was performed on whole brain organoids, and cell types were clustered according to the specific marker genes of each cell type. A decrease in the expression of a cell group's marker genes therefore indicates that this group makes up a smaller proportion of the whole organoid. In organoids following T. pallidum infection, uniform manifold approximation and projection (UMAP) and clustering of the NPC1 population showed that T. pallidum reduced the number of cells in the subNPC1B population. Therefore, the results showed a decrease in the subNPC1B-specific markers.

      (5) In comparison to the other figures, Figure 5E letter size is excessively small and ambiguous.

      Thank you for your valuable comment. We have increased the font size in Figure 5E.

      (6) Figure 5E shows that TCF3, more than one gene, is specifically enriched in subNPC1B of the T. pallidum group. It is best to confirm the impact of the other gene. 

      Thank you for raising this key issue, which we had not addressed properly in the previous version of the manuscript; we have added further analytical data. The SCENIC analysis found that the transcriptional activity of 52 genes changed significantly after T. pallidum infection. Furthermore, GO analyses demonstrated that 27 transcription factors were significantly enriched in four key pathways of neural differentiation and development. TCF3 is the only transcription factor present in all four terms, which led us to speculate that TCF3 is the key transcription factor in the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum.

      Revision in the “Results” section, line 261-273:

      “Next, single-cell regulatory network inference and clustering (SCENIC) analysis of the subNPC1B subcluster was performed to assess differences in the transcriptional activity of transcription factors between the two groups, and found that the transcriptional activity of 52 genes changed significantly after T. pallidum infection (Figure 5E). Furthermore, GO analyses demonstrated that 27 transcription factors were significantly enriched in four key terms related to neural differentiation and development: nervous system development, positive regulation of sequence-specific DNA-binding transcription factor activity, positive regulation of neuronal differentiation, and DNA-templated transcription regulation. Remarkably, transcription factor 3 (TCF3) is the only transcription factor present in all four terms (Figure 5F), suggesting that TCF3 is the key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum.”
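      Identifying TCF3 as the only transcription factor shared by all four GO terms amounts to intersecting the lists of factors annotated to each term. A minimal sketch of that step follows; the helper name is hypothetical, and the factor memberships (TF_A, TF_B, TF_C) are placeholders for illustration, not the study's data.

```python
def shared_across_terms(term_to_factors):
    """Return the factors annotated to every term in the mapping (hypothetical helper)."""
    factor_sets = [set(factors) for factors in term_to_factors.values()]
    if not factor_sets:
        return set()
    # Keep only the factors that appear in every term's set.
    return set.intersection(*factor_sets)


# Hypothetical placeholder memberships, for illustration only.
example_terms = {
    "nervous system development": ["TF_A", "TF_B", "TF_C"],
    "positive regulation of sequence-specific DNA-binding "
    "transcription factor activity": ["TF_A", "TF_B"],
    "positive regulation of neuronal differentiation": ["TF_A", "TF_C"],
    "DNA-templated transcription regulation": ["TF_A", "TF_B", "TF_C"],
}

shared = shared_across_terms(example_terms)  # {"TF_A"}
```

      With the real enrichment results, the same intersection would return the single factor reported in the text; a factor missing from any one term drops out.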

      Revision in the “Materials and methods” section, line 540-543:

      “The Sankey diagram was created using SankeyMATIC (https://sankeymatic.com/) (Zhang et al., 2023), which was used to characterize the interactions between differential transcription factors and neural differentiation and development.”

      Revision in the “Figure and Figure Legend” section, line 832, 842-844:

      Author response image 2.

      Sankey diagram showing the correspondence between differential transcription factors and neural differentiation and development.

      (7) Are there other experiments demonstrating that TCF3 is a key transcription factor for the inhibition of subNPC1B-hindbrain neuron differentiation caused by T. pallidum?

      Thank you for your valuable comment. We previously attempted to isolate the subNPC1B subcluster by flow sorting to verify the relevant molecular mechanism. Because the subNPC1B subcluster makes up only a small proportion of the whole organoid, the sorted cells were in poor condition and too few for the experiment. However, we used the scRNA-seq data to further identify TCF3 as a key transcription factor in the T. pallidum-induced inhibition of subNPC1B-hindbrain neuron differentiation. The relevant results and analyses are detailed in the revised manuscript; please see our response to point (6) above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors use the innovative CRISPRi method to uncover regulators of cell density and volume in neutrophils. The results show that cells require NHE activity during chemoattractant-driven cell migration. Before migration occurs, cells also undergo a rapid cell volume increase. These results indicate that water flux, driven by ion channels, appears to play a central role in neutrophil migration. The paper is very well written and clear. I suggest adding some discussion about the role of actin in the process, but this is not essential.

      Strengths

      The use of CRISPRi to uncover cell density regulators is very novel. Some of the uncovered molecules were known before, e.g. discussed in Li & Sun, Frontiers in Cell and Developmental Biology, 2021. Others are more interesting, for example PI3K-gamma. The use of caged fMLP is also nice.

      We thank the reviewer for their positive appraisal of our work and have pursued their suggestions for improving our paper in this revision.

      Weaknesses

      One area of investigation mentioned in the introduction seems to be absent: actin is expected to play a role in regulating cell volume increase. Did the authors perform any experiments with LatA? What was seen there? Do cells still migrate with LatA, or is a different interplay seen? The role of PI3K is interesting, and maybe somewhat related to actin. But this may be a different line of inquiry for the future.

      We agree that we could have done a better job explicitly investigating the role of actin dynamics in volume changes. Using Latrunculin B to depolymerize actin, we find that the volume increase in suspension is not affected (Figure 1 – supplemental figure 2A). In our FxM single-cell volume measurements of adherent cells, we similarly observed unhindered swelling following Latrunculin treatment. These data indicate that actin is dispensable for chemoattractant-induced cell swelling (Figure 1 – supplemental figure 2B). There was a minor apparent reduction in the final volume reached by the Latrunculin-treated cells as measured by FxM, but this likely reflects minor uptake of the excluded dye following Latrunculin treatment rather than an actual change in final volume. This conclusion is reinforced by the change in 2D footprint area being well modeled by the 2D projection of an isotropically expanding sphere (Figure 1 – supplemental figure 2C). Latrunculin treatment completely abolishes migration, as is expected for unconfined migration on fibronectin (Figure 1 – supplemental figure 2D-E). The second Reviewer also wanted us to dig deeper into the role of PI3K-gamma, so we expanded our analysis of this hit (Figure 3 – supplemental figure 1B-D; Figure 4 – supplemental figure 1D-G).

      Author response image 1.

      Chemoattractant-induced swelling, but not motility, is independent of actin polymerization. (A) Human primary neutrophils were incubated with DMSO or Latrunculin B, activated with 20 nM fMLP, and then volume responses were measured using electronic sizing via a Coulter counter. Latrunculin treatment did not alter cell swelling, indicating that actin polymerization is dispensable for the chemoattractant-induced volume increase. (B) Similar results were obtained using the FxM assay, showing that Latrunculin-treated cells are capable of swelling after stimulation. (C) The Latrunculin-treated cells also increase their footprints, albeit less so than control cells, but this is within the range of what would be expected for this degree of chemoattractant-induced volume increase (modeled by a sphere expanding an equivalent volume). (D) Single-cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show 15 minutes of tracks, before (left) and after (right) uncaging the chemoattractant. The scale bar is 50 microns. The top panels show the large increase in motility displayed by control cells, while the Latrunculin-treated cells (bottom panels) fail to move. (E) Latrunculin-treated cells consistently fail to move in response to chemoattractant stimulation. (F) Representative single-cell volume traces show that Latrunculin-treated cells (black) lack short-term volume fluctuations but persistently maintain an elevated volume following chemoattractant stimulation. Control cells (blue) exhibit short-term volume fluctuations. (G) The lack of short-term volume fluctuations following Latrunculin treatment is borne out across the population, with the coefficient of variation in the volume of single cells (post-swelling) being dramatically lower in Latrunculin-treated cells, suggesting that these short-term volume fluctuations depend on actin-based motility.
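      The coefficient of variation (CV) in panel (G) is the standard deviation of a cell's volume trace divided by its mean. A minimal sketch follows, assuming the population standard deviation and a trace already restricted to the post-swelling period; the legend does not specify either choice, and the helper name and example traces are hypothetical.

```python
import statistics


def coefficient_of_variation(volumes):
    """CV of one cell's volume trace: standard deviation divided by mean.

    Assumption: population standard deviation (statistics.pstdev) over the
    post-swelling part of the trace.
    """
    mean_volume = statistics.fmean(volumes)
    if mean_volume == 0:
        raise ValueError("mean volume must be non-zero")
    return statistics.pstdev(volumes) / mean_volume


# Hypothetical normalized traces (volume / pre-stimulation volume).
steady_trace = [1.2, 1.2, 1.2, 1.2]        # no short-term fluctuation, CV = 0
fluctuating_trace = [1.1, 1.3, 1.1, 1.3]   # fluctuating cell, CV about 0.083
```

      Because the CV is dimensionless and scale-invariant, normalizing each trace to its pre-stimulation volume does not change it, so cells with different baseline volumes can be compared directly.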

      Author response image 2.

      Additional validation of swelling screen hits. (A) Mixed WT and CRISPR KO dHL-60 populations post-stimulation show that CA2 (black) and PI3Kγ (green) KO both fail to decrease their densities as much as the WT (cyan) population following chemoattractant stimulation. Cells with negative control guides (light gray) have normal volume responses. All tubes were fractionated and aligned on the fraction containing the median of the WT population. Negative values indicate a fraction with a higher density than WT. (B) To validate the perturbations to cell swelling observed with FxM, primary human neutrophils were stimulated in suspension, and their volumes were measured using a Coulter counter. 20 nM fMLP was added at the 0 minute mark. Shaded regions represent the 95% confidence intervals. (C) PI3Kγ inhibition blocks the chemoattractant-induced volume change in primary human neutrophils, as assayed by FxM. (D) PI3Kγ inhibition also blocked the chemoattractant-driven shape change in human primary neutrophils, as measured by the change in footprint area in FxM. (E) The coefficients of variation in volume for control (cyan) and NHE1-inhibited (iNHE1, gold) human primary neutrophils undergoing chemokinesis are comparable, suggesting that the volume fluctuations are unchanged in moving cells upon NHE1 and PI3Kγ inhibition despite the different baseline volumes.
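      Panel (A) aligns all tubes on the fraction containing the median of the WT population and reports other fractions as signed offsets. One way to compute that alignment is sketched below. It assumes fractions are indexed from densest (index 0) to lightest, so that negative offsets are denser than the WT median fraction, and it takes the median fraction to be the first one at which the cumulative WT count reaches half the total; the function name and counts are hypothetical.

```python
def offsets_from_wt_median(wt_counts):
    """Signed fraction offsets relative to the fraction holding the WT median cell.

    Assumption: wt_counts[0] is the densest fraction, so negative offsets
    are denser than the WT median fraction.
    """
    total = sum(wt_counts)
    if total <= 0:
        raise ValueError("at least one WT cell is required")
    half = total / 2.0
    cumulative = 0
    median_index = len(wt_counts) - 1
    for index, count in enumerate(wt_counts):
        cumulative += count
        if cumulative >= half:  # first fraction reaching half of all WT cells
            median_index = index
            break
    return [index - median_index for index in range(len(wt_counts))]


# Hypothetical WT cell counts per fraction, densest first.
print(offsets_from_wt_median([1, 2, 10, 3, 1]))  # [-2, -1, 0, 1, 2]
```

      The same offsets would then be applied to the KO and control-guide tubes, so each population's distribution is read relative to the WT median fraction.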

      Author response image 3.

      Additional validation of motility phenotypes. (A-D) Single-cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show tracks of cells 15 minutes prior (left) versus 15 minutes post (right) uncaging the chemoattractant. The scale bar is 50 microns. Color saturation indicates time, with tracks progressing from gray to full color. (A) Control cells show a large increase in movement upon uncaging. (B) NHE1-inhibited cells also initiate movement, but to a lesser degree. (C) Hypo-osmotic shock rescues the NHE1 motility defect. (D) PI3Kγ inhibition leads to a large fraction of cells failing to initiate movement. (E) PI3Kγ inhibition showed near-complete blockage of the chemoattractant-induced motility increase in primary human neutrophils. (F) Control neutrophils (blue) show increased angular alignment upon stimulation as their motility becomes directional. NHE1 inhibition (gold, iNHE1) has very little effect on this process, while PI3Kγ inhibition (green) leads to a reduction in this alignment at the population level. (G) For the PI3Kγ-inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to those of iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ-inhibited cell and the bottom panel its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

      Reviewer #2 (Public Review):

      Nagy et al. investigated the role of volume increase and swelling in neutrophils in response to chemoattractant. The authors show that following chemoattractant stimulation, cells lose volume slightly owing to the cell spreading phase and then undergo a relatively rapid increase in cell volume that is concomitant with cell migration. The authors performed an impressive genome-wide CRISPR screen and buoyant density assay to identify the regulators of neutrophil swelling. This assay showed that stimulating cells with the chemoattractant fMLP led to an increase in cell volume that was abrogated by FPR1 receptor knockout. The screen revealed a cascade that could potentially be involved in cell swelling, including NHE1 (sodium-proton antiporter) and PI3K. NHE1 and PI3K are required for chemoattractant-induced swelling in human primary neutrophils. The authors also suggest slightly different functions of NHE1 and PI3K activity, where PI3K is also required to maintain chemoattractant-induced cell shape changes. The authors convincingly show that chemoattractant-induced cell swelling is linked to cell migration and that NHE1 is required at the later stages of swelling, since at early time points the cells operate in a low-volume, low-velocity regime. Interestingly, the authors also show that the lack of swelling in NHE1-inhibited cells could be rescued by mild hypo-osmotic swelling, strengthening the argument that water influx following chemoattractant stimulation is important for potentiating migration.

      The conclusions of this paper are mostly well supported by data and are pretty convincing, but some aspects of image acquisition and data analysis need to be clarified and extended.

      We thank the reviewer for their positive appraisal of our work and pursued their suggestions for improving our paper in this revision.

      Weaknesses

      (1) It would really help if the authors could add the missing graph for the footprint area when cells are treated with Latrunculin. Graph S1F for volume changes with Lat treatment should be compared with DMSO-treated controls.

      We agree that the Latrunculin condition merits more thorough investigation. To this end, we compared the volume response of human primary neutrophils to chemoattractant addition for Latrunculin B-treated cells versus DMSO controls in suspension and show that there is no difference in swelling (Figure 1 – supplemental figure 2A). This is additionally confirmed with FxM measurements, with a slight undershooting of the final volume likely due to minor uptake of the excluded dye by Latrunculin-treated cells (Figure 1 – supplemental figure 2B). We have also included the requested footprint area changes in the Latrunculin-treated cells as compared to controls (Figure 1 – supplemental figure 2C). The treated cell footprints increase much less than the controls, and this is likely due to a lack of active cell spreading in the Latrunculin-treated cells. The increase in footprint area observed following Latrunculin treatment is within the range of what would be expected for the 2D projection of an isotropically expanding sphere fitted to the Latrunculin volume data (salmon line).
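      The isotropically expanding sphere used as a reference for footprint changes follows from geometry: for a sphere of radius r, V = (4/3)πr³ and its 2D projection has area A = πr², so A/A0 = (V/V0)^(2/3). For example, a 20% volume increase predicts a footprint increase of only about 13%. The sketch below encodes that relation under the assumption of a perfectly spherical cell; the function names are hypothetical and this is not the authors' fitting code.

```python
import math


def sphere_projected_area(volume):
    """Projected (equatorial cross-section) area of a sphere of the given volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    radius = (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)
    return math.pi * radius ** 2


def predicted_footprint_ratio(volume_ratio):
    """Footprint-area ratio A/A0 implied by a volume ratio V/V0 for a sphere."""
    if volume_ratio <= 0:
        raise ValueError("volume ratio must be positive")
    return volume_ratio ** (2.0 / 3.0)


# A hypothetical 20% volume increase predicts roughly a 13% larger footprint.
ratio = predicted_footprint_ratio(1.2)
```

      Because area scales with the 2/3 power of volume, a cell that only swells (without actively spreading) should show a smaller relative change in footprint than in volume, which is the comparison the salmon line provides.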

      Author response image 4.

      Chemoattractant-induced swelling, but not motility, is independent of actin polymerization. (A) Human primary neutrophils were incubated with DMSO or Latrunculin B, activated with 20 nM fMLP, and then volume responses were measured using electronic sizing via a Coulter counter. Latrunculin treatment did not alter cell swelling, indicating that actin polymerization is dispensable for the chemoattractant-induced volume increase. (B) Similar results were obtained using the FxM assay, showing that Latrunculin-treated cells are capable of swelling after stimulation. (C) The Latrunculin-treated cells also increase their footprints, albeit less so than control cells, but this is within the range of what would be expected for this degree of chemoattractant-induced volume increase (modeled by a sphere expanding an equivalent volume).

      (2) The authors show that inhibition of NHE1 blocked cell swelling using a Coulter counter; a similar experiment should be done with PI3K inhibition, especially since they see PI3K inhibition impact chemoattractant-induced cell shape change.

      Good idea. PI3Kγ inhibition led to a substantial reduction in chemoattractant-driven swelling in suspension, showing the critical role of PI3K in the swelling of human primary neutrophils (Figure 3 – supplemental figure 1B).

      Author response image 5.

      Additional validation of swelling screen hits. (B) To validate the perturbations to cell swelling observed with FxM, primary human neutrophils were stimulated in suspension, and their volumes were measured using a Coulter counter. 20 nM fMLP was added at the 0 minute mark. Shaded regions represent the 95% confidence intervals.

      (3) It would be more convincing visually if the authors could also include movies of cell spreading (footprint) and then motility with PI3K inhibition.

      Included as suggested. We agree this is a more compelling way to present the data (Figure 4 – supplemental figure 1A-D,G).

      Author response image 6.

      Additional validation of motility phenotypes. (A-D) Single cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show tracks of cells 15 minutes prior (left) versus 15 minutes post (right) uncaging the chemoattractant. The scale bar is 50 microns. Color saturation indicates time with tracks progressing from gray to full color. (A) Control cells show a large increase in movement upon uncaging. (D) PI3Kγ inhibition leads to a large fraction of cells failing to initiate movement. (E) PI3Kγ inhibition showed near complete blockage of the chemoattractant-induced motility increase in primary human neutrophils. (G) For the PI3Kγ inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ inhibited cell and the bottom panel, its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

      (4) It is not clear how cell spreading and the later volume increase are linked to the overall motility of neutrophils. Are the authors suggesting that cell spreading is not required for neutrophil motility?

      We did not mean to imply that cell spreading is not required for neutrophil motility. We take advantage of the fact that we can inhibit cell swelling without inhibiting spreading to investigate the specific role of swelling in migration (Figure 4). Conversely, cell spreading on a substrate is not required for chemoattractant-induced cell swelling, as chemoattractant-induced swelling occurs in Latrunculin-treated cells (Figure 1 – supplemental figure 2A-C). However, these Latrunculin-treated cells are not able to migrate, at least not in the context studied here (Figure 1 – supplemental figure 2D-E). Cell spreading and swelling are likely both critical contributors to neutrophil motility, but their relative importance depends on the migratory context. The single-cell volume fluctuation analysis indicates that migration-associated spreading and shape changes have large impacts on cell volume (Figure 1F). These fluctuations are asynchronous, which obscures them at the population level, but the single-cell traces clearly demonstrate them and their correlation with movement.

      (5) Volume fluctuations associated with motility were impacted by NHE1 inhibition at baseline; what about PI3K inhibition? Does it affect the actual fluctuations?

      PI3K inhibition causes a significant fraction of cells to stop migrating (Figure 4 – supplemental figure 1D), but among those that do move, they are still able to fluctuate in volume (Figure 4 – supplemental figure 1G).

      Author response image 7.

      Additional validation of motility phenotypes. (G) For the PI3Kγ inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ inhibited cell and the bottom panel, its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

      In contrast, Latrunculin abolishes the volume fluctuations that normally accompany migration (Figure 1 – supplemental figure 2F-G). These data suggest that movement/spreading itself drives the rapid volume fluctuations. The sustained volume increase following chemoattractant stimulation, on the other hand, is independent of shape change and still occurs in Latrunculin-treated cells.

      Author response image 8.

      Chemoattractant-induced swelling, but not motility, is independent of actin polymerization. (F) Representative single cell volume traces show that Latrunculin-treated cells (black) lack short-term volume fluctuations but persistently maintain an elevated volume following chemoattractant stimulation. Control cells (blue) exhibit short-term volume fluctuations. (G) The lack of short-term volume fluctuations following Latrunculin treatment is borne out across the population, with the coefficient of variation in the volume for single cells (post-swelling) being dramatically lower in Latrunculin-treated cells, suggesting that these short-term volume fluctuations depend on actin-based motility.
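      The coefficient of variation in panel G is the standard deviation of a cell's volume trace divided by its mean. A minimal sketch of that per-cell computation follows; the post-swelling window start (`t_start`) and the use of the population standard deviation (NumPy's default `ddof=0`) are assumptions made here, not details stated in the legend.

```python
import numpy as np


def volume_cv(volumes, times, t_start):
    """Coefficient of variation (std / mean) of a single-cell volume trace,
    restricted to samples at or after t_start (the post-swelling window)."""
    volumes = np.asarray(volumes, dtype=float)
    times = np.asarray(times, dtype=float)
    window = volumes[times >= t_start]
    if window.size == 0:
        raise ValueError("no samples at or after t_start")
    return window.std() / window.mean()  # population std (ddof=0)
```

      A cell that swells and then holds a steady volume gives a CV near zero, whereas a migrating cell whose volume keeps fluctuating gives a larger value.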

      (6) It would really help if the authors compared similar analyses and drew conclusions from them. For example, it is unclear what the authors mean by stating that they found no change in the angular persistence of WT and NHE1-inhibited cells, in contrast to PI3K inhibition, since they do not actually have an analysis of angular persistence in PI3K-inhibited cells (S4A and S4B).

      Thanks for catching this oversight in these experiments that we previously performed but neglected to include in the initial submission. We now include plots for angular persistence, velocity, and footprint size for the PI3K-gamma-inhibited cells. The results show that PI3K-gamma inhibition interferes both with swelling (Figure 3 – supplemental figure 1B-D) and motility (Figure 4 – supplemental figure 1D-F), which aligns with its role upstream of the other hits identified in our screen.

      Author response image 9.

      Additional validation of motility phenotypes. (A-D) Single cell tracks of primary human neutrophils responding to acute chemoattractant stimulation. Both panels show tracks of cells 15 minutes prior (left) versus 15 minutes post (right) uncaging the chemoattractant. The scale bar is 50 microns. Color saturation indicates time with tracks progressing from gray to full color. (A) Control cells show a large increase in movement upon uncaging, (B) NHE1 inhibited cells also initiate movement but to a lesser degree, (C) hypo-osmotic shock rescues the NHE1 motility defect. (D) PI3Kγ inhibition leads to a large fraction of cells failing to initiate movement. (E) PI3Kγ inhibition showed near complete blockage of the chemoattractant-induced motility increase in primary human neutrophils. (F) Control neutrophils (blue) show an increased angular alignment upon stimulation as their motility becomes directional. NHE1-inhibition (gold, iNHE1) has very little effect on this process, while PI3Kγ inhibition (green) leads to a reduction in this alignment at the population level. (G) For the PI3Kγ inhibited cells that start migrating, the migration-induced volume fluctuations are comparable to iNHE1 and control cells. The top panel shows the track of a representative migrating PI3Kγ inhibited cell and the bottom panel, its corresponding volume normalized to the pre-stimulation volume. The scale bar is 50 microns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors discuss an effect, "diffusive lensing", by which particles would accumulate in high-viscosity regions, for instance in the intracellular medium. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention. The "lensing effect" discussed is a direct consequence of the choice of the Ito convention without spurious drift which has been discussed before and is likely to be inadequate for the intracellular medium, causing the presented results to likely have little relevance for biology.

      We thank the editors and the reviewers for their consideration of our manuscript. We argue in this rebuttal and revision that our results and conclusions are in fact likely to have relevance for biology. While we use the Itô convention for ease of modeling considering its non-anticipatory nature upon discretization (see (Volpe and Wehr 2016) for the discretization schemes), we refer to Figure S1B to emphasize that diffusive lensing occurs not only under the Itô convention but across a wide parameter space. Indeed, it is absent only in the normative isothermal convention; note that even a stochastic differential equation conforming to the isothermal convention may be reformulated into the Itô convention by adding suitable drift terms, allowing for diffusive lensing to be seen even in case of the isothermal convention. We note in particular that the choice of the convention is a highly context-dependent one (Sokolov 2010); there is not a universally correct choice, and one can obtain stochastic differential equations consistent with Ito or Stratonovich interpretations in different regimes. Lastly, space-dependent diffusivity is now an experimentally well-recognized feature of the cellular interior, as noted in our references and as discussed further later in this response. This fact points towards the potential relevance of our model for subcellular diffusion.

      In our revised preprint, we have made changes to the text and minor changes to figures to address reviewer concerns.

      Responses to the Reviewers

      We thank the reviewers for their feedback and address the issues they raised in this rebuttal and in the revised manuscript. The central point that the reviewers raise concerns the validity of the drift-less Itô interpretation in modeling potential nonequilibrium types of subcellular transport arising from space-dependent diffusivity. If the drift term were considered, the resulting stochastic differential equation (SDE) is equivalent to one arising from the isothermal interpretation of heterogeneous diffusivity (Volpe and Wehr 2016), wherein no diffusive lensing is seen (as shown in Fig. S1B). That is, the isothermal interpretation and the drift-comprising Itô SDE produce the same uniform steady-state particle densities.

      While we agree with the reviewers that, for a given interpretation, equivalent stochastic differential equations (SDEs) arising from other interpretations may be drawn, we disagree with the generalization that all types of subcellular diffusion conform to the isothermal interpretation. That is, there is no reason why any and all instances of nonequilibrium subcellular particle diffusion must be modeled using isothermal-conforming SDEs (such as the drift-comprising Itô SDE, for instance). We refer to (Sokolov 2010), which prescribes choosing a convention in a context-dependent manner. In this regard, we disagree with the second reviewer’s characterization of such a choice as merely a “choice of writing”, considering that it depends entirely on the choice of microscopic parameters, as detailed in the discussion section of the manuscript. The following references have also been added to the manuscript. The reference from the first reviewer (Kupferman et al. 2004) proposes a prescription for choosing an appropriate convention by comparing the noise correlation time and the particle relaxation time: the Itô convention is appropriate when the particle relaxation time is large compared to the noise correlation time, and the Stratonovich convention is appropriate in the converse scenario. In (Rupprecht et al. 2018), active noise is considered, and the resulting Fokker-Planck equation conforms to the Stratonovich convention when thermal noise is negligible. The related reference (Vishen et al. 2019) compares three timescales, those of particle relaxation, noise correlation and viscoelastic relaxation, to make the choice. Indeed, as noted in the manuscript, lensing is seen in all but one interpretation (without drift additions); only its magnitude is altered by the interpretation/choice of the drift term. The appendix has been modified to include a subsection on the interchangeability of the conventions.
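      The statement that the choice of convention rescales, rather than removes, lensing can be checked with a short one-dimensional calculation. As a sketch, assume a driftless SDE with noise amplitude $\sqrt{2D(x)}$ interpreted in a convention parameterized by $\alpha$ ($\alpha = 0$ Itô, $\alpha = 1/2$ Stratonovich, $\alpha = 1$ isothermal/anti-Itô), on a domain with reflecting walls. Converting to Itô form adds the spurious drift $\alpha D'(x)$, and setting the probability flux $J$ to zero gives the steady state:

      $$
      \begin{aligned}
      dx &= \sqrt{2D(x)}\circ_{\alpha} dW
      \;\Longleftrightarrow\;
      dx = \alpha D'(x)\,dt + \sqrt{2D(x)}\,dW,\\
      \partial_t p &= -\partial_x J, \qquad J = \alpha D'(x)\,p - \partial_x\!\left[D(x)\,p\right],\\
      J = 0 &\;\Longrightarrow\; D\,p' = (\alpha - 1)\,D'\,p
      \;\Longrightarrow\; p_{\mathrm{ss}}(x) \propto D(x)^{\alpha - 1}.
      \end{aligned}
      $$

      Only $\alpha = 1$ yields a uniform density. The driftless Itô case gives $p_{\mathrm{ss}} \propto 1/D$, so the concentration contrast equals the diffusivity contrast, while Stratonovich gives the weaker $p_{\mathrm{ss}} \propto D^{-1/2}$.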

      Separately, with regards to the discussion on anomalous diffusion, the section on mean squared displacement calculation has been amended to avoid confusing our model with canonical anomalous diffusion which considers the anomalous exponent; how the anomalous exponent varies with space-dependent diffusivity offers an interesting future area of study.

      Responses to specific reviewer comments appear below.

      Reviewer #1 (Public Review):

      The manuscript "Diffusive lensing as a mechanism of intracellular transport and compartmentalization", explores the implications of heterogeneous viscosity on the diffusive dynamics of particles. The authors analyze three different scenarios:

      (i)   diffusion under a gradient of viscosity,

      (ii)  clustering of interacting particles in a viscosity gradient, and

      (iii) diffusive dynamics of non-interacting particles with circular patches of heterogeneous viscous medium.

      The implications of a heterogeneous environment on phase separation and reaction kinetics in cells are under-explored. This makes the general theme of this manuscript very relevant and interesting. However, the analysis in the manuscript is not rigorous, and the claims in the abstract are not supported by the analysis in the main text.

      Following are my main comments on the work presented in this manuscript:

      (a) The central theme of this work is that spatially varying viscosity leads to position-dependent diffusion constant. This, for an overdamped Langevin dynamics with Gaussian white noise, leads to the well-known issue of the interpretation of the noise term.

      The authors use the Ito interpretation of the noise term because their system is non-equilibrium.

      One of the main criticisms I have is on this central point. The issue of interpretation arises only when there are ill-posed stochastic dynamics that do not have the relevant timescales required to analyze the noise term properly. Hence, if the authors want to start with an ill-posed equation, it should be mentioned at the start. At least the Langevin dynamics considered should be explicitly mentioned in the main text. Since this work claims to be relevant to biological systems, it is also important to highlight the motivation for using the ill-posed equation rather than a well-posed one. The authors refer to the non-equilibrium nature of the dynamics, but it is not mentioned what non-equilibrium dynamics the authors have in mind. To properly analyze an overdamped Langevin dynamics, a clear source of integrated timescales must be provided. As an example, one can write the dynamics as Eq. (1), $\dot{x} = f(x) + g(x)\,\eta$, which is ill-defined if the noise $\eta$ is delta-correlated in time but well-defined when $\eta$ is exponentially correlated in time. One can of course take the limit in which the exponential correlation goes to a delta correlation, which leads to Eq. (1) interpreted in the Stratonovich convention. The choice to use the Ito convention for Eq. (1) in this case is not justified.

      We thank the reviewer for detailing their concerns with our model’s assumptions. We have addressed them in the common rebuttal.

      (b) Generally, the manuscript talks of viscosity gradient but the equations deal with diffusion which is a combination of viscosity, temperature, particle size, and particle-medium interaction. There is no clear motivation provided for focus on viscosity (cytoplasm as such is a complex fluid) instead of just saying position-dependent diffusion constant. Maybe authors should use viscosity only when talking of a context where the existence of a viscosity gradient is established either in a real experiment or in a thought experiment.

      The manuscript has been amended to use only “diffusivity” to avoid confusion.

      (c) The section "Viscophoresis drives particle accumulation" seems to not have new results. Fig. 1 verifies the numerical code used to obtain the results in the later sections. If that is the case maybe this section can be moved to supplementary or at least it should be clearly stated that this is to establish the correctness of the simulation method. It would also be nice to comment a bit more on the choice of simulation methods with changing hopping sizes instead of, for example, numerically solving stochastic ODE.

      The main point of this section and of Fig. 1 is the diffusive lensing effect itself: the accumulation of particles in lower-diffusivity areas. To the best of our knowledge, diffusive lensing has not been reported elsewhere as a specific outcome of non-isothermal interpretations of diffusion, with potential relevance to nonequilibrium subcellular motilities. The simulation method has been fully described in the Methods section, and the code has also been shared (see Code Availability).

      A minor comment: the statement "the physically appropriate convention to use depends upon microscopic parameters and timescale hierarchies not captured in a coarse-grained model of diffusion." is not true; as noted in the references that the authors mention, a correct coarse-grained model provides a suitable convention (see also Phys. Rev. E, 70(3), 036120., Phys. Rev. E, 100(6), 062602.).

      This has been addressed in the common rebuttal.

      (d) The section "Interaction-mediated clustering is affected by viscophoresis" makes an interesting statement about the positioning of clusters by a viscous gradient. As a theoretical calculation, the interplay between position-dependent diffusivity and phase separation is indeed interesting, but the problem needs more analysis than that offered in this manuscript. Just a plot showing clustering with and without a gradient of diffusion does not give enough insight into the interplay between density-dependent diffusion and position-dependent diffusion. A phase plot that somehow shows the relative contribution of the two effects would have been nice. Also, it should be emphasized in the main text that the inter-particle interaction is through a density-dependent diffusion constant and not a conservative coupling by an interaction potential.

      The density-dependence has been added from the Methods to the main text. The goal of the work is to present lensing as a natural outcome of the parameter choices we make and present its effects as they relate to clustering and commonly used biophysical methods to probe dynamics within cells. A dense sampling of the phase space and how it is altered as a function of diffusivity, and the subsequent interpretation, lie beyond the scope of the present work but offer exciting future directions of study.

      (e) The section "In silico microrheology shows that viscophoresis manifests as anomalous diffusion" the authors show that the MSD with and without spatial heterogeneity is different. This is not a surprise - as the underlying equations are different the MSD should be different.

      The goal here is to compare and contrast the ways in which homogeneous and heterogeneous diffusion manifest in simulated microrheology measurements. We hope that an altered saturation MSD, as is observed in our simulations, provokes interest in considering lensing while modeling experimental data.
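      For readers wishing to reproduce this kind of comparison, a generic time- and ensemble-averaged MSD estimator for simulated trajectories is sketched below. It is a standard estimator, not the authors' analysis code (which is shared under Code Availability), and the array layout is an assumption. A saturating MSD appears as the returned values flattening at large lags.

```python
import numpy as np


def ensemble_msd(traj, max_lag):
    """Time- and ensemble-averaged mean squared displacement.

    traj: positions with shape (n_particles, n_frames) in 1D, or
          (n_particles, n_frames, n_dims) in higher dimensions.
    Returns an array whose entry k-1 is the MSD at a lag of k frames.
    """
    traj = np.asarray(traj, dtype=float)
    if traj.ndim == 2:
        traj = traj[..., np.newaxis]  # treat 1D data as n_dims = 1
    n_frames = traj.shape[1]
    if not 1 <= max_lag < n_frames:
        raise ValueError("max_lag must be between 1 and n_frames - 1")
    msd = np.empty(max_lag)
    for k in range(1, max_lag + 1):
        # every displacement spanning k frames, for every particle
        disp = traj[:, k:, :] - traj[:, :-k, :]
        msd[k - 1] = np.mean(np.sum(disp**2, axis=-1))
    return msd
```

      Averaging over both particles and start times reduces noise, but it assumes the statistics do not drift over the trajectory; for particles still relaxing toward a lensed steady state, a purely ensemble-averaged MSD from a fixed start time may be the more appropriate choice.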

      There are various analogies drawn in this section without any justification:

      (i) "the saturation MSD was higher than what was seen in the homogeneous diffusion scenario possibly due to particles robustly populating the bulk milieu followed by directed motion into the viscous zone (similar to that of a Brownian ratchet, (Peskin et al., 1993))."

      In case of i), the Brownian ratchet is invoked as a model to explain directed accumulation. We have removed this analogy to avoid confusion as it is not delved into further over the course of our work.

      (ii) "Note that lensing may cause particle displacements to deviate from a Gaussian distribution, which could explain anomalous behaviors observed both in our simulations and in experiments in cells (Parry et al., 2014)." Since the full trajectory of the particles is available, it can be analyzed to check if this is indeed the case.

      This has been addressed in the common rebuttal.
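      One standard way to run the check suggested in point (ii) on simulated trajectories is the non-Gaussian parameter of the displacement distribution at a fixed lag. The sketch below assumes 1D, zero-mean displacements (other dimensionalities use a different prefactor). It returns 0 for Gaussian displacements and positive values for heavier tails, such as a mixture of Gaussians with different variances, which is one plausible outcome of pooling particles from regions of different diffusivity.

```python
import numpy as np


def non_gaussian_parameter(disp):
    """1D non-Gaussian parameter: alpha_2 = <dx^4> / (3 <dx^2>^2) - 1.

    Assumes zero-mean displacements. Equals 0 for a Gaussian distribution
    and is positive for heavier-tailed (e.g. variance-mixture) distributions.
    """
    disp = np.asarray(disp, dtype=float)
    m2 = np.mean(disp**2)
    m4 = np.mean(disp**4)
    return m4 / (3.0 * m2**2) - 1.0
```

      For an equal mixture of zero-mean Gaussians with standard deviations 1 and 3, the exact value is $123/75 - 1 = 0.64$, whereas a single Gaussian gives 0.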

      (f) The final section "In silico FRAP in a heterogeneously viscous environment ... " studies the MSD of the particles in a medium with heterogeneous viscous patches which I find the most novel section of the work. As with the section on inter-particle interaction, this needs further analysis.

      We thank the reviewer for their appreciation. In presenting these three sections discussing the effects of diffusive lensing, we intend to broadly outline the scope of this phenomenon in influencing a range of behaviors. Exploring these directions further is a promising avenue for future study that lies beyond the scope of this manuscript.

      To summarise, as this is a theory paper, just showing MSD or in silico FRAP data is not sufficient. Unlike experiments where one is trying to understand the systems, here one has full access to the dynamics either analytically or in simulation. So just stating that the MSD in heterogeneous and homogeneous environments are not the same is not sufficient. With further analysis, this work can be of theoretical interest. Finally, just as a matter of personal taste, I am not in favor of the analogy with optical lensing. I don't see the connection.

      We value the reviewer’s interest in investigating the causes underlying the differences in the MSDs and agree that it represents a promising future area of study. The main point of this section of the manuscript was to make a connection to experimentally measurable quantities.

      Reviewer #2 (Public Review):

      Summary:

      The authors study through theory and simulations the diffusion of microscopic particles and aim to account for the effects of inhomogeneous viscosity and diffusion - in particular regarding the intracellular environment. They propose a mechanism, termed "Diffusive lensing", by which particles are attracted towards high-viscosity regions where they remain trapped. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention, without spurious drift. They acknowledge the fact that this convention does not describe equilibrium systems, and that their results would not hold at equilibrium - and discard these facts by invoking the fact that cells are out-of-equilibrium. Finally, they show some applications of their findings, in particular enhanced clustering in the high-viscosity regions. The authors conclude that as inhomogeneous diffusion is ubiquitous in life, so must their mechanism be, and hence it must be important.

      Strengths:

      The article is well-written, and clearly intelligible, its hypotheses are stated relatively clearly and the models and mathematical derivations are compatible with these hypotheses.

      We thank the reviewer for their appreciation.

      Weaknesses:

      The main problem of the paper is these hypotheses. Indeed, it all relies on the Ito interpretation of the stochastic integrals. Stochastic conventions are a notoriously tricky business, but they are both mathematically and physically well-understood and do not result in any "dilemma" [some citations in the article, such as (Lau and Lubensky) and (Volpe and Wehr), make an unambiguous resolution of these]. Conventions are not an intrinsic, fixed property of a system, but a choice of writing; however, whenever going from one to another, one must include a "spurious drift" that compensates for the effect of this change, a mathematical subtlety that is entirely omitted in the article: if the drift is zero in one convention, it will thus be non-zero in another in the presence of diffusive gradients. It is well established that for equilibrium systems obeying fluctuation-dissipation, the spurious drift vanishes in the anti-Ito stochastic convention (which is not "anticipatory", contrary to claims in the article, as the "steps" are local and infinitesimal). This ensures that the diffusion gradients do not induce currents and probability gradients, and thus that the steady-state PDF is the Gibbs measure. This equilibrium case should be seen as the default: a thermal system NOT obeying this law should warrant a strong justification (for instance, in the Volpe and Wehr review this can occur through memory effects in robotic dynamics, or through strong fluctuation-dissipation breakdown). In near-equilibrium thermal systems such as the intracellular medium (where, although out-of-equilibrium, temperature remains a relevant and mostly homogeneous quantity), deviations from this behavior must be physically justified and go to zero when going towards equilibrium.

      Considering that the physical phenomena underlying diffusion span a range of timescales (particle relaxation, noise, environmental correlation, et cetera), we disagree with the assertion that all types of subcellular diffusion processes can be modeled as occurring at thermal equilibrium: for example, one can easily imagine memory effects arising in the presence of an appropriate hierarchy of timescales. We have added references that describe in more detail the way in which the comparison of timescales can dictate the applicability of different conventions. We also refer the referee to the common rebuttal section of our response in which we discuss factors that govern the choice of the interpretation. The adiabatic elimination arguments highlighted in (Kupferman et al. 2004) provide a clear description of how relevant particle and environment-related timescales can inform the choice of stochastic calculus to use.

      With regards to the use of the term “anticipatory” to refer to the isothermal interpretation, we refer to the comment in (Volpe and Wehr 2016) of the Itô interpretation “not looking into the future”. In any case, whether anticipatory or otherwise, the interpretation’s effect on our model remains unchanged, as highlighted in the section in the Appendix on the conversion between different conventions; this section has been added to minimize confusion about the effects of the choice of convention on lensing.

      Here, drifts are arbitrarily set to zero in the Ito convention (the exact opposite of the equilibrium anti-Ito), which is the equilibrium equivalent to adding a force (with drift $-\nabla D$) exactly compensating the spurious drift. If we were to interpret this as a breakdown of detailed balance with inhomogeneous temperature, the "hot" region would be effectively at 4x higher temperature than the cold region (i.e. 1200K) in Fig 1A.

      Our work is based on existing observations of space-dependent diffusivity in cells (Garner et al., 2023; Huang et al., 2021; Parry et al., 2014; Śmigiel et al., 2022; Xiang et al., 2020). These papers support a definitive model for the existence of space-dependent diffusivity without invoking space-dependent temperature.

      It is the effects of this arbitrary force (exactly compensating the Ito spurious drift) that are studied in the article. The fact that it results in probability gradients is trivial once formulated this way (and in no way is this new - many of the references, for instance, Volpe and Wehr, mention this).

      Addressed in the common rebuttal.

      Enhanced clustering is also a trivial effect of this probability gradient (the local concentration is increased by this force field, so phase separation can occur). As a side note the "neighbor sensing" scheme to describe interactions is very peculiar and not physically motivated - it violates stochastic thermodynamics laws too, as the detailed balance is apparently not respected.

      The neighbor-sensing scheme used here is just one possible model of an effective attractive potential between particles. Other models that lead to density-dependent attraction between particles should also provide qualitatively similar results as ours; this offers an interesting prospect for future research.

      Finally, the "anomalous diffusion" discussion is at odds with what the literature on this subject considers anomalous (the exponent does not appear anomalous).

      This has been addressed in the common rebuttal, and the relevant part of the manuscript has been modified to avoid confusion.

      The authors make no further justification of their choice of convention than the fact that cells are out-of-equilibrium, leaving the feeling that this is a detail. They make mentions of systems (eg glycogen, prebiotic environment) for which (near-)equilibrium physics should mostly prevail, and of fluctuation-dissipation ("Diffusivity varies inversely with viscosity", in the introduction). Yet the "phenomenon" they discuss is entirely reliant on an undiscussed mechanism by which these assumptions would be completely violated (the citations they make for this - Gnesotto '18 and Phillips '12 - are simply discussions of the fact that cells are out-of-equilibrium, not on any consequences on the convention).

      Finally, while inhomogeneous diffusion is ubiquitous, the strength of this effect in realistic conditions is not discussed (this would be a significant problem if the effect were real, which it isn't). Gravitational attraction is also a ubiquitous effect, but it is not important for intracellular compartmentalization.

      The manuscript text has been supplemented with additional references that detail the ways in which the comparison of timescales can dictate how one can apply different conventions. We refer the reviewer to the common rebuttal section of our response where we detail factors that dictate the choice of the convention to use. As previously noted, the adiabatic elimination arguments highlighted in (Kupferman et al., 2004) provide a prescription for how different timescales are to be considered in deciding the choice of stochastic calculus to use.

      With regards to the strength of space-dependent diffusivity in subcellular milieu, various measurements of heterogeneous diffusivity have been made both across different model systems and via different modalities, as cited in our manuscript. (Garner et al. 2023) used single-particle tracking to determine over 100-fold variability in diffusivity within individual S. pombe cells. Single-molecule measurements in (Xiang et al. 2020) and (Śmigiel et al. 2022) reveal an order-of-magnitude variation in tracer diffusion in mammalian cells and multi-fold variation in E. coli cytoplasm respectively. Fluorescence correlation spectroscopy measurements in (Huang et al. 2022) have found a two-fold increase in short-range diffusion of protein-sized tracers in X. laevis extracts. We have also added a reference to a study that uses 3D single particle tracking in the cytosol of a multinucleate fungus, A. gossypii, to identify regions of low-diffusivity near nuclei and hyphal tips (McLaughlin et al. 2020). Many of these references deploy particle tracking and investigate how mesoscale-sized particles (i.e. tracers spanning biologically relevant size scales) are directly impacted by space-dependent diffusivity. Therefore, we base our model on not only space-dependent diffusivity being a well-recognized feature of the cellular interior, but also on these observations pertaining to mesoscale-sized particles’ motion along relevant timescales.

      These measurements also bear on the reviewer’s question about the strength of the effect, which depends directly on the variability in diffusivity: for ten- to hundred-fold variations in diffusivity, the effect would be expected to be significant. When the Itô convention is used directly, the steady-state concentration contrast in fact matches the diffusivity contrast, because the concentration is inversely proportional to the diffusivity.
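      To illustrate the strength of the effect, the following is a minimal Python sketch (not the manuscript's model) that simulates driftless diffusion on a one-dimensional interval with reflecting walls under a hypothetical linear diffusivity profile, D(x) = 0.2 + 0.8x, i.e. a fivefold contrast. The profile, time step, and particle number are illustrative assumptions. Under the Itô reading the steady state follows p(x) ∝ 1/D(x), so the fivefold diffusivity contrast becomes a fivefold concentration contrast, whereas the isothermal reading stays uniform.

```python
import numpy as np


def diffusivity(x):
    """Hypothetical linear profile: slow (D = 0.2) at x = 0, fast (D = 1.0) at x = 1."""
    return 0.2 + 0.8 * np.clip(x, 0.0, 1.0)


def diffusivity_slope(x):
    """Derivative D'(x) of the profile above (constant here)."""
    return np.full_like(x, 0.8)


def simulate(alpha, n_particles=20_000, dt=1e-3, n_steps=3_000, seed=0):
    """Euler-Maruyama for a driftless SDE read in convention alpha.

    alpha = 0 is Ito, 0.5 Stratonovich, 1 isothermal (Haenggi-Klimontovich).
    The scheme is written in Ito form, so the convention enters only as the
    extra drift alpha * D'(x).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_particles)  # start from a uniform distribution
    for _ in range(n_steps):
        drift = alpha * diffusivity_slope(x)
        noise = np.sqrt(2.0 * diffusivity(x) * dt) * rng.standard_normal(n_particles)
        x = x + drift * dt + noise
        x = np.where(x < 0.0, -x, x)        # reflecting wall at x = 0
        x = np.where(x > 1.0, 2.0 - x, x)   # reflecting wall at x = 1
    return x


def predicted_left_fraction(alpha, n_grid=10_000):
    """Mass in x < 0.5 for the zero-flux steady state p(x) ∝ D(x)**(alpha - 1)."""
    mid = (np.arange(n_grid) + 0.5) / n_grid  # midpoint rule on [0, 1]
    p = diffusivity(mid) ** (alpha - 1.0)
    return p[mid < 0.5].sum() / p.sum()
```

      For example, `np.mean(simulate(0.0) < 0.5)` should be close to the predicted ln 3 / ln 5 ≈ 0.68 for the Itô reading, and `np.mean(simulate(1.0) < 0.5)` close to 0.5 for the isothermal one; the remaining discrepancy reflects Monte Carlo noise and the finite time step.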

      To conclude, the "diffusive lensing" effect presented here is not a deep physical discovery, but a well-known effect of sticking to the wrong stochastic convention.

      As detailed in the various responses above, we respectfully disagree with the notion that a single correct stochastic convention applies to all cases of subcellular heterogeneous diffusion. Further, as detailed in (Volpe and Wehr 2016) and in the Appendix, it is possible to convert between conventions: an isothermal-abiding stochastic differential equation may be suitably altered, by adding a drift term, into an Itô-abiding stochastic differential equation. Therefore, one can observe diffusive lensing without discarding the isothermal convention, provided the latter is modified in this way. Indeed, only the driftless (or canonical) isothermal convention precludes diffusive lensing.
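      The conversion mentioned above can be written compactly with the standard α-convention bookkeeping (α = 0 Itô, 1/2 Stratonovich, 1 isothermal); the notation below is ours, and the Appendix may present it differently. A driftless SDE read in convention α is equivalent to an Itô SDE with an extra drift αD′(x), so adding the drift −D′(x) to the isothermal equation recovers driftless Itô dynamics and its 1/D(x) steady state.

```latex
% Driftless SDE in convention \alpha and its equivalent It\^o form
dX_t = \sqrt{2D(X_t)} \circ_{\alpha} dW_t
\quad\Longleftrightarrow\quad
dX_t = \alpha\, D'(X_t)\, dt + \sqrt{2D(X_t)}\, dW_t

% Zero-flux steady state of
% \partial_t p = -\partial_x[\alpha D'(x)\, p] + \partial_x^2[D(x)\, p]
p_{\mathrm{ss}}(x) \propto D(x)^{\alpha - 1}

% Isothermal (\alpha = 1) equation with the added drift -D'(x):
dX_t = -D'(X_t)\, dt + \sqrt{2D(X_t)} \circ_{1} dW_t
\quad\Longleftrightarrow\quad
dX_t = \sqrt{2D(X_t)}\, dW_t,
\qquad p_{\mathrm{ss}}(x) \propto \frac{1}{D(x)}
```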

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review:

      This manuscript by Yue et al. aims to understand the molecular mechanisms underlying the better reproductive outcomes of Tibetans at high altitude by characterizing the transcriptome and histology of the full-term placentas of Tibetans and comparing them to those of Han Chinese at high elevations.

      The approach is innovative, and the data collected are valuable for testing hypotheses regarding the contribution of the placenta to the better reproductive success of populations that have adapted to hypoxia. The authors identified hundreds of differentially expressed genes (DEGs) between Tibetans and Han, including the EPAS1 gene that harbors the strongest signals of genetic adaptation. The authors also found that such differential expression is more prevalent and pronounced in the placentas of male fetuses than in those of female fetuses, which is particularly interesting, as it echoes the more severe reduction in birth weight of male neonates at high elevation observed by the same group of researchers (He et al., 2022).

      This revised manuscript addresses several concerns raised by reviewers in the last round. However, we still find the evidence for natural selection on the identified DEGs, as a group, to be very weak, despite more convincing evidence for a few individual genes, such as EPAS1 and EGLN1.

      The authors first examined the overlap between DEGs and genes showing signals of positive selection in Tibetans and evaluated the significance of a larger-than-expected overlap with a permutation analysis. A minor issue with this analysis is that the significance is inflated (the p-value is too small), as the authors count permutation replicates with MORE overlapping genes than observed, whereas the more appropriate approach is to count replicates with EQUAL or MORE overlapping genes. With the latter p-value calculation, the "sex-combined" and "female-only" DEGs become non-significantly enriched in genes with evidence of selection, and the signal appears to come solely from male-specific DEGs. A thornier issue with this type of enrichment analysis is whether conditioning on placental expression is sufficient, as other genomic or transcriptomic features (e.g., expression level, local sequence divergence level) may also confound the analysis.

      Following the suggested method, we counted the replicates with equal or more overlapping genes than observed (≥4 for the “combined” set; ≥9 for the “male-only” set; ≥0 for the “female-only” set). The overlaps between DEGs and TSNGs were significantly enriched only in the “male-only” set (p-value < 1e-4; 0 of 10,000 permutations), but not in the “female-only” set (p-value = 1; 10,000 of 10,000 permutations) or the “combined” set (p-value = 0.0603; 603 of 10,000 permutations) (see Table R1 below).
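      The counting rule at issue can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline: the background universe (e.g. placenta-expressed genes), the DEG-set size, and the target set (e.g. TSNGs) are hypothetical inputs, and random gene sets are drawn uniformly, which, as the reviewer notes, does not control for confounders such as expression level. With `inclusive=True`, an observed overlap of 0 necessarily yields p = 1, as reported for the “female-only” set.

```python
import random


def overlap_permutation_pvalue(observed_overlap, background_genes, n_degs, selected_genes,
                               n_perm=10_000, inclusive=True, seed=1):
    """Empirical p-value for the overlap between a random gene set and a fixed target set.

    inclusive=True counts permutations with an overlap >= the observed one (the
    reviewer's recommendation); inclusive=False counts only strictly larger overlaps.
    """
    rng = random.Random(seed)
    background = list(background_genes)
    selected = set(selected_genes)
    hits = 0
    for _ in range(n_perm):
        random_degs = rng.sample(background, n_degs)  # null gene set of the same size
        overlap = len(selected.intersection(random_degs))
        if inclusive:
            exceeds = overlap >= observed_overlap
        else:
            exceeds = overlap > observed_overlap
        if exceeds:
            hits += 1
    return hits / n_perm
```

      Because both rules see the same random draws for a fixed seed, the inclusive p-value can never be smaller than the strict one, which is why switching to the inclusive rule can only weaken an enrichment claim.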

      We have updated this information in the revised manuscript, including the Results, Methods, and Figure S9.

      Author response table 1.

      Permutation analysis of the overlapped genes between DEGs and TSNGs.

      The authors next aimed to detect polygenic signals of adaptation of gene expression by applying the PolyGraph method to eQTLs of genes expressed in the placenta (Racimo et al 2018). This approach is ambitious but problematic, as the method is designed for testing evidence of selection on single polygenic traits. The expression levels of different genes should be considered as "different traits" with differential impacts on downstream phenotypic traits (such as birth weight). As a result, the eQTLs of different genes cannot be naively aggregated in the calculation of the polygenic score, unless the authors have a specific, oversimplified hypothesis that increased expression of all genes with identified eQTLs will improve pregnancy outcome and that these genes are equally important to downstream phenotypes. In general, the PolyGraph method is inapplicable to eQTL data, especially those of different genes (but see Colbran et al 2023 Genetics for an example where the polygenic score is used for testing selection on the expression of individual genes).

      We recommend removing these analyses and focusing the discussion on individual genes with more compelling evidence of selection (e.g., EPAS1, EGLN1).

      Following this suggestion, we have removed these analyses from the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: 

      This is my first review of the article entitled "The canonical stopping network: Revisiting the role of the subcortex in response inhibition" by Isherwood and colleagues. This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration by De Hollander, Forstmann et al. (HBM 2017) that 3T fMRI imaging (as well as many 7T imaging sequences) does not afford a sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

      In the current study, the authors compiled five datasets that aimed to investigate neural activity associated with stopping an already initiated action, as operationalized in the classic stop-signal paradigm. Three of these datasets are taken from their own 7T investigations, and two are datasets from the Poldrack group, which used 3T fMRI.

      The authors make six chief points: 

      (1) There does not seem to be a measurable BOLD response in the purportedly critical subcortical areas in contrasts of successful stopping (SS) vs. going (GO), neither across datasets nor within each individual dataset. This includes the STN but also any other areas of the indirect and hyperdirect pathways.

      (2) The failed-stop (FS) vs. GO contrast is the only contrast showing substantial differences in those nodes.

      (3) The positive findings of STN (and other subcortical) activation during the SS vs. GO contrast could be due to the usage of inappropriate smoothing kernels.

      (4) The study demonstrates the utility of aggregating publicly available fMRI data from similar cognitive tasks. 

      (5) From the abstract: "The findings challenge previous functional magnetic resonance (fMRI) of the stop-signal task" 

      (6) and further: "suggest the need to ascribe a separate function to these networks." 

      I strongly and emphatically agree with points 1-5. However, I vehemently disagree with point 6, which appears to be the main thrust of the current paper, based on the discussion, abstract, and - not least - the title.

      To me, this paper essentially shows that fMRI is ill-suited to study the subcortex in the specific context of the stop-signal task. That is not just because of the issues of subcortical small-volume SNR (the main topic of this and related works by this outstanding group), but also because of its limited temporal resolution (which is unacknowledged, but especially impactful in the context of the stop-signal task). I'll expand on what I mean in the following.

      First, the authors are underrepresenting the non-fMRI evidence in favor of the involvement of the subthalamic nucleus (STN) and the basal ganglia more generally in stopping actions. 

      - There are many more intracranial local field potential recording studies that show increased STN LFP (or even single-unit) activity in the SS vs. FS and SS vs. GO contrast than listed, which come from at least seven different labs. Here's a (likely non-exhaustive) list of studies that come to mind:

      Ray et al., NeuroImage 2012
      Alegre et al., Experimental Brain Research 2013
      Benis et al., NeuroImage 2014
      Wessel et al., Movement Disorders 2016
      Benis et al., Cortex 2016
      Fischer et al., eLife 2017
      Ghahremani et al., Brain and Language 2018
      Chen et al., Neuron 2020
      Mosher et al., Neuron 2021
      Diesburg et al., eLife 2021

      - Similarly, there is much more evidence than cited that causally influencing STN via deep-brain stimulation also influences action-stopping. Again, the following list is probably incomplete: 

      Van den Wildenberg et al., JoCN 2006
      Ray et al., Neuropsychologia 2009
      Hershey et al., Brain 2010
      Swann et al., JNeuro 2011
      Mirabella et al., Cerebral Cortex 2012
      Obeso et al., Exp. Brain Res. 2013
      Georgiev et al., Exp Br Res 2016
      Lofredi et al., Brain 2021
      van den Wildenberg et al, Behav Brain Res 2021
      Wessel et al., Current Biology 2022

      - Moreover, evidence from non-human animals similarly suggests critical STN involvement in action stopping, e.g.: 

      Eagle et al., Cerebral Cortex 2008
      Schmidt et al., Nature Neuroscience 2013
      Fife et al., eLife 2017
      Anderson et al., Brain Res 2020

      Together, studies like these provide either causal evidence for STN involvement via direct electrical stimulation of the nucleus or provide direct recordings of its local field potential activity during stopping. This is not to mention the extensive evidence for the involvement of the STN - and the indirect and hyperdirect pathways in general - in motor inhibition more broadly, perhaps best illustrated by their damage leading to (hemi)ballism. 

      Hence, I cannot agree with the idea that the current set of findings "suggest the need to ascribe a separate function to these networks", as suggested in the abstract and further explicated in the discussion of the current paper. For this to be the case, we would need to disregard more than a decade's worth of direct recording studies of the STN in favor of a remote measurement of the BOLD response using (provably) suboptimal imaging parameters. There are myriad explanations for why fMRI may not be able to reveal a potential ground-truth difference in STN activity between the SS and FS/GO conditions, beginning with the simple proposition that it may not afford sufficient SNR, or that perhaps subcortical BOLD is not tightly related to the type of neurophysiological activity that distinguishes these conditions (in the purported case of the stop-signal task, specifically the beta band). But essentially, this paper shows that a specific lens into subcortical activity is likely broken, but then also suggests dismissing existing evidence from superior lenses in favor of the findings from the 'broken' lens. That doesn't make much sense to me.

      Second, there is actually another substantial reason why fMRI may indeed be unsuitable to study STN activity, specifically in the stop-signal paradigm: its limited time resolution. The sequence of subcortical processes on each specific trial type in the stop-signal task is purportedly as follows: at baseline, the basal ganglia exert inhibition on the motor system. During motor initiation, this inhibition is lifted via direct pathway innervation. This is when the three trial types start diverging. When actions then have to be rapidly cancelled (SS and FS), cortical regions signal to STN via the hyperdirect pathway that inhibition has to be rapidly reinstated (see Chen, Starr et al., Neuron 2020 for direct evidence for such a monosynaptic hyperdirect pathway, the speed of which directly predicts SSRT). Hence, inhibition is reinstated (too late in the case of FS trials, but early enough in SS trials, see recordings from the BG in Schmidt, Berke et al., Nature Neuroscience 2013; and Diesburg, Wessel et al., eLife 2021). 

      Hence, according to this prevailing model, all three trial types involve a sequence of STN activation (initial inhibition), STN deactivation (disinhibition during GO), and STN reactivation (reinstatement of inhibition during the response via the hyperdirect pathway on SS/FS trials, and reinstatement of inhibition via the indirect pathway after the response on GO trials). What distinguishes the trial types during this period is chiefly the relative timing of the inhibitory process (earliest on SS trials, slightly later on FS trials, latest on GO trials). However, these temporal differences play out on the scale of hundreds of milliseconds, and in all three cases, processing concludes well under a second overall. To fMRI, given its limited time resolution, these activations are bound to look quite similar.
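      The temporal argument can be made concrete with a linear-convolution sketch in Python. It assumes a double-gamma HRF with SPM-like default shapes and purely hypothetical onsets for the inhibitory burst on each trial type (150 to 350 ms apart, in line with the sequence described above); it is an illustration, not a model of any dataset discussed here. Because the HRF spans several seconds, shifting a brief neural event by a few hundred milliseconds leaves the predicted BOLD time courses almost perfectly correlated.

```python
import math

import numpy as np


def canonical_hrf(t, peak_shape=6.0, undershoot_shape=16.0, undershoot_ratio=1.0 / 6.0):
    """Double-gamma HRF with SPM-like default shapes (unit scale, seconds); zero for t <= 0."""
    t = np.clip(np.asarray(t, dtype=float), 0.0, None)
    peak = t ** (peak_shape - 1.0) * np.exp(-t) / math.gamma(peak_shape)
    undershoot = t ** (undershoot_shape - 1.0) * np.exp(-t) / math.gamma(undershoot_shape)
    return peak - undershoot_ratio * undershoot


def predicted_bold(event_onset, dt=0.01, duration=32.0):
    """BOLD time course for a brief neural event at event_onset (s): a shifted HRF."""
    t = np.arange(0.0, duration, dt)
    return canonical_hrf(t - event_onset)


# Hypothetical onsets (s after the go signal) of the inhibitory burst per trial type
onsets = {"SS": 0.25, "FS": 0.40, "GO": 0.60}
responses = {name: predicted_bold(onset) for name, onset in onsets.items()}

names = list(onsets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = np.corrcoef(responses[a], responses[b])[0, 1]
        print(f"{a} vs {b}: r = {r:.3f}")
```

      Any real difference in BOLD amplitude between trial types would therefore have to come from how much activity occurs, not from when it occurs within the trial.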

      Lastly, further building on this logic, it's not surprising that FS trials yield increased activity compared to SS and GO trials. That's because FS trials are errors, which are known to activate the STN (Cavanagh et al., JoCN 2014; Siegert et al. Cortex 2014) and afford additional inhibition of the motor system after their occurrence (Guan et al., JNeuro 2022). Again, fMRI will likely conflate this activity with the abovementioned sequence, resulting in a summation of activity and the highest level of BOLD for FS trials. 

      In sum, I believe this study has a lot of merit in demonstrating that fMRI is ill-suited to study the subcortex during the SST, but I cannot agree that it warrants any reappraisal of the subcortex's role in stopping, which is not chiefly based on fMRI evidence.

      We would like to thank reviewer 1 for their insightful and helpful comments. We respond point by point below; here we give an overview of how we reframed the paper.

      We agree that there is good evidence from other sources for the presence of the canonical stopping network (indirect and hyperdirect) during action cancellation, and that this should be reflected more in the paper. However, we do not believe that a lack of evidence for this network during the SST makes fMRI ill-suited for studying this task, or other tasks that have neural processes occurring in quick succession. We believe that the fMRI activation patterns during this task reflect the large amount of activation caused by failed stops. That is, the role of the STN in error processing may be more pronounced than its role in action cancellation. Given the replicability of fMRI results, especially at higher field strengths, we believe the activation profile of failed stop trials reflects a paramount role for the STN in error processing. Therefore, while we agree that we do not provide evidence against the role of the STN in action cancellation, we do provide evidence that our outlook on subcortical activation during the different trial types of this task should be revisited. We have reframed the article to reflect this, and we discuss points such as fMRI reliability, validity, and the complex overlap of cognitive processes in the SST in the Discussion. Please see all changes to the article indicated by red text.

      A few other points: 

      - As I said before, this team's previous work has done a lot to convince me that 3T fMRI is unsuitable to study the STN. As such, it would have been nice to see a combination of the subsamples of the study that DID use imaging protocols and field strengths suitable to actually study this node. This is especially true since the second 3T sample (and arguably, the Isherwood_7T sample) does not afford a lot of trials per subject, to begin with.

      Unfortunately, this study already comprises the only open-access 7T datasets available for the SST. Therefore, unless we combined only the deHollander_7T and Miletic_7T subsamples, there is no additional analysis we can do for this right now. While looking at just the subsamples that were 7T and had >300 trials would be interesting, given the new framing of the paper we do not believe it adds to the study, as these subsamples still lack the temporal resolution seemingly required for examining the processes in the SST.

      - What was the GLM analysis time-locked to on SS and FS trials? The stop-signal or the GO-signal? 

      SS and FS trials were time-locked to the GO signal, as this is standard practice. The main reason for this is that we use contrasts to interpret differences in activation patterns between conditions. By time-locking the FS and SS trials to the stop signal, we would be contrasting events at different time points, and therefore different stages of processing, which introduces its own sources of error. We agree with the reviewer, however, that a separate analysis time-locked to the stop signal has its own merit, and we now include results in the supplementary material in which the FS and SS trials are time-locked to the stop signal.

      - Why was SSRT calculated using the outdated mean method? 

      We originally calculated SSRT using the mean method as this was how it was reported in the oldest of the aggregated studies. We have now re-calculated the SSRTs using the integration method with go omission replacement and thank the reviewer for pointing this out. Please see response to comment 3.

      - The authors chose 3.1 as a z-score to "ensure conservatism", but since they are essentially trying to prove the null hypothesis that there is no increased STN activity on SS trials, I would suggest erring on the side of a more lenient threshold to avoid type-2 error. 

      We now use the minimum FDR-corrected threshold for each contrast, instead of a blanket conservative threshold of 3.1 across all contrasts. The new threshold for each contrast is given in the text. Please see below (page 12):

      “The thresholds for each contrast are as follows: 3.01 for FS > GO, 2.26 for FS > SS and 3.1 for SS > GO.”
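      A minimal Python sketch of one way to obtain such a contrast-specific threshold follows. It assumes a Benjamini–Hochberg procedure applied to one-sided voxelwise p-values derived from z-scores; the exact implementation is not specified above, so the function names and inputs are hypothetical. The returned value is the smallest z that survives correction, which is why the threshold differs between contrasts.

```python
import math


def one_sided_p(z):
    """Upper-tail p-value of a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bh_z_threshold(z_values, q=0.05):
    """Smallest z-score surviving Benjamini-Hochberg FDR control at level q.

    Returns math.inf when no voxel survives.
    """
    p_sorted = sorted(one_sided_p(z) for z in z_values)
    m = len(p_sorted)
    p_cutoff = None
    for rank, p in enumerate(p_sorted, start=1):
        if p <= q * rank / m:
            p_cutoff = p  # keep the largest p-value that passes its BH line
    if p_cutoff is None:
        return math.inf
    return min(z for z in z_values if one_sided_p(z) <= p_cutoff)
```

      Because the cut-off depends on the whole distribution of voxelwise statistics, a contrast with many strongly active voxels (such as FS > SS above) can end up with a lower z threshold than one with few.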

      - The authors state that "The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition". It would be helpful if the authors cited these studies and what those inconsistencies are. 

      We thank reviewer 1 for their detailed and thorough evaluation of our paper. Overall, we agree that there is substantial direct and indirect evidence for the involvement of the cortico-basal-ganglia pathways in response inhibition. We have taken the extensive constructive criticism on board and agree with the reviewer that the paper should be reframed. We thank the reviewer for the thoroughness of their helpful comments, which greatly aided our revision of the paper.

      (1) I would suggest reframing the study, abstract, discussion, and title to reflect the fact that the study shows that fMRI is unsuitable to study subcortical activity in the SST, rather than the fact that we need to question the subcortical model of inhibition, given the reasons in my public review.

      We agree with the reviewer that the article should be reframed and not taken as direct evidence against the large sum of literature pointing towards the involvement of the cortico-basal-ganglia pathway in response inhibition. We have significantly rewritten the article in light of this.

      (2) I suggest combining the datasets that provide the best imaging parameters and then analyzing the subcortical ROIs with a more lenient threshold and with regressors time-locked to the stop-signals (if that's not already the case). This would make the claim of a null finding much more impactful. Some sort of power analysis and/or Bayes factor analysis of evidence for the null would also be appreciated. 

      Instead of using a blanket conservative threshold of 3.1, we used only FDR-corrected thresholds. The threshold level is therefore different for each contrast and is noted in the figures. We have also added supplementary figures showing the group-level SPMs and ROI analyses with the FS and SS trials time-locked to the stop signal instead of the GO signal (Supplementary Figs 4 & 5). As mentioned above, however, because contrasts would then compare different time points, we believe that time-locking all trial types to the GO signal makes more sense for the main analysis.

      We have now also computed BFs on the first-level ROI beta estimates for all contrasts using the BayesFactor package in R. We have added the following section to the Methods and updated the Results section accordingly (page 8):

      “In addition to the frequentist analysis, we also opted to compute Bayes Factors (BFs) for each contrast per ROI per hemisphere. To do this, we extracted the beta weights for each individual trial type from our first-level model. We then compared the beta weights from each trial type to one another using the ‘BayesFactor’ package as implemented in R (Morey & Rouder, 2015). We compared the full model comprising trial type, dataset and subject as predictors to the null model comprising only dataset and subject as predictors. The datasets and subjects were modeled as random factors. We divided the resultant BFs from the full model by those from the null model to provide evidence for or against a significant difference in beta weights for each trial type. To interpret the BFs, we used a modified version of Jeffreys’ scale (Jeffreys, 1939; Lee & Wagenmakers, 2014).”
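      The last two steps of this procedure, forming the ratio of the two Bayes factors and labelling the result, can be sketched in Python as follows. The sketch assumes that both Bayes factors share the same denominator model (so that their ratio is the full-versus-null Bayes factor) and uses one commonly cited set of cut-offs from Lee & Wagenmakers' adaptation of Jeffreys' scale; the manuscript's modified scale may use different boundaries, and the function names are hypothetical.

```python
def bayes_factor_10(bf_full, bf_null):
    """Full-vs-null Bayes factor, assuming both BFs share the same denominator model."""
    return bf_full / bf_null


def interpret_bf10(bf10):
    """Verbal label for BF10 using one common version of Lee & Wagenmakers' cut-offs."""
    if bf10 <= 0:
        raise ValueError("Bayes factors must be positive")
    if bf10 == 1:
        return "no evidence"
    hypothesis = "H1" if bf10 > 1 else "H0"
    strength = max(bf10, 1.0 / bf10)  # evidence magnitude for whichever hypothesis is favoured
    for bound, label in [(100, "extreme"), (30, "very strong"), (10, "strong"), (3, "moderate")]:
        if strength >= bound:
            return f"{label} evidence for {hypothesis}"
    return f"anecdotal evidence for {hypothesis}"
```

      For instance, under these cut-offs `interpret_bf10(0.23)` gives moderate evidence for H0 and `interpret_bf10(1.41)` anecdotal evidence for H1.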

      (3) I suggest calculating SSRT using the integration method with the replacement of Go omissions, as per the most recent recommendation (Verbruggen et al., eLife 2019).

      We agree we should have used a more optimal method for SSRT estimation. We have replaced our original estimations with that of the integration method with go omissions replacement, as suggested and adapted the results in table 3.

      We have also replaced text in the methods sections to reflect this (page 5):

      “For each participant, the SSRT was calculated using the mean method, estimated by subtracting the mean SSD from median go RT (Aron & Poldrack, 2006; Logan & Cowan, 1984).”

      Now reads:

      “For each participant, the SSRT was calculated using the integration method with replacement of go omissions (Verbruggen et al., 2019), estimated by integrating the RT distribution and calculating the point at which the integral equals p(respond|signal). The completion time of the stop process aligns with the nth RT, where n equals the number of RTs in the RT distribution of go trials multiplied by the probability of responding to a signal.”
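      A minimal Python sketch of the integration method with replacement of go omissions, following the description above and Verbruggen et al. (2019), is given below; SSRT is then obtained by subtracting the mean SSD from the nth RT. The input format (None marking an omission) and the rounding of n up to the next integer are assumptions, and published implementations may handle rounding differently.

```python
import math
from statistics import mean


def ssrt_integration(go_rts, stop_responded, ssds):
    """SSRT via the integration method with replacement of go omissions.

    go_rts: go-trial RTs in ms, with None marking a go omission (choice errors kept as RTs).
    stop_responded: one bool per stop-signal trial, True if a response was made (failed stop).
    ssds: stop-signal delay (ms) of each stop-signal trial.
    """
    observed = [rt for rt in go_rts if rt is not None]
    max_rt = max(observed)
    # Go omissions are replaced by the maximum go RT, then the distribution is sorted
    go_dist = sorted(max_rt if rt is None else rt for rt in go_rts)
    p_respond = sum(stop_responded) / len(stop_responded)
    # nth RT: the point where the integral of the go RT distribution equals p(respond|signal)
    n = max(1, math.ceil(p_respond * len(go_dist)))  # rounding rule is an assumption
    nth_rt = go_dist[n - 1]
    # The stop process finishes at the nth RT; subtracting the mean SSD gives SSRT
    return nth_rt - mean(ssds)
```

      Replacing omissions with the maximum RT, rather than dropping them, keeps the go distribution from being truncated, which would otherwise bias the nth RT and hence SSRT downward.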

      Reviewer #2:

      This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, specifically bilateral preSMA, caudate, GPE, thalamus, and VTA, and unilateral M1, GPi, putamen, SN, and STN. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed. 

      As an empirical result, I believe that the results are robust, but this work does not attempt a new theoretical synthesis of the neuro-cognitive mechanisms of stopping. Specifically, if these many areas are more active on failed stop than successful stop trials, and (at least some of) these areas are situated in pathways that are traditionally assumed to instantiate response inhibition like the hyperdirect pathway, then what function are these areas/pathways involved in? I believe that this work would make a larger impact if the authors endeavored to synthesize these results into some kind of theoretical framework for how stopping is instantiated in the brain, even if that framework may be preliminary.

      I also have one main concern about the analysis. The authors use the mean method for computing SSRT, but this has been shown to be more susceptible to distortion from RT slowing (Verbruggen, Chambers & Logan, 2013 Psych Sci), and goes against the consensus recommendation of using the integration with replacement method (Verbruggen et al., 2019). Therefore, I would strongly recommend replacing all mean SSRT estimates with estimates using the integration with replacement method. 

      I found the paper clearly written and empirically strong. As I mentioned in the public review, I believe that the main shortcoming is the lack of theoretical synthesis. I would encourage the authors to attempt to synthesize these results into some form of theoretical explanation. I would also encourage replacing the mean method with the integration with replacement method for computing SSRT. I also have the following specific comments and suggestions (in the approximate order in which they appear in the manuscript) that I hope can improve the manuscript: 

      We would like to thank reviewer 2 for their insightful and interesting comments. We have adapted our paper to reflect these comments. Please see direct responses to your comments below. We agree with the reviewer that some type of theoretical synthesis would help with the interpretability of the article. We have substantially reworked the discussion and included theoretical considerations behind the newer narrative. Please see all changes to the article indicated by red text.

      (1) The authors say "performance on successful stop trials is quantified by the stop signal reaction time". I don't think this is technically accurate. SSRT is a measure of the average latency of the stop process for all trials, not just for the trials in which subjects successfully stop. 

      Thank you for pointing out this technically incorrect statement. We have replaced the above sentence with the following (page 1):

      “Inhibition performance in the SST as a whole is quantified by the stop signal reaction time (SSRT), which estimates the speed of the latent stopping process (Verbruggen et al., 2019).”

      (2) The authors say "few studies have detected differences in the BOLD response between FS and SS trials", but then do not cite any papers that detected differences until several sentences later (de Hollander et al., 2017; Isherwood et al., 2023; Miletic et al., 2020). If these are the only ones, and they only show greater FS than SS, then I think this point could be made more clearly and directly. 

      We have moved the citations to the correct place in the text to be clearer. We have also rephrased this part of the introduction to make the points more direct (page 2).

      “In the subcortex, functional evidence is relatively inconsistent. Some studies have found an increase in BOLD response in the STN in SS > GO contrasts (Aron & Poldrack, 2006; Coxon et al., 2016; Gaillard et al., 2020; Yoon et al., 2019), but others have failed to replicate this (Bloemendaal et al., 2016; Boehler et al., 2010; Chang et al., 2020; B. Xu et al., 2015). Moreover, some studies have actually found higher STN, SN and thalamic activation in failed stop trials, not successful ones (de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020).”

      (3) Unless I overlooked it, I don't believe that the authors specified the criteria on which subjects were excluded. Given that some studies have substantial exclusions (e.g., Poldrack_3T), I think stating how many subjects violated each criterion would be useful.

      This is indeed interesting and important information to include. We have added the number of participants who were excluded for each criterion. Please see added text below (page 4):

      “Based on these criteria, no subjects were excluded from the Aron_3T dataset. 24 subjects were excluded from the Poldrack_3T dataset (3 based on criterion 1, 9 on criterion 2, 11 on criterion 3, and 8 on criterion 4). Three subjects were excluded from the deHollander_7T dataset (2 based on criterion 1 and 1 on criterion 2). Five subjects were excluded from the Isherwood_7T dataset (2 based on criterion 1, 1 on criterion 2, and 2 on criterion 4). Two subjects were excluded from the Miletic_7T dataset (1 based on criterion 2 and 1 on criterion 4). Note that some participants in the Poldrack_3T study failed to meet multiple inclusion criteria.”

      (4) The Method section included very exhaustive descriptions of the neuroimaging processing pipeline, which was appreciated. However, it seems that much of what is presented is not actually used in any of the analyses. For example, it seems that "functional data preprocessing" section may be fMRIPrep boilerplate, which again is fine, but I think it would help to clarify that much of the preprocessing was not used in any part of the analysis pipeline for any results. For example, at first blush, I thought the authors were using global signal regression, but after a more careful examination, I believe that they are only computing global signals but never using them. Similarly with tCompCor seemingly being computed but not used. If possible, I would recommend that the authors share code that instantiates their behavioral and neuroimaging analysis pipeline so that any confusion about what was actually done could be programmatically verified. At a minimum, I would recommend more clearly distinguishing the pipeline steps that actually went into any presented analyses.

      We thank the reviewer for finding this inconsistency. The Methods section indeed uses the fMRIPrep boilerplate text, which we included so as to be as accurate as possible when describing the preprocessing steps taken. While we believe that leaving the exact boilerplate text generated by fMRIPrep is the most accurate way to report our preprocessing, we have adapted some of the text to clarify which computations were not used in the subsequent analysis. As a side note, for future reference, we would like to add that the fMRIPrep authors expressly recommend that users report the boilerplate completely and unaltered, and as such, we believe this may become a recurring issue (page 7).

      “While many regressors were computed in the preprocessing of the fMRI data, not all were used in the subsequent analysis. The exact regressors used for the analysis are listed above. For example, tCompCor and global signals were calculated in our generic preprocessing pipeline but were not part of the analysis. The code used for preprocessing and analysis is referenced in the data and code availability statement.”

      (5) What does it mean for the Poldrack_3T to have N/A for SSD range? Please clarify. 

      Thank you for pointing out this omission. We had not yet determined the possible SSD range for this study. We have replaced this entry with the correct value (0 – 1000 ms).

      (6) The SSD range of 0-2000ms for deHollander_7T and Miletic_7T seems very high. Was this limit ever reached or even approached? SSD distributions could be a useful addition to the supplement. 

      Thank you for also bringing this mistake to light. We had accidentally entered the maximum trial duration in these fields instead of the maximum allowable SSD. We have replaced it with the correct value (0 – 900 ms).

      (7) The authors say "In addition, median go RTs did not correlate with mean SSRTs within datasets (Aron_3T: r = .411, p = .10, BF = 1.41; Poldrack_3T: r = .011, p = .91, BF = .23; deHollander_7T: r = -.30, p = .09, BF = 1.30; Isherwood_7T: r = .13, p = .65, BF = .57; Miletic_7T: r = .37, p = .19, BF = 1.02), indicating independence between the stop and go processes, an important assumption of the horse-race model (Logan & Cowan, 1984)." However, the independent race model assumes context independence (the finishing time of the go process is not affected by the presence of the stop process) and stochastic independence (the durations of the go and stop processes are independent on a given trial). This analysis does not seem to evaluate either of these forms of independence, as it correlates RT and SSRT across subjects, so it was unclear how this analysis evaluated either of the types of independence that are assumed by the independent race model. Please clarify or remove.

      Thank you for this comment. We realize that this analysis indeed does not evaluate either context or stochastic independence and therefore we have removed this from the manuscript.

      (8) The RTs in Isherwood_7T are considerably slower than the other studies, even though the go stimulus+response is the same (very simple) stimulus-response mapping from arrows to button presses. Is there any difference in procedure or stimuli that might explain this difference? It is the only study with a visual stop signal, but to my knowledge, there is no work suggesting visual stop signals encourage more proactive slowing. If possible, I think a brief discussion of the unusually slow RTs in Isherwood_7T would be useful. 

      We have included the following text in the manuscript to reflect this observed difference in RT between the Isherwood_7T dataset and the other datasets (page 9).

      “Longer RTs were found in the Isherwood_7T dataset in comparison to the four other datasets. The only difference in procedure in the Isherwood_7T dataset is the use of a visual stop signal as opposed to an auditory stop signal. This RT difference is consistent with previous research, where auditory stop signals and visual go stimuli have been associated with faster RTs compared to unimodal visual presentation (Carrillo-de-la-Peña et al., 2019; Weber et al., 2024). The mean SSRTs and probability of stopping are within normal range, indicating that participants understood the task and responded in the expected manner.”

      (9) When the authors included both 3T and 7T data, I thought they were preparing to evaluate the effect of magnet strength on stop networks, but they didn't do this analysis. Is this because the authors believe there is insufficient power? It seems that this could be an interesting exploratory analysis that could improve the paper.

      We thank the reviewer for this interesting comment. As our sample contains only two 3T and three 7T datasets, we indeed believe there is insufficient power to warrant such an analysis. In addition, we wanted this paper to focus on how fMRI examines the SST in general, rather than on differences between acquisition methods. With a greater number of datasets varying in imaging parameters (especially TE or resolution) in addition to field strength, we agree such an analysis would be interesting, although it is beyond the scope of this article.

      (10) The authors evaluate smoothing and it seems that the conclusion that they want to come to is that with a larger smoothing kernel, the results in the stop networks bleed into surrounding areas, producing false positive activity. However, in the absence of a ground truth of the true contributions of these areas, it seems that an alternative interpretation of the results is that the denser maps when using a larger smoothing kernel could be closer to "true" activation, with the maps using a smaller smoothing kernel missing some true activity. It seems worth entertaining these two possible interpretations for the smoothing results unless there is clear reason to conclude that the smoothed results are producing false positive activity. 

      We agree with the view of the reviewer on the interpretation of the smoothing results. We indeed cannot rule this out as a possible interpretation of the results, due to a lack of ground truth. We have added text to the article to reflect this view and discuss the types of errors we can expect for both smaller and larger smoothing kernels (page 15).

      “In the absence of a ground truth, we are not able to fully justify the use of either larger or smaller kernels to analyse such data. On the one hand, aberrantly large smoothing kernels could lead to false positives in activation profiles, due to bleeding of observed activation into surrounding tissues. On the other hand, too little smoothing could lead to false negatives, missing some true activity in surrounding regions. While we cannot concretely validate either choice, it should be noted that there is lower spatial uncertainty in the subcortex compared to the cortex, due to the lower anatomical variability. False positives from smoothing spatially unmatched signal are more likely than false negatives. It may be more prudent for studies to use a range of smoothing kernels to assess the robustness of their fMRI activation profiles.”
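The false-positive side of this trade-off can be made concrete with a toy one-dimensional sketch. This is an illustration only, not part of the analysis pipeline described here: a hypothetical focal activation spanning five voxels is convolved with Gaussian kernels of increasing FWHM, and the fraction of the smoothed signal that ends up outside the original region is reported. The grid size, region width, and FWHM values are arbitrary assumptions, and because the toy's ground truth is known to be focal, any leakage is by construction a false positive, which is exactly the information that is unavailable with real data.

```python
import numpy as np

def gaussian_kernel(fwhm_vox: float) -> np.ndarray:
    """Normalized 1D Gaussian kernel; FWHM is given in voxels and must be > 0."""
    sigma = fwhm_vox / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = int(np.ceil(4 * sigma))          # truncate at +/- 4 sigma
    x = np.arange(-half, half + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def leakage(fwhm_vox: float, n_vox: int = 101, roi=(48, 53)) -> float:
    """Fraction of smoothed signal lying outside a hypothetical focal ROI."""
    signal = np.zeros(n_vox)
    signal[roi[0]:roi[1]] = 1.0             # 'true' activation, 5 voxels wide
    smoothed = np.convolve(signal, gaussian_kernel(fwhm_vox), mode="same")
    inside = smoothed[roi[0]:roi[1]].sum()
    return 1.0 - inside / smoothed.sum()

for fwhm in (1.0, 3.0, 6.0):
    print(f"FWHM {fwhm} voxels: {leakage(fwhm):.1%} of signal outside ROI")
```

The same toy could be inverted (a broad true activation sampled with a noisy, unsmoothed estimate) to illustrate the false-negative side; neither direction settles which kernel is correct without ground truth.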

    1. Author response:

      The following is the authors’ response to the original reviews.

      General response:

      We thank all the reviewers for their detailed reviews.

      All reviewers made a number of valuable comments, in particular by highlighting several points that would benefit from additional clarifications and discussion. We really appreciate the time and effort that went into the reviews. We have updated the paper to reflect the changes we have made in response to the reviewers' comments (largely by including more discussion regarding the model limitations and the effect of various modeling choices). We have also included several new supplementary figures (S7, S8, S9, S10) that provide further details of the model behavior, and show the effect of changing some of the terms in the cost. Below, we go through the individual comments, and highlight the places in which we have made changes to address the reviewers’ comments.

      Reviewer 1:

      Thank you for your review and pointing out multiple things to be discussed and clarified! Below, we go through the various limitations you pointed out and refer to the places where we have tried to address them.

      (1) It's important to keep in mind that this work involves simplified models of the motor system, and often the terminology for 'motor cortex' and 'models of motor cortex' are used interchangeably, which may mislead some readers. Similarly, the introduction fails in many cases to state what model system is being discussed (e.g. line 14, line 29, line 31), even though these span humans, monkeys, mice, and simulations, which all differ in crucial ways that cannot always be lumped together.

      That is a good point. We have clarified this in the text (Introduction and Discussion), to highlight the fact that our model isn’t necessarily meant to just capture M1. We have also updated the introduction to make it more clear which species the experiments which motivate our investigation were performed in.

      (2) At multiple points in the manuscript thalamic inputs during movement (in mice) is used as a motivation for examining the role of preparation. However, there are other more salient motivations, such as delayed sensory feedback from the limb and vision arriving in the motor cortex, as well as ongoing control signals from other areas such as the premotor cortex.

      Yes – the motivation for thalamic inputs came from the fact that those have specifically been shown to be necessary for accurate movement generation in mice. However, it is true that the inputs in our model are meant to capture any signals external to the dynamical system modeled, and as such are likely to represent a mixture of sensory signals, and feedback from other areas. We have clarified this in the Discussion, and have added this additional motivation in the Introduction.

      (3) Describing the main task in this work as a delayed reaching task is not justified without caveats (by the authors' own admission: line 687), since each network is optimized with a fixed delay period length. Although this is mentioned to the reader, it's not clear enough that the dynamics observed during the delay period will not resemble those in the motor cortex for typical delayed reaching tasks.

      Yes, we completely agree that the terminology might be confusing. While the task we are modeling is a delayed reaching task, it does differ from the usual setting since the network has knowledge of the delay period, and that is indeed a caveat of the model. We have added a brief paragraph just after the description of the optimal control objective to highlight this limitation.

      We have also performed additional simulations using two different variants of a model-predictive control approach that allow us to relax the assumption that the go-cue time is known in advance. We show that these modifications of the optimal controller yield results that remain consistent with our main conclusions, and can in fact in some settings lead to preparatory activity plateaus during the preparation epoch as often found in monkey M1 (e.g. in Elsayed et al. 2016). We have modified the Discussion to explain these results and their limitations, which are summarized in a new Supplementary Figure (S9).

      (4) A number of simplifications in the model may have crucial consequences for interpretation.

      a) Even following the toy examples in Figure 4, all the models in Figure 5 are linear, which may limit the generalisability of the findings.

      While we agree that linear models may be too simplistic, many prior analyses of M1 data suggest that they are often good enough to capture key aspects of M1 dynamics; for example, the generative model underlying jPCA is linear, and Sussillo et al. (2015) showed that the internal activity of nonlinear RNN models trained to reproduce EMG data aligned best with M1 activity when heavily regularized; in this regime, the RNN dynamics were close to linear. Moreover, the linearity assumption is convenient from a modeling viewpoint: the optimal control problem is more easily solved for linear network dynamics, and the optimal trajectories are more consistent across networks. Indeed, we had originally attempted to perform the analyses of Figure 5 in the nonlinear setting, but found that while the results were overall similar to what we report in the linear regime, iLQR occasionally got trapped in local minima, resulting in more variable results, especially for inhibition-stabilized networks at the strongly connected end of the spectrum. Finally, Figure 5 is primarily meant to explore to what extent motor preparation can be predicted from basic linear control-theoretic properties of the Jacobian of the dynamics; in this regard, it made sense to work with linear RNNs (for which the Jacobian is constant).

      b) Crucially, there is no delayed sensory feedback in the model from the plant. Although this simplification is in some ways a strength, this decision allows networks to avoid having to deal with delayed feedback, which is a known component of closed-loop motor control and of motor cortex inputs and will have a large impact on the control policy.

      This comment resonates well with Reviewer 3's remark regarding the autonomous nature (or not) of M1 during movement. Rather than thinking of our RNN models as anatomically confined models of M1 alone, we think of them as models of the dynamics which M1 implements possibly as part of a broader network involving “inter-area loops and (at some latency) sensory feedback”, and whose state appears to be near-fully decodable from M1 activity alone. We have added a paragraph of Discussion on this important point.

      (5) A key feature determining the usefulness of preparation is the direction of the readout dimension. However, all readouts had a similar structure (random Gaussian initialization). Therefore, it would be useful to have more discussion regarding how the structure of the output connectivity would affect preparation, since the motor cortex certainly does not follow this output scheme.

      We agree with this limitation of our model — indeed one key message of Figure 4 is that the degree of reliance on preparatory inputs depends strongly on how the dynamics align with the readout. However, this strong dependence is somewhat specific to low-dimensional models; in higher-dimensional models (most of our paper), one expects that any random readout matrix C will pick out activity dimensions in the RNN that are sufficiently aligned with the most controllable directions of the dynamics to encourage preparation.

      We did consider optimizing C as well (which required differentiating through the iLQR optimizer, which is possible but very costly), but the question inevitably arises of what exactly C should be optimized for, and under what constraints (e.g. fixed norm or not). One possibility is to optimize C with respect to the same control objective that the control inputs are optimized for, and constrain its norm (otherwise, inputs to the M1 model, and its internal activity, could become arbitrarily small as C can grow to compensate). We performed this experiment (new Supplementary Figure S7) and obtained a similar preparation index; there was one notable difference, namely that the optimized readout modes led to greater observability compared to a random readout; thus, the same amount of “muscle energy” required for a given movement could now be produced by a smaller initial condition. In turn, this led to smaller control inputs, consistent with a lower control cost overall.

      Whilst we could have systematically optimized C, we reasoned that (i) it is computationally expensive, and (ii) the way M1 affects downstream effectors is presumably “optimized” for much richer motor tasks than simple 2D reaching, such that optimizing C for a fixed set of simple reaches could lead to misleading conclusions. We therefore decided to stick with random readouts.

      Additional comments :

      (1) The choice of cost function seems very important. Is it? For example, penalising the square of u(t) may produce very different results than penalising the absolute value.

      Yes, the choice of cost function does affect the results, at least qualitatively. The absolute value of the inputs is a challenging cost to use, as iLQR relies on a local quadratic approximation of the cost function. However, we have included additional experiments in which we penalized the squared derivative of the inputs (Supplementary Figure S8; see also our response to Reviewer 3's suggestion on this topic), and we do see differences in the qualitative behavior of the model (though the main takeaway, i.e. the reliance on preparation, continues to hold). This is now referred to and discussed in the Discussion section.
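Why an absolute-value penalty is awkward for a method built on local quadratic approximations can be seen from its curvature: |u| has zero second derivative everywhere except at u = 0, where it is not differentiable, so a local quadratic model of it carries no curvature information. One commonly used workaround, sketched here purely as an illustration (it was not used in the manuscript), is the pseudo-Huber surrogate sqrt(u^2 + eps^2) - eps, which stays within eps of |u| but has strictly positive curvature. The value of `eps` is an arbitrary assumption.

```python
import numpy as np

def smooth_abs(u, eps=1e-2):
    """Pseudo-Huber surrogate for |u|; differs from |u| by at most eps."""
    return np.sqrt(u**2 + eps**2) - eps

def smooth_abs_grad(u, eps=1e-2):
    """First derivative: u / sqrt(u^2 + eps^2), a smoothed sign(u)."""
    return u / np.sqrt(u**2 + eps**2)

def smooth_abs_hess(u, eps=1e-2):
    """Second derivative: eps^2 / (u^2 + eps^2)^(3/2), always > 0."""
    return eps**2 / (u**2 + eps**2) ** 1.5

u = np.linspace(-1.0, 1.0, 201)
print("max |surrogate - |u||:", np.max(np.abs(smooth_abs(u) - np.abs(u))))
print("curvature at u=0 vs u=1:", smooth_abs_hess(0.0), smooth_abs_hess(1.0))
```

The curvature is concentrated near zero and decays roughly as eps^2 / |u|^3 away from it, so the quadratic model is informative only close to u = 0; this is one reason non-quadratic input costs can be more delicate to optimize with iLQR than a squared-input penalty.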

      (2) In future work it would be useful to consider the role of spinal networks, which are known to contribute to preparation in some cases (e.g. Prut and Fetz, 1999).

      (3) The control signal magnitude is penalised, but not the output torque magnitude, which highlights the fact that control in the model is quite different from muscle control, where co-contraction would be a possibility and therefore a penalty of muscle activation would be necessary. Future work should consider the role of these differences in control policy.

      Thank you for pointing us to this reference! Regarding both of these concerns, we agree that the model could be greatly improved and made more realistic in future work (another avenue for this would be to consider a more realistic biophysical model, e.g. using the MotorNet library). We hope that the current Discussion, which highlights the various limitations of our modeling choices, makes it clear that a lot of these choices could easily be modified depending on the specific assumptions/investigation being performed.

      Reviewer 2:

      Thank you for your positive review! We very much agree with the limitations you pointed out, some of which overlapped with the comments of the other reviewers. We have done our best to address them through additional discussion and new supplementary figures. We briefly highlight below where those changes can be found.

      (1) Though the optimal control theory framework is ideal to determine inputs that minimize output error while regularizing the input norm, it however cannot easily account for some other varied types of objectives especially those that may lead to a complex optimization landscape. For instance, the reusability of parts of the circuit, sparse use of additional neurons when learning many movements, and ease of planning (especially under uncertainty about when to start the movement), may be alternative or additional reasons that could help explain the preparatory activity observed in the brain. It is interesting to note that inputs that optimize the objective chosen by the authors arguably lead to a trade-off in terms of other desirable objectives. Specifically, the inputs the authors derive are time-dependent, so a recurrent network would be needed to produce them and it may not be easy to interpolate between them to drive new movement variants. In addition, these inputs depend on the desired time of output and therefore make it difficult to plan, e.g. in circumstances when timing should be decided depending on sensory signals. Finally, these inputs are specific to the full movement chain that will unfold, so they do not permit reuse of the inputs e.g. in movement sequences of different orders.

      Yes, that is a good point! We have incorporated further Discussion related to this point. We have additionally included a new example in which we regularize the temporal complexity of the inputs (see also our response to Reviewer 3's suggestion on this topic), which leads to more slowly varying inputs, and may indeed represent a more realistic constraint and lead to simpler inputs that can more easily be interpolated between. We also agree that uncertainty about the upcoming go cue may play an important role in the strategy adopted by the animals. While we have not performed an extensive investigation of the topic, we have included a Supplementary Figure (S9) in which we used Model Predictive Control to investigate the effect of planning under uncertainty about the go cue arrival time. We hope that this will give the reader a better sense of what sort of model extensions are possible within our framework.

      (2) Relatedly, if the motor circuits were to balance different types of objectives, the activity and inputs occurring before each movement may be broken down into different categories that may each specialize into one objective. For instance, previous work (Kaufman et al. eNeuron 2016, Inagaki et al., Cell 2022, Zimnik and Churchland, Nature Neuroscience 2021) has suggested that inputs occurring before the movement could be broken down into preparatory inputs 'stricto sensu' - relating to the planned characteristics of the movement - and a trigger signal, relating to the transition from planning to execution - irrespective of whether the movement is internally timed or triggered by an external event. The current work does not address which type(s) of early input may be labeled as 'preparatory' or may be thought of as a part of 'planning' computations.

      Yes, our model does indeed treat inputs in a very general way, and does not distinguish between the different types of processes they may be composed of. This is partly because we do not explicitly model where the inputs come from, such that our inputs likely encompass multiple processes. We have added discussion related to this point.

      (3) While the authors rightly point out some similarities between the inputs that they derive and observed preparatory activity in the brain, notably during motor sequences, there are also some differences. For instance, while both the derived inputs and the data show two peaks during sequences, the data reproduced from Zimnik and Churchland show preparatory inputs that have a very asymmetric shape that really plummets before the start of the next movement, whereas the derived inputs have larger amplitude during the movement period - especially for the second movement of the sequence. In addition, the data show trigger-like signals before each of the two reaches. Finally, while the data show a very high correlation between the pattern of preparatory activity of the second reach in the double reach and compound reach conditions, the derived inputs appear to be more different between the two conditions. Note that the data would be consistent with separate planning of the two reaches even in the compound reach condition, as well as the re-use of the preparatory input between the compound and double reach conditions. Therefore, different motor sequence datasets - notably, those that would show even more coarticulation between submovements - may be more promising to find a tight match between the data and the author's inputs. Further analyses in these datasets could help determine whether the coarticulation could be due to simple filtering by the circuits and muscles downstream of M1, planning of movements with adjusted curvature to mitigate the work performed by the muscles while permitting some amount of re-use across different sequences, or - as suggested by the authors - inputs fully tailored to one specific movement sequence that maximize accuracy and minimize the M1 input magnitude.

      Regarding the exact shape of the occupancy plots, it is important to note that some of the more qualitative aspects (e.g. the relative height of the two peaks) will change if we change the parameters of the cost function. Right now, we have chosen the parameters to ensure that both reaches would be performed at roughly the same speed (as a way to very loosely constrain the parameters based on the observed behavior). However, small changes to the hyperparameters can lead to changes in the model output (e.g. one of the two consecutive reaches being performed using greater acceleration than the other), and since our biophysical model is fairly simple, changes in the behavior are directly reflected in the network activity. Essentially, this means that while the double occupancy is a consistent feature of the model, the exact shape of the peaks is more sensitive to hyperparameters, and we do not wish to draw any strong conclusions from them, given the simplicity of the biophysical model. However, we do agree that our model exhibits some differences with the data. As discussed above, we have included additional discussion regarding the potential existence of separate inputs for planning vs triggering the movement in the context of single reaches.

      Overall, we are excited about the suggestions made by the Reviewer here about using our approach to analyze other motor sequence datasets, but we think that in order to do this properly, one would need to adopt a more realistic musculo-skeletal model (such as one provided by MotorNet).

      (4) Though iLQR is a powerful optimization method to find inputs optimizing the author's cost function, it also has some limitations. First, given that it relies on a linearization of the dynamics at each timestep, it has a limited ability to leverage potential advantages of nonlinearities in the dynamics. Second, the iLQR algorithm is not a biologically plausible learning rule and therefore it might be difficult for the brain to learn to produce the inputs that it finds. It remains unclear whether using alternative algorithms with different limitations - for instance, using variants of BPTT to train a separate RNN to produce the inputs in question - could impact some of the results.

      We agree that our choice of iLQR has limitations: while it offers the advantage of convergence guarantees, it does indeed restrict the choice of cost function and dynamics that we can use. We have now included extensive discussion of how the modeling choices affect our results.

      We do not view the lack of biological plausibility of iLQR as an issue, as the results are agnostic to the algorithm used for optimization. However, we agree that any structure imposed on the inputs (e.g. by enforcing them to be the output of a self-contained dynamical system) would likely alter the results. A potentially interesting extension of our model would be to do just what the reviewer suggested, and try to learn a network that can generate the optimal inputs. However, this is outside the scope of our investigation, as it would then lead to new questions (e.g. what brain region would that other RNN represent?).

      (5) Under the objective considered by the authors, the amount of input occurring before the movement might be impacted by the presence of online sensory signals for closed-loop control. It is therefore an open question whether the objective and network characteristics suggested by the authors could also explain the presence of preparatory activity before e.g. grasping movements that are thought to be more sensory-driven (Meirhaeghe et al., Cell Reports 2023).

      It is true that we aren’t currently modeling sensory signals explicitly. However, some of the optimal inputs we infer may be capturing upstream information, which could encompass some sensory information. This is currently unclear, and would likely depend on how exactly the model is specified. We have added new discussion to emphasize that our dynamics should not be understood as representing M1 alone, but rather more general circuits whose state can be decoded from M1.

      Reviewer #2 (Recommendations For The Authors):

      Additionally, thank you for pointing out various typos in the manuscript, we have fixed those!

      Reviewer 3:

      Thank you very much for your review, which makes a lot of very insightful points, and raises several interesting questions. In summary, we very much agree with the limitations you pointed out. In particular, the choice of input cost is something we had previously discussed, but we had found it challenging to decide on what a reasonable cost for “complexity” could be. Following your comment, we have however added a first attempt at penalizing “temporal complexity”, which shows promising behavior. We have only included those additional analyses as supplementary figures, and we have included new discussion, which hopefully highlights what we meant by the different model components, and how the model behavior may change as we vary some of our choices. We hope this can be informative for future models that may use a similar approach. Below, we highlight the changes that we have made to address your comments.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons.

      First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      That is a very good point, and it mirrors several concerns that we had in the past. While we did focus on the input norm for the sake of simplicity, and because it represents a very natural way to regularize our control solutions, we agree that a “complexity cost” may be better suited to models of brain circuits. We have addressed this in a supplementary investigation. We chose to focus on a cost that penalizes the temporal complexity of the inputs, as ||u(t+1) - u(t)||^2. Note that this required augmenting the state of the model, making the computations quite a bit slower; while it is doable if we only penalize the first temporal derivative, it would not scale well to higher orders.
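
      The state-augmentation step described above can be illustrated with a minimal sketch. It assumes a generic linear discrete-time model x(t+1) = A x(t) + B u(t); the matrices, sizes, and function names below are hypothetical placeholders, not the authors' network or code. Appending the previous input to the state, z(t) = [x(t); u(t-1)], and treating the increment du(t) = u(t) - u(t-1) as the new control turns the smoothness cost into an ordinary quadratic input cost on du, which is the form LQR/iLQR-style solvers handle directly.

```python
import numpy as np


def augment(A, B):
    """Augmented dynamics for z_t = [x_t; u_{t-1}] driven by du_t = u_t - u_{t-1}.

    x_{t+1} = A x_t + B (u_{t-1} + du_t)
    u_t     = u_{t-1} + du_t
    """
    n, m = B.shape
    A_aug = np.block([[A, B], [np.zeros((m, n)), np.eye(m)]])
    B_aug = np.vstack([B, np.eye(m)])
    return A_aug, B_aug


def smoothness_cost(u):
    """Direct temporal-smoothness cost sum_t ||u_{t+1} - u_t||^2 for u of shape (T, m)."""
    return float(np.sum(np.diff(u, axis=0) ** 2))


def rollout_direct(A, B, x0, u):
    """Simulate x_{t+1} = A x_t + B u_t and return all states, shape (T + 1, n)."""
    xs = [x0]
    for u_t in u:
        xs.append(A @ xs[-1] + B @ u_t)
    return np.array(xs)


def rollout_augmented(A, B, x0, u):
    """Simulate the augmented system and accumulate a plain quadratic cost on du."""
    A_aug, B_aug = augment(A, B)
    n = A.shape[0]
    # Initialise u_{-1} := u_0 so that the first increment du_0 is zero.
    z = np.concatenate([x0, u[0]])
    du = np.diff(u, axis=0, prepend=u[:1])
    xs, cost = [x0], 0.0
    for du_t in du:
        z = A_aug @ z + B_aug @ du_t
        xs.append(z[:n])  # the first n entries of z are the original state
        cost += float(du_t @ du_t)
    return np.array(xs), cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = 0.9 * np.eye(3)
    B = rng.standard_normal((3, 2))
    x0 = rng.standard_normal(3)
    u = rng.standard_normal((50, 2))
    xs_aug, cost_aug = rollout_augmented(A, B, x0, u)
    print(np.allclose(xs_aug, rollout_direct(A, B, x0, u)))
    print(np.isclose(cost_aug, smoothness_cost(u)))
```

      The augmented state grows by the input dimension m, and penalizing a k-th order difference would require stacking k past inputs, which is consistent with the remark above that the approach does not scale well to higher orders.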

      Interestingly, we did find that the activity in that setting was somewhat more realistic (see new Supplementary Figure S8), with more sustained inputs and plateauing activity. While we have kept the original model for most of the investigations, the somewhat more realistic nature of the results under that setting suggests that further exploration of penalties of that sort could represent a promising avenue to improve the model.

      We also found the idea of a cost that would ensure low-dimensionality of the inputs across conditions very interesting. However, it is challenging to investigate with iLQR as we perform the optimization separately for each condition; nevertheless, it could be investigated using a different optimizer.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      Yes, that is a very interesting analysis to perform, which we had not considered before! When investigating this, we found that the zero-delay strategy does not rely on preparation in the same way as is seen in the monkeys. This seems to be a reflection of the fact that our “Go cue” corresponds to an “internal” go cue which would likely come after the true, “external go cue” – such that we would indeed never actually be in the zero delay setting. This is not something we had addressed (or really considered) before, although we had tried to ensure we referred to “delta prep” as the duration of the preparatory period but not necessarily the delay period. We have now included more discussion on this topic, as well as a new Supplementary Figure S10.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing under the present modeling assumptions: preparatory plateaus and a realistic condition-invariant signal tied to movement onset.

      As described above, we have now included two additional figures. In the first one (S8, already discussed above), we used a temporal smoothness prior, and we indeed get slightly more realistic activity plateaus. In a second supplementary figure (S9), we have also considered using model predictive control (MPC) to optimize the inputs under an uncertain go cue arrival time. There, we found that removing the assumption that the delay period is known came with new challenges: in particular, it requires the specification of a “mental model” of when the Go cue will arrive. While it is reasonable to expect that monkeys will have a prior over the arrival time of the go cue that is shaped by the design of the experiment, some assumptions must be made about the utility functions that should be used to weigh this prior. For instance, if we imagine that monkeys carry a model of the possible arrival time of the go cue that is updated online, they could nonetheless act differently based on this information, for instance by preparing either to be ready for the earliest possible go cue or to be ready for the average go cue. This will likely depend on the exact task design and reward/penalty structure. Here, we added simulations of those two cases (making simplifying assumptions to make the problem tractable with model predictive control), and found that the “earliest preparation” strategy gives rise to more realistic plateauing activity, while the model where planning is done for the “most likely go time” does not. We suspect that more realistic activity patterns could be obtained by, e.g., combining this framework with the temporal smoothness cost. However, the main point we wished to make with this new supplementary figure is that it is possible to model the task in a slightly more realistic way (although here it comes at the cost of additional model assumptions). We have now added more discussion related to those points.

      Note that we have kept our analyses of these new models to a minimum, as the main takeaway we wish to convey from them is that most components of the model could be modified or made more realistic. This would impact the qualitative behavior of the system and its match to data, but, in the examples we have considered so far, does not appear to change the general strategy of networks relying on preparation.

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

      Yes, that is a good point, thank you for making it so clearly! Indeed, like previous work, we do not think of our “M1 dynamics” as being internal to M1; they may instead include sensory feedback and inter-area loops, which we summarize in the connectivity, chosen so that the dynamics qualitatively resemble the data. We have now incorporated more discussion regarding what exactly the dynamics in our model represent.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment 

      Dasgupta and colleagues make a valuable contribution to the understanding of how the guidance factor Sema7a promotes connections between mechanosensory hair cells and afferent neurons of the zebrafish lateral line system. The authors provide solid evidence, through comprehensive quantitative analysis, that loss of Sema7a function results in fewer contacts between hair cells and afferents. Additional work is needed to distinguish the effects of different isoforms of Sema7a to determine whether there are specific roles of secreted and membrane-bound forms. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Dasgupta et al. have dissected the role of Sema7a in fine-tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They also attempt to outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a-expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, the effect of loss of Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes. These issues weaken the claims made by the authors, including the statement that they have identified dual roles for the GPI-anchored versus secreted forms of Sema7a in synapse formation and as a chemoattractant for axon arborization, respectively. 

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of the peripheral sensory circuit in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with presynaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows them to address the endogenous protein localization as well as to show the absence of signal in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted, though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below). In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations. 

      Reviewer #3 (Public Review):

      The data reported here demonstrate that Sema7a defines the local behavior of growing axons in the developing zebrafish lateral line. The analysis is sophisticated and convincingly demonstrates effects on axon growth and synapse architecture. Collectively, the findings point to the idea that the diffusible form of sema7a may influence how axons grow within the neuromast and that the GPI-linked form of sema7a may subsequently impact how synapses form, though additional work is needed to strongly link each form to its proposed effect on circuit assembly. 

      The revised manuscript is significantly improved. The authors comprehensively and appropriately addressed most of the reviewers' concerns. In particular, they added evidence that hair cells express both Sema7A isoforms, showed that membrane bound Sema7A does not have long range effects on guidance, demonstrated how axons behave close to ectopic Sema7A, and analyzed other features of the hair cells that revealed no strong phenotypes. The authors also softened the language in many, but not all places. Overall, I am satisfied with the study as a whole. 

      Reviewer #4 (Public Review):

      This study provides direct evidence showing that Sema7a plays a role in the axon growth during the formation of peripheral sensory circuits in the lateral-line system of zebrafish. This is a valuable finding because the molecules for axon growth in hair-cell sensory systems are not well understood. The majority of the experimental evidence is convincing, and the analysis is rigorous. The evidence supporting Sema7a's juxtracrine vs. secreted role and involvement in synapse formation in hair cells is less conclusive. The study will be of interest to cell, molecular and developmental biologists, and sensory neuroscientists. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In their revised manuscript, Dasgupta et al. have provided further experiments to address the role of Sema7a (sec and GPI-anchored) in regulating axon guidance in the lateral line system. Specifically, the inclusion of the heat shock controls and FM labeling to show hair cell mechanotransduction was crucial to the interpretation of the results. However, there are still concerns about the specificity of the results. My primary concern is whether the change in axon patterning is specifically due to loss of Sema7a in the mutant hair cells. These animals are morphologically very abnormal and, in the rebuttal, the authors state that hair cell number is reduced. This is not quantified in the manuscript and should be included. 

      Thank you for this suggestion. We have included the data in the manuscript in lines 137-139, in Figure 2—figure supplement 1B, and in the source data for Figure 2 and Figure 2-figure supplements.

      If there is not a function for Sema7a in hair cells themselves, why is the number reduced? 

      The sema7a-/- homozygous mutants are not viable and die by 6 dpf. The loss of Sema7A protein produces other developmental defects, including brain edema and a curved body axis. We believe the slight, nonsignificant decrease in hair cell number may arise from a minor developmental delay in the morphogenesis of the neuromast. We have accordingly quantified our data at three distinct developmental stages (2, 3, and 4 dpf) and have incorporated them in the revised manuscript.

      Additionally, FM data should be quantified and presented in animals without a transgene in the same excitation/emission spectra for clearer interpretation of the staining.

      We have quantified the intensities of labeling with FM 4-64 styryl dye in the control and sema7a-/- mutant larvae and incorporated the data in lines 139-146, in Figure 2—figure supplement 1D, and in the source data for Figure 2 and Figure 2-figure supplements. We kept the transgenes to show concurrently the arborization phenotype, hair cell morphology, and FM 4-64 incorporation across the genotypes. 

      Rescue analysis using the myo6b promoter would allow the authors to ensure that the axon deficits can be rescued by putting Sema7a back into the sensory hair cells. Transient transgenesis could be useful for this approach and would not require the creation of a stable line. This could be done with both forms of Sema7a, allowing a true assessment of whether or not the secreted and GPI-anchored forms have disparate functions, as claimed in lines 418-424. 

      Although we recognize the importance of rescuing the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, we cannot perform that experiment at the moment, as the first author will leave the lab next week. However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Other concerns:

      (1) The timeline of the heat shock experiment is confusing to me and, therefore, it makes me question the specificity of those results. Based on the speed of axon outgrowth and the time necessary for transcription and translation after heat shock induction of the transgene, it is unclear to me how the axon growth defects could occur in the timeline provided. Imaging two hours after the start of the heat shock is very rapid and speaks to either an indirect effect of the transgenesis on the axon growth or a leaky promoter/induction paradigm. It is possible I am just misunderstanding the setup but, from what I could gather, the imaging is being done 2 hrs after the start of the heat shock. This should be clarified. 

      The axons of the zebrafish posterior lateral line migrate relatively fast. The pioneering axons migrate at around 120 μm/hour (Sato et al., 2010) and the follower axons migrate at about 30-80 μm/hour (Sato et al., 2010). The heat-shock promoter that we have utilized, hsp70l, is highly effective in inducing gene expression and subsequent protein formation within 30 to 60 min. We believe that an hour of heat shock followed by an hour of incubation is sufficient to induce directed axon migration over distances ranging from 27 μm to 140 μm. 

      We strongly believe that the directed arborization of the sensory axons towards the Sema7Asec source is not due to an indirect effect of transgenesis or leaky promoter induction, as in all 18 of the injected but not heat-shocked control larvae we did not observe ectopic Sema7Asec expression, and no aberrant projection was formed from the sensory arbor network. We highlight this observation in lines 297-299 and in Figure 4E.

      Sato et al., 2010: Single-cell analysis of somatotopic map formation in the zebrafish lateral line system. Developmental Dynamics 239:2058–2065, 2010.
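
      As a rough back-of-the-envelope check of the timing argument above, one can divide each reported distance by the cited speeds. This minimal sketch assumes constant migration speeds and ignores the 30 to 60 min induction delay; the constant and function names are illustrative only.

```python
# Migration speeds cited from Sato et al. (2010), in micrometers per hour.
SPEEDS_UM_PER_H = {"pioneer": 120.0, "follower (slow)": 30.0, "follower (fast)": 80.0}
# Range of distances reported for the directed projections, in micrometers.
DISTANCES_UM = (27.0, 140.0)


def minutes_to_cover(distance_um, speed_um_per_h):
    """Time in minutes needed to extend distance_um at a constant speed_um_per_h."""
    return 60.0 * distance_um / speed_um_per_h


for name, speed in SPEEDS_UM_PER_H.items():
    short, long_ = (minutes_to_cover(d, speed) for d in DISTANCES_UM)
    print(f"{name}: {short:.1f} min for 27 um, {long_:.1f} min for 140 um")
```

      Under these assumptions, 140 μm takes about 70 min at the pioneer speed and roughly 105 to 280 min at the follower speeds, while 27 μm takes roughly 14 to 54 min across the cited range.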

      Similarly, it would help to clarify if t(0) in the figure is the onset of the heat shock or onset of imaging two hours after the heat shock is started. 

      The t = 0 h time point in Figure 4I denotes the onset of imaging, two hours after the heat shock began. We have clarified this in the manuscript in lines 1155-1156.

      (2) In the rebuttal, the line numbers cited do not match up with the appropriate text, I believe.

      We have corrected this and updated the manuscript.

      (3) Some of the supplemental figures are not mentioned in the text, or I could not find them. For example: Figure 1 supplement 2J. 

      Thank you for pointing this out. We have corrected the manuscript, and the new information is added in line 114.  

      (4) Table 1 statistics: were these adjusted for multiple comparisons using a Bonferroni correction or something similar? This is necessary for statistical significance to be meaningful. 

      We did not adjust the p-values for multiple comparisons because the values correspond to only three or four statistical tests per experiment, which strongly suggests that erroneous significance arising solely from multiple testing is unlikely.
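
      For reference, the sketch below shows how the family-wise error rate (FWER) grows with the number of tests and what a Bonferroni adjustment does. The FWER formula assumes independent tests, whereas the Bonferroni bound holds under any dependence; α = 0.05 and the function names are illustrative.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false positive) across m independent tests, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


for m in (3, 4):
    print(f"m={m}: FWER={familywise_error_rate(0.05, m):.3f}, per-test threshold={0.05 / m:.4f}")
```

      With α = 0.05, the chance of at least one false positive is about 0.14 for three independent tests and about 0.19 for four; testing each at α/m (about 0.0167 or 0.0125) keeps the FWER at or below 0.05.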

      (5) Figure 1I and 1-S3: The legend states a positive correlation between axonal signal and Sema7A signal. Correlations are 0.5, 0.6, and 0.4 (2, 3, and 4 dpf). This is not a convincing positive correlation. At best, this ranges from no correlation to a very weak positive correlation. 

      In lines 122-126 we mention that the basal association of the sensory arbors shows a positive correlation with Sema7A accumulation. We never emphasize the strength of the correlation. However, a consistent positive correlation at three different developmental stages suggests that progressive Sema7A accumulation at the base of the hair cells may guide the sensory arbors to associate increasingly with the hair cells.    
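
      One common way to read these coefficients is as shared variance, r². The minimal sketch below computes a Pearson r from paired values and converts the three reported coefficients; the function names are illustrative, and no per-neuromast data from the study are used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Coefficients reported for 2, 3 and 4 dpf, converted to shared variance.
for age, r in (("2 dpf", 0.5), ("3 dpf", 0.6), ("4 dpf", 0.4)):
    print(f"{age}: r = {r}, r^2 = {r ** 2:.2f}")
```

      Read this way, the reported coefficients correspond to r² values of 0.25, 0.36, and 0.16, i.e. roughly 16% to 36% of the variance in one signal is linearly shared with the other.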

      Reviewer #2 (Recommendations For The Authors):

      I am a bit disappointed that the authors elected not to experimentally address the issue raised by all reviewers: whether the secreted or membrane-bound isoform is active in hair cells. They rather decided to change their interpretation in the text. This is fine, given the eLife review structure. However, addressing it experimentally would make the manuscript much stronger. Other issues were adequately addressed through textual changes as well. 

      Although we recognize the importance of rescuing the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, we cannot perform that experiment at the moment, as the first author will leave the lab next week. However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Reviewer #3 (Recommendations For The Authors):

      Overall, I am satisfied with the study as a whole and just have a few minor comments that remain to be addressed. 

      (1) Although the authors say that they added appropriate no plasmid/heatshock-only and plasmid-only/no heatshock controls, these results need to be presented more clearly, as they are separated in the paper and only one was quantified (i.e. 100% of embryos showed no defect). Please just make it clear that no defects were observed in either control for either experiment (both secreted and membrane bound ectopic expression). 

      We have clearly stated this information in lines 297-299 and 343-345.

      (2) Please add a compass to Fig. 1A to indicate the orientation of the neuromast. It would also be helpful to add labels for developmental ages to all of the figures, rather than making the reader look it up in the legend. 

      We have updated Figure 1A and the corresponding figure legend in lines 882-883. We have denoted the larval age in the figure legends to keep the individual images uncluttered.  

      (3) For the RT-PCR experiments in Figure 1, no negative control was included to show that supporting cell or neuronal genes are not detected in the purified hair cells and, vice versa, that neither isoform is detected in supporting cells or neurons. I ask only because there is a lot of immunosignal outside of the hair cells, and I am curious whether that is secreted or might come from other cell types. For neurons and supporting cells, simply demonstrating absence of Sema7a overall would suffice. 

      We have utilized the transgenic line Tg(myo6b:actb1-EGFP), which expresses the fluorophore GFP specifically in the hair cells of the neuromast. Unfortunately, we do not possess a transgenic line that reliably and specifically labels the support cells in the neuromast. Hence, in our sorting experiment the GFP-negative cells collected from the trunk segments of the larvae contain all the non-hair cells, including epidermal, neuronal, and immune cells, among others. Such a mixture of cell identities may not serve as a reliable negative control. 

      In Figure 7, we have plotted the normalized expression values of the sema7a gene in the neuromast. The plot clearly shows that the source of Sema7A is the young and the mature hair cells, not the support cells. We further confirm this observation by immunostaining, in which the Sema7A signal is highly restricted to the hair cells and absent from the other cells of the neuromast (Figure 1E). Immunostaining further demonstrates that the lateral line sensory arbors also do not produce the Sema7A protein (Figure 1H; Video 1).

      We agree with the reviewer that there are diverse immune cells, including macrophages, in and around the neuromast. These macrophages are dynamic and possess a highly ramified structure (Denans et al., 2022). In all our Sema7A immunostainings, we never observed structures that resemble macrophages. Although we cannot rule out that Sema7A is expressed in a distant immune cell, we strongly doubt that signal coming from immune cells affects hair cell innervation by the sensory arbors during homeostatic development.

      Denans et al., 2022: Nature Communications volume 13, Article number: 5356 (2022).

      (4) In Figure 1, Supplement 4, I do not see the immunogen labeled in blue. 

      We have corrected the figure legend. The immunogenic region of the Sema7A protein is now clearly denoted in the figure legend of Figure 1—figure supplement 4.

      (5) In Figure 2, please add a control image as requested, as that enables direct comparison. There is ample room in the figure. 

      We have updated the Figure 2 and made the suggested change.

      (6) In Figure 2, Supplement 1, the FM4-64 data are not presented in a quantified fashion. Please report at least how many embryos showed reliable uptake and preferably how many hair cells per embryo showed reliable uptake. 

      We have quantified the FM 4-64 intensities in control and sema7a-/- mutant larvae. The new data are added to the manuscript in lines 142-146 and 577-579, and in Figure 2—figure supplement 1D.

      (7) In Figure 3, there seems to be a typo in the figure legend: "mutants in the same larvae" does not make sense to me. 

      We have corrected the error. The modified statement is in lines 1067-1068.

      (8) The text should refer more explicitly to the statistical tests reported in Table 1, i.e. as the results are presented. 

      In lines 1105 and 1109, we clearly state the statistical tests that were performed.

      (9) In Figure 6, Supplement 1, please show the raw data points not just the bar graphs

      We have updated the Figure 6—figure supplement 1.

      (10) Minor point: the authors state that they addressed the distance over which secreted Sema7A may act, but this was not evident to me in the text. Please make this finding clearer.

      We have clarified this information in lines 310-311.

      (11) Finally, the discussion contains a statement that is not supported by the data: "We have discovered dual modes of Sema7A function in vivo." They have discovered evidence that there are two isoforms, that loss of both disrupts connectivity, and that overexpression of only the secreted form can elicit growth from a distance. However, there is no direct evidence that the membrane-bound form is responsible for local effects. It is formally possible still that the phenotypes are a result of dual roles for the secreted form. It is clear that another manuscript is forthcoming that will expand on the role of the transmembrane form, but for this manuscript, the authors should make firm conclusions only about the data presented herein.

      Thank you for this suggestion. We have modified the manuscript in lines 425-434.

      Reviewer #4 (Recommendations For The Authors):

      The authors have made significant changes to the manuscript based on the comments of the reviewers. It is now suitable for publication.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have no further experiments to request, but the following errors should be corrected prior to publication.

      (1) L. 183-198: Figure 3 panels were erroneously referred to in several places.

      This has been corrected.

      (2) L. 182-183: The description of active/total cell numbers in the main text does not match the numbers in Figure 3B.

      This has been corrected.

      (3) L. 185-187: Figure 3C indicates significant changes in rheobase only between the DMI+6-OHDA and 6-OHDA groups. A statistical comparison between the sham and DMI+6-OHDA groups was not provided, which may change the interpretation of the data in Figure 3B, C: "...these findings suggest that the 6-OHDA induced lesion of midbrain dopaminergic neurons evoked the increased firing of DRN 5-HT neurons" (L. 185-187).

      We thank the reviewer for highlighting this point. Indeed, a Kruskal-Wallis test comparing all three groups revealed a significantly lower rheobase in DMI + 6-OHDA mice compared to Sham, whereas the 6-OHDA-injected group was not affected. Therefore, the increased firing of DRN 5-HT neurons recorded in 6-OHDA-injected mice pretreated with DMI also critically involves the noradrenergic system. This is now included in the revised Results section of the manuscript (lines 190-197).

      (4) L. 188: The description "While the excitability of DRN 5-HT neurons was not affected in 6-OHDA mice..." does not match the clearly increased cellular excitability shown in Figure 3G-I.

      This has been corrected and we are now referring more specifically to the rheobase, which is not affected in 6-OHDA mice.

      (5) Mann-Whitney tests were inappropriately used for statistics in Figures 3-6: comparisons of three or more groups should be performed with one-way ANOVA or, for nonparametric data, the Kruskal-Wallis test.

      We thank the reviewer for the comment. We have now applied one-way ANOVA or Kruskal-Wallis tests, as appropriate, and modified the text accordingly.

      (6) It seems that the data points in some panels of Figure 4C represented a cell, but others were averaged within a mouse (Figure 4D). This needs to be clarified or corrected.

      None of the data in Figure 4 were averaged within a mouse. In the chosen graph type (aligned dot plot), identical values overlap.

      Reviewer #2 (Recommendations For The Authors):

      The authors' revised manuscript has addressed most of my concerns. However, I am not convinced by the authors' claim regarding Figure 5B. It would be helpful if the authors at least discussed in their manuscript why the DMI pretreatment group alone, and not the 6-OHDA group, significantly lowers the firing rate of DRN (DA) neurons and increases their Erest compared to the sham-lesion group. These statistically significant data are not explained at all in the revised manuscript (could this effect be explained by neuroprotection of NA neurons from 6-OHDA toxicity?).

      We thank the reviewer for this comment. When the three groups are compared with a one-way ANOVA or a Kruskal-Wallis test (as suggested by Reviewer #1), the changes previously shown in Figure 5B are no longer significant.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ are inconclusive. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors and reviewers for their valuable feedback and constructive comments. We have carefully considered each point raised by the reviewers and made the necessary revisions to the manuscript. Regarding the relationships between global and local BM processing, accumulated evidence from previous studies has converged on a dissociation of the two BM components: for example, global BM processing is susceptible to learning and practice, whereas local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and brain activations in response to local and global BM cues differ (Chang et al., 2018; Duarte et al., 2022). Nevertheless, we concur with the reviewers that the evidence for such a dissociation from the current study alone is not strong enough. We have therefore toned down this point and no longer claim the dissociation (including in the title). Based on the current results, we focus our discussion on the different aspects of BM processing in children with and without ADHD.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating impairments of biological motion perception in individuals with ADHD in comparison with neurotypical controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated impairments of local and global (holistic) biological motion perception, diagnostic status, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention/impulsivity). Both local and global biological motion perception are impaired in individuals with ADHD. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not in controls. A path analysis in the ADHD group suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and by attentional and perceptual reasoning skills.

      Strengths:

      Relatively little work exists on biological motion perception in ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature and potentially adds new behavioral markers for this clinical group. The design of the study is straightforward and technically sound, and the conclusions drawn are supported by the presented results.

      Thanks for this positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors, ADHD, and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper. Specifically, the hypothesis that the perception of human social interaction critically depends on a local mechanism for detecting asymmetry in the foot trajectories of walkers (which is what 'BM-local' really measures), or on the detection of live agents in cluttered scenes, does not seem very plausible.

      Thank you for these comments. We agree that the relationship between genetic factors and BM perception remains to be examined, as we did not test genetic influences in this study. We have deleted the relevant discussion of genetics. Based on our results, we discuss possible mechanisms underlying the relationship between local BM processing and social interaction in the revised manuscript as follows:

      “As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs. Further empirical studies are required to confirm these hypotheses.” (lines 417 - 428)

      In response to my previous comments, the discussion has been changed in a way that tries to justify the speculative claims by citing many other speculative papers, which does not really address the problem. For example, the fact that chicks walk towards biological motion stimuli is interesting. To conclude from this that it verifies a fundamental mechanism of human biological motion processing is extremely questionable, given that birds do not even have a cortex. Taking the authors' argumentation seriously, one would have to assume that the 'Local BM' mechanism is probably located in the mesencephalon in humans and would then have to interact in some way with social perception differences in children with ADHD. To me, all of this amounts to very strong (over-)claims. I suggest providing a much more modest interpretation of the interesting experimental result, based on what the authors have actually shown experimentally and on closely related data, rather than offering many far-reaching speculations.

      In the same vein, claims such as 'local BM is an intrinsic trait' (L. 448) are, in my view, not only imprecise (perhaps better: 'mechanisms of processing of local BM cues') but also rather questionable. This 'local processing of BM' is likely a lower-level mechanism, probably located in early and mid-level visual cortex, with a possible influence of lower structures. It does not seem plausible that it is related to a classical trait variable in the psychological sense, such as personality, as seems to be suggested here. Here, too, I suggest a much more moderate and less speculative interpretation of the results.

      We thank the reviewer for pointing out these issues. In light of these comments, we have carefully revised the discussion to avoid strong (over-)claims. We have deleted the example of chicks and substituted more empirical studies to explain our results. We agree that the local BM mechanism is probably located in subcortical regions in humans, as reported by some MRI studies (Chang et al., 2018; Hirai and Senju, 2020; Loula et al., 2005). We have added evidence that atypical local BM processing may decrease visual input related to social information, as follows:

      “According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 421 - 427)

      We have also deleted the claim that 'local BM is an intrinsic trait' (originally L. 448) and the related discussion, as it was not conclusive based on the current study.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate the reviewer’s positive feedback very much.

      Weaknesses:

      The manuscript has greatly improved in clarity and methodological considerations in response to the review. There are only a few minor points which deserve the authors' attention:

      When outlining the motivation for the current study, results from studies in ADHD and ASD are used too interchangeably. The authors use the lack of evidence for contributing (psychological/developmental) factors in BM processing in ASD to motivate the present study, and refer to evidence for differences between typical and atypical BM processing using studies in both ASD and ADHD. While there are certainly overlapping features between the two conditions/neurotypes, they are not identical and may have distinct etiologies; therefore, the distinction between the two should be made clearer.

      We thank the reviewer for pointing out this issue. We have removed some unnecessary citations about ASD and referred to studies of social cognition in ADHD to elaborate on the motivation for this study:

      “Further exploration of a diverse range of social cognitions (e.g., biological motion perception) can provide a fresh perspective on the impaired social function observed in ADHD. Moreover, recent studies have indicated that social cognition in ADHD may vary depending on different factors at the cognitive, pathological, or developmental levels, such as general cognitive impairment5, symptom severity8, or age5. Nevertheless, understanding how these factors relate to social cognitive dysfunction in ADHD is still in its infancy. Bridging this gap is crucial as it can help depict the developmental trajectory of social cognition and identify effective interventions for impaired social interaction in individuals with ADHD.” (lines 53 - 62)

      In the first/main analysis, it is unclear to me why the authors changed the statistical method in the revised manuscript from ANOVA/ANCOVA to independent-samples t-tests (unless the latter were only used for post-hoc comparisons, in which case this needs to be stated). Furthermore, although the p-values look robust, it should also be indicated for this analysis whether and how multiple comparison problems were accounted for.

      Thank you for these comments. Following the suggestions from Reviewer #3, it may be inappropriate to treat gender as a covariate in an ANOVA, as this may violate the assumptions of ANCOVA. To ensure that gender does not influence the results, we first plotted boys and girls as individual data points in different colours, and there were no signs of a gender effect in the TD group. Second, we used t-tests to examine the difference between the TD and ADHD groups. Finally, we conducted a subsampling analysis with balanced data, and the results remained consistent.

      In Part 1 of the Results, we aimed to compare task accuracies between the TD and ADHD groups in three independent tasks, which assess participants’ abilities to process three types of BM cues. We hypothesised that individuals with ADHD would perform worse than TD individuals in all three tasks. Given this a priori hypothesis, we consider that correction for multiple comparisons may not be necessary.
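
      For reference, if a correction were applied across the three task comparisons, the Holm-Bonferroni step-down procedure is one common option: it controls the family-wise error rate while being less conservative than a plain Bonferroni correction. A minimal sketch follows; the function name is hypothetical and the three p-values are placeholders, not the study's results.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The rank-th smallest p-value (0-based) is multiplied by (m - rank);
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvalues[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical p-values for three task comparisons (placeholders only).
pvals = [0.01, 0.04, 0.03]
adj = holm_adjust(pvals)
print([round(p, 3) for p in adj])  # [0.03, 0.06, 0.06]
print([p < 0.05 for p in adj])     # [True, False, False]
```

      Each adjusted p-value is compared directly with α (here 0.05); in this hypothetical case only the smallest p-value survives correction.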

      Reviewer #3 (Public Review):

      Strengths:

      The authors present differences between ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate the reviewer’s positive assessment of this work.

      Weaknesses:

      The data are not strong enough to support claims about differences between global and local processing with respect to social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but the crucial tests of differences between correlations do not present a clear picture. Further empirical work would be needed to test the authors' claims. Specifics:

      The authors state frequently that it was the local BM task that related to social communication skills (SRS), and not the global tasks. However, the Results section shows a correlation between SRS and all three tasks. The only difference is that, within the ADHD group specifically, the correlation is significant only for the local task. The supplementary materials show that the tests of differences between correlations present an incomplete picture. The samples for these correlations are currently small, so this is unsurprising.

      Thank you for this comment. We agree with the reviewer that the relationships of local and global processing with social communication and age need more empirical work. Our results indicate only possible dissociable roles of local and global BM processing. Accumulated evidence from previous studies has converged on this dissociation: for example, global BM processing is susceptible to learning and practice, whereas local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and brain activations in response to local and global BM cues differ (Chang et al., 2018; Duarte et al., 2022). We concur with the reviewers that the evidence for such a dissociation from the current study alone is not strong enough. We have therefore toned down this point and no longer emphasize the dissociation. Based on the current results, we focus our discussion on the different aspects of BM processing in children with and without ADHD. Future studies with larger sample sizes are needed to confirm this dissociable relationship.

      Theoretical assumptions. The authors make some statements about local vs. global biological motion processing that should still be made more tentatively. They assume that local processing is specifically genetically specified, whereas global processing is a product of experience. The data in newborn chicks are controversial and confounded; I cannot remember the specifics, but I think there is an upper vs. lower visual field complexity difference here.

      We appreciate the reviewer’s suggestion. We agree that the relationship between genetic factors and BM perception remains to be examined, as we did not perform any genetic analysis in the current study. Some speculative papers have been removed, as has the statement about newborn chicks, given the controversial and confounded results. We have toned down our claims and provided a more moderate interpretation of the results:

      “Sensitivity to local BM cues emerges early in life54,55 and involves rapid processing in the subcortical regions16,56-58. As a basic pre-attentive feature23, local BM cues can guide visual attention spontaneously59,60. In contrast, the ability to process global BM cues is related to slow cortical BM processing and is influenced by many factors such as attention25,26 and visual experience21,51. As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 413 - 427)

      “Few developmental studies have been conducted on local BM processing. The ability to process local BM cues remained stable and did not exhibit a learning trend21,25. A reasonable interpretation may be that local BM processing is a low-level mechanism, probably performed by the primary visual cortex and subcortical regions such as the superior colliculus, pulvinar, and ventral lateral nucleus14,56,61.” (lines 441 - 446)

      Readability. The manuscript needs very careful proofreading and correction for grammar. There are grammatical errors throughout.

      We thank the reviewer for this feedback. We have performed thorough proofreading and corrected grammatical errors throughout the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I thank the authors for their revisions that address several of the minor points that I raised in my last review. A number of requests are still not sufficiently answered:

      L. 290 ff.: The model notation 'BM-local = age + gender etc.' is rather sloppy. I think what is meant is that a GLM was used with the predictors gender, etc., multiplied by appropriate beta_i values. These formulas should be corrected, or one should simply state that a GLM was run with the predictors gender, etc.

      The same criticism applies to these other models that follow.

      This was corrected.

      However, the corrected text remains sloppy. For example, in 'BM-local = ...', what exactly is 'BM-local'? The accuracy? Here, a precise notation should be given that clearly names which variables are used as predictors and which as target variables.

      We appreciate the reviewer’s suggestion. We have clarified which variables are used in our models and given them precise notation:

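      “Three linear models were built to investigate the contributing factors: (a) $\mathrm{ACC}_{\mathrm{local}} = \beta_0 + \beta_1 \cdot \mathrm{age} + \beta_2 \cdot \mathrm{gender} + \beta_3 \cdot \mathrm{FIQ} + \beta_4 \cdot \mathrm{Qb}_{\mathrm{Inattention}}$, (b) $\mathrm{ACC}_{\mathrm{global}} = \beta_0 + \beta_1 \cdot \mathrm{age} + \beta_2 \cdot \mathrm{gender} + \beta_3 \cdot \mathrm{FIQ} + \beta_4 \cdot \mathrm{Qb}_{\mathrm{Inattention}}$, and (c) $\mathrm{ACC}_{\mathrm{general}} = \beta_0 + \beta_1 \cdot \mathrm{age} + \beta_2 \cdot \mathrm{gender} + \beta_3 \cdot \mathrm{FIQ} + \beta_4 \cdot \mathrm{Qb}_{\mathrm{Inattention}} + \beta_5 \cdot \mathrm{ACC}_{\mathrm{local}} + \beta_6 \cdot \mathrm{ACC}_{\mathrm{global}}$. $\mathrm{ACC}_{\mathrm{local}}$, $\mathrm{ACC}_{\mathrm{global}}$ and $\mathrm{ACC}_{\mathrm{general}}$ refer to the response accuracies of the three tasks in the ADHD group, and $\mathrm{Qb}_{\mathrm{Inattention}}$ is the standardised score for sustained attention function.” (lines 337 - 343)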
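
      Each of these models is an ordinary least-squares (OLS) regression. The sketch below shows one way such a model could be fitted in pure Python, by solving the normal equations (X^T X) beta = X^T y. It is not the authors' code: the helper names are hypothetical, and the predictor values are invented and combined without noise so that the recovered coefficients can be checked against the ones used to generate them. In practice, a statistics package such as `statsmodels` would normally be used.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(predictors, y):
    """Fit y = b0 + b1*x1 + ... by OLS; return (coefficients, R^2)."""
    X = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical rows: (age, gender, FIQ, inattention score); not study data.
rows = [(7, 0, 95, 1.2), (8, 1, 110, -0.5), (9, 0, 102, 0.3), (10, 1, 88, 2.0),
        (11, 0, 120, -1.1), (8, 0, 105, 0.0), (9, 1, 99, 0.8), (12, 1, 115, -0.2)]
true_beta = [0.2, 0.02, -0.01, 0.003, -0.05]  # b0..b4, chosen for illustration
acc_local = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], r)) for r in rows]

beta, r2 = ols(rows, acc_local)
print([round(b, 4) for b in beta], round(r2, 4))
```

      Solving the normal equations directly is adequate for a handful of predictors; with many or highly correlated predictors, a QR- or SVD-based least-squares solver is numerically safer.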

      All these models assume linearity of the combination of the predictors. Was this assumption verified?

      We referred to previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), have a linear relationship with the ability to process BM.

      This answer is insufficient and not convincing. The fact that a variable Y depends linearly on predictors A and B in another study does not imply that it is also linear in predictor C, or that it shows no interactions with such predictors in the present study.

      What is needed here is the testing of models with interaction terms and verifying that such models are not better predictors. If authors do not want to do this, they need at least to clearly point out that they made the strong assumption of linearity of their model, which might be wrong and thus be a substantial limitation of their analysis.

      Thank you for the suggestion. We compared each possible model with and without the relevant interaction terms. The change in the coefficient of determination (R²) between the two models was not statistically significant.
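
      The comparison described in this response is a nested-model test: the increase in R² from adding interaction terms is evaluated with an F statistic, F = [(R²_full - R²_reduced) / (k_full - k_reduced)] / [(1 - R²_full) / (n - k_full - 1)], where k counts predictors excluding the intercept. A minimal sketch, assuming this standard formula for nested OLS models; the function name, R² values, sample size, and predictor counts below are hypothetical placeholders, not the study's results.

```python
def r2_change_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R^2 increase from a reduced to a nested full model.

    k_reduced and k_full count predictors excluding the intercept;
    the statistic has (k_full - k_reduced, n - k_full - 1) degrees of freedom.
    """
    df1 = k_full - k_reduced
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f, df1, df2


# Hypothetical: 4 main effects vs. 4 main effects + 6 pairwise interactions.
f, df1, df2 = r2_change_f(r2_reduced=0.30, r2_full=0.35, n=40, k_reduced=4, k_full=10)
print(f"F({df1}, {df2}) = {f:.3f}")  # F(6, 29) = 0.372
```

      The resulting F is referred to an F distribution with (df1, df2) degrees of freedom to obtain a p-value, for example with `scipy.stats.f.sf(f, df1, df2)`.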

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the ADHD group. Does the same observation also apply to the controls?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Was such a path analysis also done for the TD subjects or not? If yes, did it also predict that the variable BM-Global largely and directly influences the variable BM-General? (The answer refers to the general Discussion section, where, as far as I understand, no such analysis is presented.)

      Thank you for your comment. We also conducted a path analysis in the TD group similar to that in the ADHD group. There was no statistically significant mediation effect in the TD group. Please see Figure S3 for complete statistics.
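
      For readers less familiar with path analysis: in a single-mediator model X → M → Y, the indirect (mediated) effect is commonly estimated as the product a·b, where a is the slope of M on X and b is the slope of Y on M controlling for X. For OLS estimates, the total effect c splits exactly into the direct effect c' plus a·b. The sketch below illustrates this decomposition; the roles (inattention as X, reasoning ability as M, BM accuracy as Y) follow the analysis summarised by Reviewer #2, but the data and function names are hypothetical, and significance testing of a·b (for example with bootstrapped confidence intervals) is omitted.

```python
def _mean(v):
    return sum(v) / len(v)


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = _mean(u), _mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation(x, m, y):
    """Product-of-coefficients mediation for X -> M -> Y using OLS slopes."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    c = sxy / sxx  # total effect: slope of Y ~ X
    a = sxm / sxx  # path a: slope of M ~ X
    # Y ~ X + M: solve the 2x2 normal equations on centred data.
    det = sxx * smm - sxm ** 2
    c_prime = (smm * sxy - sxm * smy) / det  # direct effect of X
    b = (sxx * smy - sxm * sxy) / det        # path b: effect of M given X
    return {"a": a, "b": b, "direct": c_prime, "indirect": a * b, "total": c}


# Hypothetical scores for eight children (not study data).
inattention = [0.5, 1.2, -0.3, 2.0, 0.8, -1.0, 1.5, 0.1]
reasoning = [100, 92, 108, 85, 97, 112, 90, 104]
bm_acc = [0.70, 0.62, 0.78, 0.55, 0.66, 0.80, 0.60, 0.74]

r = mediation(inattention, reasoning, bm_acc)
print(f"indirect = {r['indirect']:.4f}, direct = {r['direct']:.4f}, total = {r['total']:.4f}")
```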

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data analyzed during the study are available at https://osf.io/37p5s/.

      (2) Lines 119-115: The differences observed in ADHD participants in the studies referenced here were relative to what group? The last sentence here also refers to two groups, and it is difficult to gather which specific groups are meant, also because the two references relate to both ADHD and ASD samples. Please clarify.

      The suggestion is well taken. We have clarified the expressions accordingly:

      “Specifically, compared with the typically developing (TD) group, children with ADHD showed reduced activity of motion-sensitive components (N200) while watching biological and scrambled motions, although no behavioural differences were observed. Another study found that children with ADHD performed worse in BM detection with moderate noise ratios than the TD group32.” (lines 100 - 105)

      (3) Line 116: I'm not sure what is meant by 'despite initial indications' - please briefly specify/summarise here why the investigation into BM processing in ADHD is warranted.

      We thank the reviewer for pointing out this issue. We have rephrased this part and briefly specified why the investigation of BM processing in ADHD is warranted:

      “Despite initial findings about atypical BM perception in ADHD, previous studies on ADHD treated BM perception as a single entity, which may have led to misleading or inconsistent findings28. Hence, it is essential to deconstruct BM processing into multiple components and motion features.” (lines 108 - 111)

      (4) Lines 290-293: Please complete the sentence.

      We thank the reviewer for pointing out this issue. The sentence has been completed:

      “For Tasks 2 and 3, where children were asked to detect the presence or discriminate the facing direction of the target walker, the TD group had higher accuracies than the ADHD group (Task 2: TD 0.70 ± 0.12, ADHD 0.59 ± 0.12, t(73) = 3.677, p < 0.001, Cohen's d = 0.861; Task 3: TD 0.79 ± 0.12, ADHD 0.63 ± 0.17, t(73) = 4.702, p < 0.001, Cohen's d = 1.100).” (lines 284 - 288)
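
      For a pooled-variance two-sample t-test, Cohen's d and t are linked by t = d · sqrt(n1·n2 / (n1 + n2)), with n1 + n2 - 2 degrees of freedom. The sketch below computes both statistics from raw scores and checks that relation. The function name and group values are hypothetical, and recomputing d from the rounded means and SDs quoted above need not reproduce the reported effect sizes exactly.

```python
import math
import statistics


def pooled_t_and_d(group1, group2):
    """Student's two-sample t (pooled variance) and Cohen's d (pooled SD)."""
    n1, n2 = len(group1), len(group2)
    mean_diff = statistics.mean(group1) - statistics.mean(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * statistics.variance(group1)
                  + (n2 - 1) * statistics.variance(group2)) / (n1 + n2 - 2)
    d = mean_diff / math.sqrt(pooled_var)
    t = mean_diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, d, n1 + n2 - 2


# Hypothetical task accuracies (not the study's data).
td = [0.72, 0.81, 0.65, 0.77, 0.69, 0.74]
adhd = [0.58, 0.63, 0.52, 0.66, 0.60]
t, d, df = pooled_t_and_d(td, adhd)
print(f"t({df}) = {t:.3f}, Cohen's d = {d:.3f}")
# t and d differ only by a factor that depends on the group sizes.
assert math.isclose(t, d * math.sqrt(len(td) * len(adhd) / (len(td) + len(adhd))))
```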

      Reviewer #3 (Recommendations For The Authors):

      (1) Conclusions concerning differences between the local and global tasks with respect to SRS and age (see above). I believe the authors need to reword throughout to reflect that the tests of differences between these crucial correlations did not present a clear picture.

      We have reworded the paper throughout to reflect that, based on this study alone, the relationship of local and global processing with social communication is inconclusive. Future studies with larger sample sizes are needed to confirm this conclusion, and the mechanism for this dissociable relationship should be validated with additional psychological tests.

      (2) I would again tone down the discussion of genetic specification of local processing, given it is highly controversial.

      We thank the reviewer for pointing out this issue. We agree that the genetic specification of local processing remains controversial. The interpretation of the results on local BM processing has been rephrased. Please refer to our response to point #2 above.

      (3) The manuscript needs very careful proofreading and grammatical correction throughout.

      Thank you for the suggestion to check the grammar. We have carefully proofread the manuscript to correct grammatical errors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Following synaptic vesicle fusion events at release sites, vesicle remnants will need to be cleared in order to allow new rounds of vesicle docking and fusion. This fundamental study of Mahapatra and Takahashi examines the role of release site clearance in synaptic transmission during repetitive activity in two types of central synapses, the giant calyx of Held and hippocampal CA1 synapses. The study uses pharmacological approaches to interfere with release site clearance by blocking membrane retrieval (endocytosis). They compare the effects on short-term plasticity with those obtained by pharmacologically inhibiting scaffold protein activity. The data presented make a compelling case for fast endocytosis as necessary for rapid site clearance and vesicle recruitment to active zones. The data reveal an unexpected, fast role for local site clearance in counteracting synaptic depression.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions in two types of central synapses, calyx of Held and hippocampal CA1 synapses. After acute block of endocytosis by pharmacology, deeper synaptic depression or less facilitation was observed in two types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of the site clearance in counteracting synaptic depression.

      Strengths:

      The study uses acute pharmacological block of the molecular targets together with precise electrophysiology. The experimental results are clear-cut and convincing. The study also examines the physiological roles of site clearance using action potential-evoked transmission at physiological Ca and physiological temperature in mature animals. These conditions have not been examined previously.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated and the authors have tried several reagents to verify the overall conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or the cortical actin scaffold, in both the calyx of Held synapse and hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice comprehensive and comparative study, which reveals some interesting differences between a fast synapse (calyx of Held) tuned to transmit reliably at several hundred Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the calyx, but not in the hippocampal synapse. This striking difference between the two preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools, SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is considerable evidence that actin is important both at the AZ (SRP to FRP transition, activation of the replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We now discuss the latrunculin effect in more detail in light of the related literature, within the scope of this study, in the last paragraph of the revised Discussion.

      Reviewer #3 (Public Review):

      The manuscript by Mahapatra and Takahashi addresses the role of presynaptic release site clearance during sustained synaptic activity. The authors characterize the effects of pharmacologically interfering with SV endocytosis (pre-incubation with Dynasore or Pitstop-2) on synaptic short-term plasticity (STP) at two different CNS synapses (calyx of Held synapses and hippocampal SC to CA1 synapses) using patch-clamp recordings in acute slices under experimental conditions designed to closely mimic a physiological situation (37 °C and 1.3 mM external [Ca2+]). Endocytosis blocker-induced changes in STP and in the recovery from short-term depression (STD) are compared to those seen after pharmacologically inhibiting actin filament assembly (pre-incubation with Latrunculin-B or the selective Cdc42 GTPase inhibitor ML-141). Presynaptic capacitance (Cm) recordings in calyx terminals were used to establish the effects of the pharmacological maneuvers on SV endocytosis.

      Latrunculin-B and ML-141 affect neither SV endocytosis (assayed by Cm recordings) nor EPSC recovery following conditioning trains, but strongly enhance STD at calyx synapses. No changes in STP were observed at Latrunculin-B- or ML-141-treated SC to CA1 synapses.

      Dynasore and Pitstop-2 slow down endocytosis, limit the total amount of exocytosis in response to long stimuli, enhance STD in response to 100 Hz stimulation, but profoundly accelerate EPSC recovery following conditioning 100 Hz trains at calyx synapses. At SC to CA1 synapses, Dynasore and Pitstop-2 reduce the extent of facilitation and lower relative steady-state EPSCs, suggesting a change in the facilitation-depression balance in favor of the latter.

      The authors use state-of-the-art techniques, and their clearly presented data lead the authors to conclude that endocytosis is universally important for clearance of release sites, while the importance of scaffold protein-mediated site clearance is limited to 'fast synapses'.

      Unfortunately, and perhaps not completely unexpected in view of the pharmacological tools chosen, there are several observations which remain difficult to understand:

      (1) Blocking site clearance affects release sites that have previously been used, i.e. sites at which SV fusion has occurred and which therefore need to be cleared. Calyces use at most 20% of all release sites during a single AP, likely fewer at 1.3 mM external [Ca2+]. Even if all those 20% of release sites become completely unavailable due to a block of release site clearance, the 2nd EPSC in a train should not be reduced by >20% because ~80% of the sites cannot be affected. However, ~50% EPSC reduction was observed (Fig. 2B1, lower right panel), raising the possibility that Dynasore does more than specifically interfere with SV endocytosis (and possibly Pitstop-2 as well). Non-specific effects are also suggested by the observed two-fold increase in initial EPSC size in SC to CA1 synapses after Dynasore pre-incubation.
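      The reviewer's upper bound can be written out as a toy calculation. This is a minimal sketch under an assumption that is ours, not the reviewer's or the authors': the second EPSC is taken to scale linearly with the fraction of release sites still available, ignoring facilitation, vesicle depletion, and postsynaptic effects. The function name is hypothetical.

```python
def relative_second_epsc(fraction_sites_used: float, fraction_blocked: float = 1.0) -> float:
    """Toy model: EPSC2/EPSC1 when sites used by AP1 cannot be reused.

    Assumes EPSC amplitude is proportional to the number of available sites,
    so only sites that fused on the first AP (and stay uncleared) are lost.
    """
    for value in (fraction_sites_used, fraction_blocked):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return 1.0 - fraction_sites_used * fraction_blocked


# Calyx: at most ~20% of sites are used per AP; assume clearance is fully blocked.
max_reduction = 1.0 - relative_second_epsc(0.20)
observed_reduction = 0.50  # ~50% reduction cited by the reviewer (Fig. 2B1)
print(f"max reduction from blocked clearance: {max_reduction:.0%}")
print(f"observed reduction: {observed_reduction:.0%}")
print("observed exceeds toy bound:", observed_reduction > max_reduction)
```

      Under this linear assumption, a ~50% drop cannot be explained by blocked clearance of previously used sites alone, which is the core of the reviewer's argument.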

      This study compares different experimental conditions to infer the physiological role of endocytosis in rapid neurotransmission at the large calyceal synapse in mice. A related study at the Drosophila neuromuscular junction (Kawasaki et al., Nat. Neurosci. 2000) reported similar findings under comparable experimental settings (physiological conditions and acute block of endocytosis).

      (2) More severe depression was observed at calyx synapses after blocking endocytosis which the authors attribute to a presynaptic mechanism affecting pool replenishment. When probing EPSC recovery after conditioning 100 Hz trains, a speed up was observed mediated by an "unknown mechanism" which is "masked in 2 mM [Ca2+]". These two observations, deeper synaptic depression during 100 Hz but faster recovery from depression following 100 Hz, are difficult to align and no attempt was made to find an explanation.

      By varying temperature (physiological vs room temperature, PT vs RT), calcium concentration (1.3 mM vs 2.0 mM), and stimulation frequency (10, 100, and 200 Hz; some data not shown), we compared the effect of endocytosis block on EPSC STD and on the kinetics of recovery from STD at post-hearing calyces in three settings, (PT, 1.3 mM [Ca2+]), (PT, 2.0 mM [Ca2+]), and (RT, 2.0 mM [Ca2+]), to dissect their respective roles.

      (3) To reconcile previous data reporting a block of Ca2+-dependent recovery (CDR) by Dynasore or Latrunculin (measured at 2 mM external [Ca2+]) with the data presented here (using 1.3 mM external [Ca2+]) reporting no effect or a speed-up of recovery from depression, the authors postulate that "CDR may operate only when excessive Ca2+ enters during massive presynaptic activation" (page 10 line 244). While that is possible, such an explanation ignores the many calyx studies demonstrating fiber stimulation-induced CDR and elucidating the molecular pathways mediating it, and it also completely dismisses the strong change in recovery time course after 10 Hz conditioning (single exponential) as compared to 100 Hz conditioning (double exponential with a pronounced fast component).

      Strong presynaptic stimuli such as those illustrated in Figs. 1B,C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents fusion of roughly 25,000 to 31,000 SVs (assuming a single SV capacitance of 80 aF), corresponding to a 12.5 to 15.6% increase in whole-terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Documenting the corresponding conductance traces is therefore advisable for such massive Cm jumps; merely mentioning that the first 450 ms after stimulation were skipped during analysis, or referring to previous publications showing conductance traces, is insufficient.
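      The conversion from capacitance jump to vesicle number and membrane area is simple proportionality. The short sketch below reproduces it using only the two values the reviewer assumes (80 aF per SV and ~16 pF resting terminal capacitance); with these values, 2 pF corresponds to 25,000 SVs and 2.5 pF to about 31,000 SVs, i.e. roughly a 12.5 to 15.6% gain in membrane area. The function and constant names are ours.

```python
SV_CAPACITANCE_F = 80e-18        # 80 aF per synaptic vesicle (reviewer's assumption)
TERMINAL_CAPACITANCE_F = 16e-12  # ~16 pF resting calyx terminal (reviewer's assumption)


def vesicles_from_cm_jump(delta_cm_f: float) -> float:
    """Number of fused SVs implied by a capacitance increase (in farads)."""
    return delta_cm_f / SV_CAPACITANCE_F


def surface_increase(delta_cm_f: float) -> float:
    """Fractional gain in terminal membrane area (Cm scales with area)."""
    return delta_cm_f / TERMINAL_CAPACITANCE_F


for delta_pf in (2.0, 2.5):
    delta_f = delta_pf * 1e-12
    print(f"{delta_pf} pF -> {vesicles_from_cm_jump(delta_f):,.0f} SVs, "
          f"+{surface_increase(delta_f):.1%} membrane area")
```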

      All bar graphs in Figures 1 through 6 and Figures S3 through S6 compare three or even four (Fig. 5C) conditions, i.e. one control and at least two treatment data sets. It appears as if repeated t-tests were used to run multiple two-group comparisons (i.e. using the same control data twice for two different comparisons). Either a proper multiple comparison test should be used or a Bonferroni correction or similar multiple-comparison correction needs to be applied.

      We updated the statistical analysis of all data using one-way ANOVA and t-tests with the Bonferroni-Holm method of p-value correction, and corrected one analysis in Figs. 1 and 3; all major conclusions are unchanged.
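      For readers unfamiliar with the correction, the Holm (Bonferroni-Holm) step-down procedure can be sketched in a few lines. This is a generic illustration, not the authors' analysis code, and the p-values in the usage example are hypothetical. In practice, `statsmodels.stats.multitest.multipletests(p, method="holm")` implements the same correction.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction; returns adjusted p-values and reject flags
    in the original order of p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank, counted from 0) is multiplied by m - k.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical p-values for three comparisons against one shared control group
# (e.g. control vs. Dynasore, control vs. Pitstop-2, control vs. Latrunculin-B).
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03])
print(adjusted, reject)
```

      Because successive comparisons are tested against progressively less strict thresholds (α/m, then α/(m−1), and so on), Holm's method is uniformly more powerful than plain Bonferroni while still controlling the family-wise error rate.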

      Finally, the terminology of contrasting "fast-signaling" (calyx synapses) and "slow-plastic" (SC synapses) synapses seems to imply that calyx synapses lack plasticity, as does the wording "conventional bouton-type synapses involved in synaptic plasticity" (page 11, line 251). I assume, the authors primarily refer to the maximum frequencies these two synapse types typically transmit (fast-signaling vs slow-signaling)?

      The properties of these two synapses are now described explicitly in the updated text, and they are renamed fast and slow synapses.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be done sometimes here.

      In this revision, we described them more explicitly.

      The data presented in Fig. S6 are detached from the rest of the manuscript, not relevant and should be removed.

      page 4 line 95 "... to ensure sufficient Ca2+ currents to induce exo-endocytosis." ICa is large enough to induce exocytosis also at 1.3 mM Ca2+. Please clarify.

      We updated the relevant section.

      page 5, line 108 "... this slow endocytosis showed a strongly prolonged time course without accompanied by the change of Cm or presynaptic Ca2+ currents" Please fix.

      Fixed.

      page 5, line 121 "Thus, at calyces of Held, bath-application of Dynasore or Pitstop-2 can block both fast and slow endocytosis without perturbing presynaptic intracellular milieu." Bath-application never perturbs the intracellular milieu. Please clarify.

      Rephrased.

      page 6 line 128 "... physiological aCSF" is a misnomer (= physiological artificial CSF). Please fix.

      This term is clearly defined in the Introduction.

      page 11, line 252 "... from hippocampal SC-CA1 pyramidal neurons" There are no "SC-CA1 pyramidal neurons". Please fix.

      Fixed.

      page 12, line 285 "In acute slices optimized to physiological conditions" The conditions are optimized, not the slices. Please fix.

      Fixed.

      page 14, line 323 same as above

      Fixed.

      page 14, line 330 LTP at SC-CA1 synapses is postsynaptic. Please clarify.

      Rephrased.

      page 16, line 381 "had a series resistance of 3-4 MOhm" versus page 17, line 408 "The patch pipettes had a series resistance of 5-15 MOhm (less than 10 MOhm in most cells)". 3-4 is perhaps pipette resistance while 5-15 is perhaps series resistance? Please clarify.

      Fixed.

      page 17, line 398 "Cm traces were averaged at every 10 ms (for 10 Hz train stimulation) or 20 ms (for 5 ms single or 1 Hz train stimulation)." Do you mean to say that Cm traces were smoothed with a moving average using a window size of 10 or 20 ms duration? Please clarify.

      Rephrased to clarify better.

      page 18, "All values are given as mean ± SEM and significance of difference was evaluated by Student's unpaired t-test, unless otherwise noted." Please check. You cannot simply use repeated t-tests for multiple comparisons. Either a proper multiple comparison test should be used or a Bonferroni correction or similar multiple-comparison correction needs to be applied.

      All statistical analyses were updated using one-way ANOVA and t-tests with the Bonferroni-Holm method of p-value correction, and one analysis in Figs. 1 and 3 was corrected, with no change in the major conclusions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      The authors aim to test the sensory recruitment theory of visual memory, which assumes that visual sensory areas are recruited for working memory, and that these sensory areas represent visual memories in a similar fashion to how perceptual inputs are represented. To test the overlap between working memory (WM) and perception, the authors use coarse stimulus (aperture) biases that are known to account for (some) orientation decoding in the visual cortex (i.e., stimulus energy is higher for parts of an image where a grating orientation is perpendicular to an aperture edge, and stimulus energy drives decoding). Specifically, the authors show gratings (with a given "carrier" orientation) behind two different apertures: one is a radial modulator (with maximal energy aligned with the carrier orientation) and the other an angular modulator (with maximal energy orthogonal to the carrier orientation). When the subject detects contrast changes in these stimuli (the perceptual task), orientation decoding only works when training and testing within each modulator, but not across modulators, showing the impact of stimulus energy on decoding performance. Instead, when subjects remember the orientation over a 12s delay, orientation decoding works irrespective of the modulator used. The authors conclude that representations during WM are therefore not "sensory-like", given that they are immune to aperture biases. This invalidates the sensory recruitment hypothesis, or at least the part assuming that when sensory areas are recruited during WM, they are recruited in a manner that resembles how these areas are used during perception.
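      The cross-modulator logic (train on one aperture, test on the other) can be illustrated with a generic decoder. The sketch below uses a correlation-based nearest-centroid classifier as a stand-in; it is not the decoding method used by Duan and Curtis, and all patterns, the four-orientation set, and the noise level are synthetic assumptions. If the voxel pattern follows stimulus energy, which the angular modulator rotates by 90°, cross-decoding fails systematically; if the pattern follows the carrier regardless of aperture, it generalizes.

```python
import numpy as np


def zscore_rows(a):
    """Z-score each row (one voxel pattern) so row dot products / n give Pearson r."""
    return (a - a.mean(axis=1, keepdims=True)) / a.std(axis=1, keepdims=True)


def cross_decode(train_X, train_y, test_X, test_y):
    """Fit class centroids on one condition and classify trials from another."""
    train_X, test_X = np.asarray(train_X, float), np.asarray(test_X, float)
    train_y, test_y = np.asarray(train_y), np.asarray(test_y)
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    # Pearson correlation of every test pattern with every class centroid.
    r = zscore_rows(test_X) @ zscore_rows(centroids).T / train_X.shape[1]
    predicted = classes[np.argmax(r, axis=1)]
    return float(np.mean(predicted == test_y))


rng = np.random.default_rng(0)
n_vox, n_trials = 50, 20
oris = np.array([0, 45, 90, 135])   # toy set, not the study's orientations
labels = np.repeat(oris, n_trials)   # carrier orientation of each trial
# Hypothetical voxel pattern evoked by stimulus energy at each orientation.
energy_pattern = {int(o): rng.normal(size=n_vox) for o in oris}


def simulate(shift_deg, noise=0.3):
    """Trials whose pattern follows energy at (carrier + shift) mod 180 deg."""
    return np.stack([energy_pattern[(int(o) + shift_deg) % 180]
                     + noise * rng.normal(size=n_vox) for o in labels])


radial = simulate(0)          # energy aligned with the carrier
angular = simulate(90)        # energy orthogonal to the carrier (aperture-driven)
carrier_coded = simulate(0)   # pattern follows the carrier whatever the aperture

print("radial -> angular:", cross_decode(radial, labels, angular, labels))
print("radial -> carrier-coded:", cross_decode(radial, labels, carrier_coded, labels))
```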

      Strengths:

      Duan and Curtis very convincingly show that aperture effects that are present during perception do not appear to be present during the working memory delay. Especially when the debate about "why can we decode orientations from human visual cortex" was in full swing, many may have quietly assumed this to be true (e.g., "the memory delay has no stimuli, and ergo no stimulus aperture effects"), but it is definitely not self-evident and nobody ever thought to test it directly until now. In addition to the clear absence of aperture effects during the delay, Duan and Curtis also show that when stimulus energy aligns with the carrier orientation, cross-generalization between perception and memory does work (which could explain why perception-to-memory cross-decoding also works). All in all, this is a clever manipulation, and I'm glad someone did it, and did it well.

      Weaknesses:

      There seems to be a major possible confound that prohibits strong conclusions about "abstractions" into a "line-like" representation, namely spatial attention. What if subjects simply attend to the endpoints of the carrier grating, or to the edge of the screen where the carrier orientation "intersects" it, in order to do the task? This may also result in reconstructions that have higher BOLD at areas close to the stimulus/screen edges along the carrier orientation. The question then would be whether this is truly an "abstracted representation", or whether subjects are merely using spatial attention to do the task.

      Alternatively (and this reaches back to the "fine vs coarse" debate), another argument could be that during memory, what we are decoding is indeed fine-scale inhomogeneous sampling of orientation preferences across many voxels. This is clearly not the most convincing argument, as the spatial reconstructions (e.g., Figure 3A and C) show higher BOLD for voxels with receptive fields that are aligned to the remembered orientation (which is in itself a form of coarse-scale bias), but it could still play a role.

      To conclude that the spatial reconstruction from the data indeed comes from a line-like representation, you'd need to generate modeled reconstructions of all possible stimuli and representations. Yes, Figure 4 shows that a line results in a modeled spatial map that resembles the WM data, but many other stimuli might too, and some may match the data better. For example, the alternative hypothesis (attention to grating endpoints) may very well lead to a model output comparable to that of a line. However, testing this would not suffice, as there may be an inherent inverse problem (with multiple stimuli that can lead to the same visual field model).

      The main conclusion, and title of the paper, that visual working memories are abstractions of percepts, is therefore not supported. Subjects could be using spatial attention, for example. Furthermore, even if it is true that gratings are abstracted into lines, this form of abstraction would not generalize to any non-spatial feature (e.g., color cannot become a line, contrast cannot become a line, etc.), which means it has limited explanatory power.

      We thank the reviewer for bringing up these excellent questions.

      First, to test the alternative hypothesis of spatial attention, we fed a dot image into the image-computable model. We placed the dot where we suspect one might place their spatial attention, namely, at the edge of the stimulus that is tangent to the orientation of the grating. We generated the model response for three orientations and their combination by rotating and averaging. From Author response image 1 below, one can see that this model does not match the line-like representation we reported. Nonetheless, we would like to avoid making the argument that attention does not play a role. We strongly suspect that if one was attending to multiple places along a path that makes up a line, it would produce the results we observed. But here a circularity enters the logic: one cannot distinguish between attention to a line-like representation and a line of attention being the line-like representation.

      Author response image 1.

      Reconstruction maps for the dot image placed at the edge, for 15°, 75°, and 135°, and combined across the three orientation conditions.

      Second, we remain agnostic to the question of whether fine-scale inhomogeneous sampling of orientation-selective neurons may drive some of the decoding results we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.

      Finally, we agree with the reviewer that there is much more work to be done in this area. Our working hypothesis, that WM representations are abstractions of percepts, is admittedly based on Occam's razor and an appeal to efficient coding principles. We also agree that these results may not generalize to all forms of WM (e.g., color). As always, there is a tradeoff between interpretability (visual spatial formats in retinotopically organized maps) and generalizability. Frankly, we have no idea how one might be able to test these ideas when subjects might be using the most common type of memory reformatting, linguistic representations, which are incredibly efficient.

      Additional context:

      The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it); instead, attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5 s during memory encoding, and 2.5 s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. This would allow the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      Again, this is an excellent question. We used a separate perceptual task, rather than the stimulus epoch, as a control mainly for two reasons. First, we used a control task in which participants had to process the contrast, not orientation, of the grating because we were concerned that participants would reformat the grating into a line-like representation to make the judgments. To avoid this, we used a task similar to the one used when previous researchers first found the stimulus vignetting effect (Roth et al., 2018). Again, our main goal was to focus on the bottom-up visual features. Second, because of the sluggishness of the BOLD response, combined with our task design (i.e., the memory delay always followed the target stimulus), we cannot disentangle the visual and memory responses that co-exist in this epoch. Any result could be misleading.

      What's also interesting is what happens in the passive perceptual condition, and the fact that spatial reconstructions for areas beyond V1 and V2 (i.e., V3, V3AB, and IPS0-1) align with (implied) grating endpoints, even when an angular modulator is used (Figure 3C). Are these areas also "abstracting" the stimulus (in a line-like format)?

      We agree these findings are interesting; they replicate what we found in our previous paper (Kwak & Curtis, Neuron, 2022). We believe that these results do imply that these areas indeed store a reformatted line-like WM representation that is not biased by vignetting. We would like to extend a note of caution, however, because the decoding results in the higher-order areas (V3AB, IPS0-1, etc.) are somewhat poor (especially in comparison to V1, V2, V3) (see Figure 2).

      Reviewer #2:

      Summary:

      According to the sensory recruitment model, the contents of working memory (WM) are maintained by activity in the same sensory cortical regions responsible for processing perceptual inputs. A strong version of the sensory recruitment model predicts that stimulus-specific activity patterns measured in sensory brain areas during WM storage should be identical to those measured during perceptual processing. Previous research casts doubt on this hypothesis, but little is known about how stimulus-specific activity patterns during perception and memory differ. Through clever experimental design and rigorous analyses, Duan & Curtis convincingly demonstrate that stimulus-specific representations of remembered items are highly abstracted versions of representations measured during perceptual processing and that these abstracted representations are immune to aperture biases that contribute to fMRI feature decoding. The paper provides converging evidence that neural states responsible for representing information during perception and WM are fundamentally different, and provides a potential explanation for this difference.

      Strengths:

      (1) The generation of stimuli with matching vs. orthogonal orientations and aperture biases is clever and sets up a straightforward test regarding whether and how aperture biases contribute to orientation decoding during perception and WM. The demonstration that orientation decoding during perception is driven primarily by aperture bias while during WM it is driven primarily by orientation is compelling.

      (2) The paper suggests a reason why orientation decoding during WM might be immune to aperture biases: by weighting multivoxel patterns measured during WM storage by spatial population receptive field estimates from a different task, the authors show that remembered (but not actively viewed) orientations form "line-like" patterns in retinotopic cortical space.
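      The reconstruction step described here (weighting voxel patterns by pRF estimates obtained in a separate mapping task) can be sketched as follows. This is a simplified, hypothetical version assuming isotropic 2-D Gaussian pRFs and a plain weighted sum; the authors' exact weighting and normalization may differ, and the function name, grid extent, and resolution are our assumptions.

```python
import numpy as np


def prf_reconstruction(amplitudes, x0, y0, sigma, extent=12.0, n=121):
    """Map voxel response amplitudes into visual-field coordinates.

    Each voxel contributes its 2-D Gaussian pRF (centre x0, y0; size sigma,
    all in degrees) scaled by its response amplitude; contributions are summed.
    """
    amplitudes = np.asarray(amplitudes, float)
    x0, y0, sigma = (np.asarray(v, float)[:, None, None] for v in (x0, y0, sigma))
    coords = np.linspace(-extent, extent, n)   # degrees of visual angle
    xx, yy = np.meshgrid(coords, coords)       # rows index y, columns index x
    prfs = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2))
    # Contract the voxel axis: sum over v of amplitude_v * pRF_v(x, y).
    return coords, np.tensordot(amplitudes, prfs, axes=1)


# Two hypothetical voxels: a strongly responding one at (4, 0) deg, a weak one at (-4, 0).
coords, recon = prf_reconstruction([1.0, 0.2], [4.0, -4.0], [0.0, 0.0], [1.0, 1.0])
row, col = np.unravel_index(np.argmax(recon), recon.shape)
print("peak at x = %.1f, y = %.1f deg" % (coords[col], coords[row]))
```

      A line-like pattern in such a map would appear as elevated values along a band through fixation at the remembered orientation.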

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      (1) The paper tests a strong version of the sensory recruitment model, where neural states representing information during WM are presumed to be identical to neural states representing the same information during perceptual processing. As the paper acknowledges, there is already ample reason to doubt this prediction (see, e.g., earlier work by Kok & de Lange, Curr Biol 2014; Bloem et al., Psych Sci, 2018; Rademaker et al., Nat Neurosci, 2019; among others). Still, the demonstration that orientation decoding during WM is immune to aperture biases known to drive orientation decoding during perception makes for a compelling demonstration.

      We agree with the reviewer, and would add that the main problem with the sensory recruitment model of WM is that it remains underspecified. The work cited above and in our paper, and the results in this report, are only the beginning of efforts to fully detail what it means to recruit sensory mechanisms for memory.

      (2) Earlier work by the same group has reported line-like representations of orientations during memory storage but not during perception (e.g., Kwak & Curtis, Neuron, 2022). It's nice to see that result replicated during explicit perceptual and WM tasks in the current study, but I question whether the findings provide fundamental new insights into the neural bases of WM. That would require a model or explanation describing how stimulus-specific activation patterns measured during perception are transformed into the "line-like" patterns seen during WM, which the authors acknowledge is an important goal for future research.

      We agree with the reviewer that perhaps some might see the current results as an incremental step given our previous paper. However, we would point out that researchers have been decoding memorized orientation from the early visual cortex for 15 years, and not one of those highly impactful studies had ever done what we did here, which was to test if decoded WM representations are the product of aperture biases. Not only do our results indicate that decoding memorized orientation is immune to these biases, but they critically suggest a reason why one can decode orientation during WM.

      Reviewer #3:

      Summary:

      In this work, Duan and Curtis addressed an important issue related to the nature of working memory representations. This work is motivated by findings illustrating that orientation decoding performance for perceptual representations can be biased by the stimulus aperture (modulator). Here, the authors examined whether the decoding performance for working memory representations is similarly influenced by these aperture biases. The results provide convincing evidence that working memory representations have a different representational structure, as the decoding performance was not influenced by the type of stimulus aperture.

      Strengths:

      The strength of this work lies in the direct comparison of decoding performance for perceptual representations with working memory representations. The authors take a well-motivated approach and illustrate that perceptual and working memory representations do not share a similar representational structure. The authors test a clear question, with a rigorous approach and provide convincing evidence. First, the presented oriented stimuli are carefully manipulated to create orthogonal biases introduced by the stimulus aperture (radial or angular modulator), regardless of the stimulus carrier orientation. Second, the authors implement advanced methods to decode the orientation information present, in visual and parietal cortical regions, when directly perceiving or holding an oriented stimulus in memory. The data illustrates that working memory decoding is not influenced by the type of aperture, while this is the case in perception. In sum, the main claims are important and shed light on the nature of working memory representations.

      We thank the reviewer for noting the strengths in our work.

      Weaknesses:

      I have a few minor concerns that, although they don't affect the main conclusion of the paper, should still be addressed.

      (1) Theoretical framing in the introduction: Recent work has shown that decoding of orientation during perception does reflect orientation selectivity, and it is not only driven by the stimulus aperture (Roth, Kay & Merriam, 2022).

      Excellent point, and similar to the point made by Reviewer 1. We have adjusted our text and now cite the paper in the Introduction.

      Below, we paste our response to Reviewer 1:

      “Second, we remain agnostic to the question of whether fine-scale inhomogeneous sampling of orientation-selective neurons may drive some of the decoding we report here. It is possible that our line-like representations are driven by neurons tuned to the sample orientation that have receptive fields that lie along the line. Here, we instead focus on testing the idea that WM decoding does not depend on aperture biases.”

      (2) Figure 1C illustrates the principle of how the radial and angular modulators bias the contrast energy extracted by the V1 model, which in turn would influence orientation decoding. It would be informative if the carrier orientations used in the experiment were shown in this figure, or at a minimum it would be mentioned in the legend that the experiment used 3 carrier orientations (15°, 75°, 135°) clockwise from vertical. Related, when trying to find more information regarding the carrier orientation, the 'Stimuli' section of the Methods incorrectly mentions that 180 orientations are used as the carrier orientation.

      We apologize for not clearly indicating the stimulus features in the figure. Now, we added the information about the target orientations in Figure 1C legend. Also, we now corrected in the Methods section the mistakes about the carrier orientation and the details of the task. Briefly, participants were asked to use a continuous report over 180 orientations. We now clarify that “We generated 180 orientations for the carrier grating to cover the whole orientation space during the continuous report task.”

      (3) The description of the image-computable V1 model in the Methods is incomplete, and at times inaccurate. i) The model implements 6 orientation channels, which is inaccurately referred to as a bandwidth of 60° (should be 180/6 = 30°). ii) The steerable pyramid combines information across phase pairs to obtain a measure of contrast energy for a given stimulus. Here, it is only mentioned that the model contains different orientation and spatial scale channels. I assume there were also 2 phase pairs, and they were combined in some manner (squared and summed to create contrast energy). Currently, it is unclear what the model output represents. iii) The spatial scale channel with the maximal response differences between the 2 modulators was chosen as the final model output. What spatial frequency does this channel refer to, and how does this spatial frequency relate to the stimulus?

      (i) First, we thank the reviewer for pointing out this mistake, since the range of orientations should be 180° instead of 360°. We corrected this in the revised version.

      (ii) Second, we apologize for not being clear. In the second paragraph of the “Simulate model outputs” section, we wrote,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), which had two kinds of phases for both the carriers and the modulators. We first generated the model’s responses to each target image separately, then averaged the model responses across all phases for each orientation condition.”

      We have corrected this text by now writing,

      “For both types of stimuli, we used three target orientations (15°, 75°, and 135° clockwise from vertical), two phases for the carrier (0 or π), and two phases for the modulator (sine or cosine phase). We first generated the model responses to each phase condition separately, then averaged them across all phases for each orientation condition.”

      (iii) Third, we again apologize for the confusion. Because both modulated gratings have the same spatial frequency, the channel with the largest response should correspond to the spatial frequency of the stimulus. We clarified this by now writing,

      “For the final predicted responses, we chose the subband with maximal responses (the 9th level), which corresponds to the spatial frequency of the stimulus (Roth, Heeger, and Merriam 2018).”
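The phase-averaging and subband-selection steps described above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes the steerable-pyramid outputs have already been reduced to one response per level for each stimulus condition, and the function names and toy numbers are made up. Levels are zero-based here, so the "9th level" in the quoted text would correspond to index 8.

```python
from itertools import product

# Orientation is periodic over 180 degrees, so 6 channels are 180 / 6 = 30 degrees apart.
N_ORIENTATION_CHANNELS = 6
ORIENTATION_SPACING_DEG = 180 / N_ORIENTATION_CHANNELS


def phase_averaged(responses):
    """Average subband responses over carrier and modulator phases.

    `responses` maps (orientation, carrier_phase, modulator_phase) to a list
    holding one response per pyramid level (subband). Returns a dict
    {orientation: [mean response per level]}.
    """
    sums, counts = {}, {}
    for (ori, _carrier, _modulator), levels in responses.items():
        acc = sums.setdefault(ori, [0.0] * len(levels))
        for i, r in enumerate(levels):
            acc[i] += r
        counts[ori] = counts.get(ori, 0) + 1
    return {ori: [s / counts[ori] for s in acc] for ori, acc in sums.items()}


def max_response_level(averaged):
    """Zero-based index of the level with the largest response, pooled over orientations."""
    n_levels = len(next(iter(averaged.values())))
    pooled = [sum(v[i] for v in averaged.values()) for i in range(n_levels)]
    return max(range(n_levels), key=pooled.__getitem__)


# Toy example: 3 orientations x 2 carrier phases x 2 modulator phases, 4 levels.
toy = {}
for ori, cp, mp in product([15, 75, 135], ["0", "pi"], ["sine", "cosine"]):
    shift = 0.2 if cp == "0" else -0.2  # phase-dependent part cancels on averaging
    toy[(ori, cp, mp)] = [0.1, 0.5 + shift, 0.9, 0.3]

averaged = phase_averaged(toy)
best = max_response_level(averaged)
```

Averaging before selecting the level means that phase-specific fluctuations cannot by themselves determine which subband is chosen.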

      (4) It is not clear from the Methods how the difficulty in the perceptual control task was controlled. How were the levels of task difficulty created?

      We apologize for not being clear. Task difficulty was set by the contrast difference between the two stimuli: the easiest level paired the first and the last contrasts, whereas the hardest level paired two adjacent contrasts. We added the following sentences:

      “The contrast for each stimulus was generated from a predefined set of 20 contrasts uniformly distributed between 0.5 and 1.0 (0.025 step size). We created 19 levels of task difficulty based on the contrast distance between the two stimuli. Thus, the difficulty ranged from choosing contrast pairs with the largest difference (0.5, easiest) to contrast pairs with the smallest difference (0.025, hardest). Task difficulty level changed based on an adaptive, 1-up-2-down staircase procedure (Levitt 1971) to maintain performance at approximately 70% correct.”
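For readers unfamiliar with the procedure, a 1-up-2-down rule can be sketched as below. This is an illustrative sketch, not the experiment code: the class name, the mapping of level 1 to the easiest contrast pair, and the starting level are assumptions. The rule is balanced where two consecutive correct responses are as likely as not (p² = 0.5), so it converges near p ≈ 70.7% correct (Levitt 1971), consistent with the "approximately 70% correct" target.

```python
class OneUpTwoDownStaircase:
    """1-up-2-down staircase over discrete difficulty levels.

    Assumed ordering: level 1 is the easiest (largest contrast difference)
    and `n_levels` the hardest (smallest difference). Two consecutive
    correct responses move one level harder; any error moves one level easier.
    """

    def __init__(self, n_levels=19, start_level=1):
        self.n_levels = n_levels
        self.level = start_level
        self._streak = 0  # consecutive correct responses at the current level

    def update(self, correct):
        if correct:
            self._streak += 1
            if self._streak == 2:
                self.level = min(self.level + 1, self.n_levels)
                self._streak = 0
        else:
            self.level = max(self.level - 1, 1)
            self._streak = 0
        return self.level


# The rule is balanced when P(two correct in a row) = 0.5, i.e. p = sqrt(0.5).
TARGET_P_CORRECT = 0.5 ** 0.5  # about 0.707

stairs = OneUpTwoDownStaircase()
history = [stairs.update(c) for c in [True, True, True, True, False, True]]
```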

      Recommendations For The Authors

      (Reviewer #1):

      (1) If the black circle (Fig 3A & C) is the stimulus size, and the stimulus (12º) is roughly half the size of the entire screen (24.8º), then how are spatial reconstructions generated for parts of the visual field that fall outside of the screen? I am asking because in Figure 3 the area over which spatial reconstructions are plotted has a diameter at least 3 times the diameter of that black circle (the stimulus). I'm guessing this is maybe possible when using a very liberal fitting approach to pRFs, where the center of a pRF can be outside of the screen (so you'd fit a circle to an elongated blob, assuming that blob is the edge of a circle, or something). Can you really reliably estimate that far out into visual space, or extrapolate pRFs that exist in a part of the space you did not fully map (because it's outside of the screen)?

      We thank the reviewer for pointing out this confusing issue.

      First, the spatial reconstruction map has a diameter three times that of the stimulus because, following Kwak & Curtis (2022), we included in the reconstruction all voxels whose pRF eccentricities were within 20º. We did so for several reasons. First, although the screen is 24.8º high, it is 44º wide, so voxels can have pRF eccentricities beyond 20º along the horizontal axis. Second, there may be no pRF centers outside the vertical extent of the screen, but the pRF Gaussians of nearby voxels may still cover that area. Moreover, when creating the final map combined across the three orientation conditions, we rotated the maps to be centered vertically, which required a 20x20º square. Finally, on inspecting the reconstruction maps, we noticed that the region beyond twice the stimulus size (black circle) contributed very little to the reconstructions. Therefore, the results depicted in Figure 3A&C are justified, but see the next comment and our response.
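A minimal sketch of this kind of pRF-based reconstruction follows. It assumes, as a general description of the approach rather than the exact implementation in Kwak & Curtis (2022), that each included voxel adds its response multiplied by its 2D Gaussian pRF to the map, and that the summed map is divided by its maximum absolute value so it lies in [-1, 1]. The function name, grid spacing, and toy voxels are hypothetical.

```python
import math


def reconstruct_map(voxels, max_ecc=20.0, extent=20.0, step=1.0):
    """Project voxel responses into visual space through their pRFs.

    voxels: iterable of (x0, y0, sigma, beta), with pRF centre and size in
    degrees of visual angle and beta the voxel's response (e.g. a GLM beta).
    Each included voxel adds beta times its 2D Gaussian pRF to the map.
    Returns (coords, grid), where grid[row][col] is sampled at
    (x=coords[col], y=coords[row]) and scaled to the range [-1, 1].
    """
    n = int(round(2 * extent / step)) + 1
    coords = [-extent + i * step for i in range(n)]
    grid = [[0.0] * n for _ in range(n)]
    for x0, y0, sigma, beta in voxels:
        if math.hypot(x0, y0) > max_ecc:
            continue  # keep only voxels whose pRF centre lies within max_ecc
        for r, y in enumerate(coords):
            for c, x in enumerate(coords):
                d2 = (x - x0) ** 2 + (y - y0) ** 2
                grid[r][c] += beta * math.exp(-d2 / (2 * sigma ** 2))
    peak = max(abs(v) for row in grid for v in row)
    if peak > 0:
        grid = [[v / peak for v in row] for row in grid]
    return coords, grid


# Toy example: the second voxel is excluded because its eccentricity is 30 deg.
coords, grid = reconstruct_map([(0.0, 0.0, 2.0, 1.0), (30.0, 0.0, 2.0, 5.0)])
```

Because each pRF is a Gaussian with tails, a voxel centred near the edge of the mapped field still contributes to grid points beyond its centre, which is the situation described in the second reason above.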

      (2) Is the quantification in 3B/C justified? The filter line uses a huge part of visual space outside of the stimulus (and even the screen). For the angular modulator in the "perception" condition, this means that there is no peak at -90/90 degree. But if you were to only use a line that is about the size of the stimulus (a reasonable assumption), it would have a peak at -90/90 degree.

      This is an excellent question. We completely agree that it is more reasonable to use filter lines that have the same size (12º) as the stimulus instead of the whole map size (40º). Based on the feedback from the Reviewer, we redid the spatial reconstruction analyses and now include the following changes to Figure 3.

      (1) We fitted the lines using only pixels within the stimulus, and replaced the reconstruction maps in Figure 3A and Figure 3C.

      (2) We added the color bar in Figure 3A.

      (3) We regenerated the filtered responses and recalculated the fidelity results using line filters restricted to the stimulus size, and replaced the filtered responses and fidelity results in Figure 3B and Figure 3D. As the Reviewer anticipated, the new analysis shows peaks at -90/90 degrees for the angular modulated gratings in the perceptual control task in V1 and V2. Thank you, Reviewer 1!

      (4) We also made corresponding changes in Supplementary Figures S4 and S5, as well as to the statistical results in Tables S4 and S5.

      (5) In the “Methods” section, we added “within the stimulus size” for both “fMRI data analysis: Spatial reconstruction” and “Quantification and statistical analysis” subsections.
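The line filter restricted to the stimulus size can be illustrated with a short sketch. It is hypothetical: it assumes the reconstruction is stored on a regular grid, uses nearest-neighbour sampling, and reports the mean map value along a 12º line through fixation at each angle. The authors' filter shape and angle convention may differ.

```python
import math


def line_filter_profile(coords, grid, angles_deg, radius=6.0, n_samples=25):
    """Mean reconstruction value along a line through fixation, per angle.

    Only points within `radius` degrees of the centre are sampled, so
    radius=6.0 restricts the filter to a 12-degree stimulus. grid[row][col]
    is sampled at (x=coords[col], y=coords[row]) on a regular grid. Angles
    are measured counter-clockwise from the x-axis; this convention and the
    nearest-neighbour lookup are assumptions for illustration.
    """
    step = coords[1] - coords[0]

    def nearest(v):
        return int(math.floor((v - coords[0]) / step + 0.5))

    profile = []
    for a in angles_deg:
        t = math.radians(a)
        total = 0.0
        for k in range(n_samples):
            s = -radius + 2 * radius * k / (n_samples - 1)  # signed distance along the line
            total += grid[nearest(s * math.sin(t))][nearest(s * math.cos(t))]
        profile.append(total / n_samples)
    return profile


# Toy map: a horizontal bar of ones through the centre of a 21 x 21 grid.
coords = [float(v) for v in range(-10, 11)]
grid = [[1.0 if row == 10 else 0.0 for _ in coords] for row in range(len(coords))]
profile = line_filter_profile(coords, grid, [0, 90])
```

In the toy map the filter aligned with the bar averages to 1, while the orthogonal filter only crosses the bar near fixation and averages close to 0; restricting `radius` to the stimulus keeps map values far outside the stimulus from entering either average.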

      (3) Figure 4 is nice, but not exactly quantitative. It does not address that the reconstructions from the perceptual task are hugging the stimulus edges much more closely compared to the modeled map. Conversely, the yellow parts of the reconstructions from the delay fan out much further than those of the model. The model also does not seem to dissociate radial/angular stimuli, while in the perceptual data the magnitude of perceptual reconstruction is clearly much weaker for angular compared to radial modulator.

      We thank the reviewer for this question. First, we acknowledge that Figure 4 is more qualitative than quantitative. However, we see no alternative that better depicts the similarity between the model predictions and the fMRI results for the perceptual control and WM tasks, and the figure clearly shows the orthogonal aperture bias. Second, we agree that some aspects of the observed fMRI results are not perfectly captured by the model. This could have many causes, including fMRI noise and individual differences. Importantly, different modulators induce an orthogonal aperture bias in the perceptual task but not in the WM task, so these discrepancies do not have a major impact on our conclusions.

      (4) The working memory and perception tasks are rather different. In this case, the perception task does not require the subject to process the carrier orientation (which is largely occluded, and possibly not that obvious without paying attention to it), but attention is paid to contrast. In this scenario, stimulus energy may dominate the signal. In the WM task, subjects have to work out what orientation is shown to do the task. Given that the sensory stimulus in both tasks is brief (1.5s during memory encoding, and 2.5s total in the perceptual task), it would be interesting to look at decoding (and reconstructions) for the WM stimulus epoch. If abstraction (into a line) happens in working memory, then this perceptual part of the task should still be susceptible to aperture biases. It allows the authors to show that it is indeed during memory (and not merely the task or attentional state of the subject) that abstraction occurs.

      We addressed the same point in our response to Reviewer 1, “additional context” section.

      Recommendations for improving the writing:

      (1) The main text had too little information about the Methods. Of course, some things need not be there, but others are crucial to understanding the basics of what is being shown. For example, the main text does not describe how many orientations are used (well... actually the caption to Figure 1 says there are 2: horizontal and vertical, which is confusing), and I had to deduce from the chance level (1/3) that there must have been 3 orientations. Also, given how important the orthogonality of the carrier and modulator are, it would be good to have this explicit (I would even want an analysis showing that indeed the two are independent). A final example is the use of beta weights, and for delay period decoding only the last 6s (of the 12s delay) are modeled and used for decoding.

      We thank the reviewer for identifying aspects of the manuscript that were confusing. We made several changes to the paper to clarify these details.

      First, we added the information about the orientations we used in the caption for Figure 1 and made it clear that Figure 1C is just an illustration using vertical/horizontal orientations. Second, the carrier and the modulator are different in many ways. For example, the carrier is a grating with orientation and contrast information, while the modulator is the aperture that bounds the grating without these features. Their phases are orthogonal, and we added this in the second paragraph of the “Stimuli” section. Last, in the main text and the captions, we now denote “late delay” when writing about our procedures.

      (2) Right under Figure 3, the text reads "angular modulated gratings produced line-like representations that were orthogonal carrier orientation reflecting the influence of stimulus vignetting", but the quantification (Figure 3D) does not support this (there is no orthogonal "bump" in the filtered responses from V1-V3, and one aligned with the carrier orientation in higher areas).

      This point was addressed in the “recommendations for the authors (Reviewer 1), point 2” above.

      Minor corrections to text and figures:

      (1) Abstract: "are WM codes" should probably be "WM codes are".

      We prefer to keep “are WM codes” as it is grammatically correct.

      (2) Introduction: Second sentence 2nd paragraph: representations can be used to decode representations? Or rather voxel patterns can be used...

      Changed to “On the one hand, WM representations can be decoded from the activity patterns as early as primary visual cortex (V1)...”

      (3) Same paragraph: might be good to add more references to support the correlation between V1 decoding and behavior. There's an Ester paper, and Iamshchinina et al. 2021. These are not trial-wise, but trial-wise can also be driven by fluctuating arousal effects, so across-subject correlations help fortify this point.

      We added these two papers as references.

      (4) Last paragraph: "are WM codes" should probably be "WM codes are".

      See (1) above.

      (5) Figure 1B & 2A caption: "stimulus presenting epoch" should probably be "stimulus presentation epoch".

      Changed to “stimulus epoch”.

      (6) Figure 1C: So this is very unclear, to say stimuli are created using vertical and horizontal gratings (when none of the stimuli used in the experiment are either).

      We addressed this point in our response to Reviewer 3, point 2.

      (7) Figure 2B caption "cross" should probably be "across".

      We believe “cross” is fine since cross here means cross-decoding.

      (8) Figure 3A and C are missing a color bar, so it's unclear how these images are generated (are they scaled, or not) and what the BOLD values are in each pixel.

      All values in the map were scaled to be within -1 to 1. We added the color bar in both Figure 3 and Figure 4.

      (9) Figure 3B and D (bottom row) are missing individual subject data.

      We use SEM to indicate the variance across subjects.

      (10) Figure D caption: "early (V1 and V2)" should probably be "early areas (V1 and V2)".

      Corrected.

      (11) Methods, stimuli says "We generated 180 orientations for the carrier grating to cover the whole orientation space." But it looks like only 3 orientations were generated, so this is confusing.

      We addressed this point in our response to Reviewer 3, point 2.

      (12) Further down (fMRI task) "random jitters" is probably "random jitter"

      Corrected.

    1. Author response:

      Response to Reviewer #1 (Public Review):

      We thank the reviewer for their constructive criticism of our study, their proposed solutions, and for highlighting areas of the methodology and analytical pipeline where explanations were unclear or unsatisfactory. We will take the reviewer’s feedback into account to improve the clarity and readability of the revised manuscript. We acknowledge the importance of ruling out eye movements as a potential confound. We address these concerns briefly below, but a more detailed explanation (and a full breakdown of the relevant analyses, including the corrected and uncorrected results) will be provided in the revised manuscript.

      First, the source of EEG activity recorded from the frontal electrodes is often unclear. Without an external reference, it is challenging to resolve the degree to which frontal EEG activity represents neural or muscular responses (1). Thus, as a preventative measure against the potential contribution of eye movement activity, for all our EEG analyses, we only included activity from occipital, temporal, and parietal electrodes (the selected electrodes can be seen in the final inset of Figure 3).

      Second, as suggested by the reviewer, we re-ran our analyses using the activity measured from the frontal electrodes alone. If the source of the nonlinear decoding accuracy in the AV condition was muscular activity produced by eye movements, we would expect to observe better decoding accuracy from sensors closer to the source. Instead, we found that decoding accuracy from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 4).

      Third, we compared the average eye movements between the three main sensory conditions (auditory, visual, and audiovisual). In the visual condition, there was little difference in eye movements corresponding to the five stimulus locations, likely because the visual stimuli were designed to be spatially diffuse. For the auditory and audiovisual conditions, there was more distinction between eye movements corresponding to the stimulus locations. However, these appeared to be the same between auditory and audiovisual conditions. If consistent saccades to audiovisual stimuli had been responsible for the nonlinear decoding we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Instead, we found no difference in correlation between audiovisual and auditory stimuli, indicating that eye movements were equivalent in these conditions and unlikely to explain better decoding accuracy for audiovisual stimuli.
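The per-condition comparison described above can be sketched as follows. It is a hypothetical illustration: it assumes one summary horizontal gaze value per trial and codes the five stimulus locations as -2 to 2; the toy numbers are invented, and a formal test of the difference between correlations (for example via Fisher's z) is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaze_location_correlation(trials):
    """Correlate horizontal gaze with stimulus location separately per condition.

    trials: iterable of (condition, stimulus_location, horizontal_gaze).
    Returns {condition: r}.
    """
    by_condition = {}
    for condition, location, gaze in trials:
        locs, gazes = by_condition.setdefault(condition, ([], []))
        locs.append(location)
        gazes.append(gaze)
    return {c: pearson_r(l, g) for c, (l, g) in by_condition.items()}


# Toy data: five locations coded -2..2; gaze tracks location in "A" and "AV" only.
trials = [("A", loc, 0.5 * loc) for loc in range(-2, 3)]
trials += [("AV", loc, 0.5 * loc) for loc in range(-2, 3)]
trials += [("V", loc, g) for loc, g in zip(range(-2, 3), [0.1, -0.1, 0.0, 0.1, -0.1])]
r_by_condition = gaze_location_correlation(trials)
```

Under the eye-movement account, r for the audiovisual condition would need to exceed r for the auditory condition; equal values, as in this toy example, are the pattern that argues against it.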

      Finally, we note that with the stricter eye movement criterion acknowledged in the Discussion section of the original manuscript, audiovisual d' was significantly better than the MLE prediction, but this difference did not survive cluster correction. This is an important distinction because, combined with the results described above, it seems to support our original interpretation that the stricter criterion, together with our conservative (mass-based) cluster correction (2), led to a Type II error.
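For reference, the standard maximum-likelihood (MLE) benchmark for two independent Gaussian cues (e.g., Ernst & Banks, 2002) predicts that sensitivities add in quadrature. The sketch below assumes that this is the form of the MLE prediction used here and that d' is available per time point; the function names and numbers are illustrative, and no cluster correction is included.

```python
import math


def mle_predicted_dprime(d_auditory, d_visual):
    """Optimal (MLE) combination of two independent Gaussian cues.

    Under this model, sensitivities add in quadrature:
    d'_AV = sqrt(d'_A**2 + d'_V**2).
    """
    return math.hypot(d_auditory, d_visual)


def superadditive_timepoints(d_av, d_a, d_v):
    """Indices of time points where observed audiovisual d' exceeds the MLE prediction."""
    return [t for t, (av, a, v) in enumerate(zip(d_av, d_a, d_v))
            if av > mle_predicted_dprime(a, v)]


# Toy values for two time points.
exceeding = superadditive_timepoints([0.04, 0.30], [0.03, 0.12], [0.04, 0.16])
```

In the toy values, the first time point falls short of the prediction (0.04 vs 0.05) and the second exceeds it (0.30 vs 0.20).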

      References

      (1) Roy, R. N., Charbonnier, S., & Bonnet, S. (2014). Eye blink characterization from frontal EEG electrodes using source separation and pattern recognition algorithms. Biomedical Signal Processing and Control, 14, 256–264.

      (2) Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85–93.

      Response to Reviewer #2 (Public Review):

      We thank the reviewer for their insight and constructive feedback. As emphasized in the review, an interesting question that arises from our results is why, if the neural data exceed the optimal statistical decision (MLE d'), the behavioural data do not. We agree with the reviewer’s suggestion that more attention should be devoted to this question, and plan to provide a deeper discussion of the relationship between behavioural and neural super-additivity in the revised manuscript. We also note that while this discrepancy remains unexplained, our results are consistent with the literature. That is, both non-linear neural responses (single-cell recordings) and behavioural responses that match MLE are reliable phenomena in multisensory integration (1-4).

      One possible explanation for this puzzling discrepancy is that behavioural responses occur some time after the initial neural response to sensory input. There are several subsequent neural processes between perception and a behavioural response (5), all of which introduce additional noise that may obscure super-additive perceptual sensitivity. In particular, the mismatch between neural and behavioural accuracy may be the result of additional neural processes that translate sensory activity into a motor response to perform the behavioural task.

      Our measure of neural super-additivity (exceeding optimally weighted linear summation) differs from how it is traditionally assessed (exceeding the summation of single-neuron responses) (2). However, neither method has yet fully explained how this neural activity translates into behavioural responses, and we think that more work is needed to resolve the discrepancy described above. Nonetheless, our method will facilitate this work by providing a reliable way to measure neural super-additivity in humans using non-invasive recordings.

      References

      (1) Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257–262.

      (2) Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433.

      (3) Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science, 221, 389–391.

      (4) Stanford, T. R., & Stein, B. E. (2007). Superadditivity in multisensory integration: putting the computation in context. NeuroReport, 18, 787–792.

      (5) Heekeren, H. R., Marrett, S., & Ungerleider, L. G. (2008). The neural systems that mediate human perceptual decision making. Nature Reviews Neuroscience, 9, 467–479.

    1. Author response:

      Thanks for the eLife assessment

      “This study employed a comprehensive approach to examining how the MT+ region integrates into a complex cognition system in mediating human visuo-spatial intelligence. While the findings are useful, the experimental evidence is incomplete and the study design, hypothesis, analyses, writing, and presentation need to be improved.” We plan to revise the manuscript according to the comments in the Public Reviews.

      We are grateful for the excellent and very helpful comments, and we provide our provisional responses below.

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al. (2013) found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something??

      We thank the reviewer for pointing this out. After careful consideration, we agree. According to Melnick et al. (2013), motion surround suppression (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ could potentially be a core node of the MD system. However, because our results only address how “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not sufficient to establish hMT+ as the core node of the MD system. We will therefore revise the explanatory logic of the article to emphasize the role of hMT+ in de-redundancy and in improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing its significance in the MD system.

      (2) How was the sample size determined? Is it sufficient??

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium-sized correlation between SI and the Perceptual Reasoning sub-ability (r = 0.47). Using this r value as the correlation coefficient (ρ H1), with power set at the commonly used threshold of 0.8 and the alpha error probability at 0.05, the required sample size is 26. This ensures that our study has adequate power to detect an effect of this size. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a larger dataset.
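As a cross-check on this calculation, the widely used Fisher z approximation for the sample size needed to detect a correlation can be sketched as follows. This is not G*Power's exact bivariate-normal routine, so small differences from the reported value are expected, and because the number of tails is not stated above, both variants are shown. With r = 0.47, α = 0.05, and power = 0.8, the approximation gives 27 for a one-tailed test (close to the reported 26) and 34 for a two-tailed test.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80, tails=1):
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher z approximation:
    N = ((z_{1 - alpha/tails} + z_{power}) / atanh(r))**2 + 3, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / tails)
    z_power = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)


n_one_tailed = n_for_correlation(0.47, tails=1)
n_two_tailed = n_for_correlation(0.47, tails=2)
```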

      (3) In Schallmo elife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results different here?

      We thank the reviewer for pointing this out. There are several methodological differences between the two studies:

      a. Whereas Schallmo et al. (2018) used 3T MRS, we used 7T MRS, which improves our ability to detect and measure GABA accurately.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used the left hMT+. Our reasons for focusing on the left hMT+ are described in our response to Reviewer 1, point (6). Briefly, we targeted left MT/V5 because studies have shown that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. Schallmo et al. (2018) used a 3 cm isotropic MRS voxel, whereas we used a 2 cm isotropic voxel. This allows us to localize hMT+ more precisely and to exclude more white matter signal.
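For scale, the two voxel volumes compare as follows (simple arithmetic, assuming cubic isotropic voxels):

```latex
V_{3\,\mathrm{cm}} = (3\,\mathrm{cm})^3 = 27\,\mathrm{cm}^3, \qquad
V_{2\,\mathrm{cm}} = (2\,\mathrm{cm})^3 = 8\,\mathrm{cm}^3, \qquad
\frac{V_{3\,\mathrm{cm}}}{V_{2\,\mathrm{cm}}} = \frac{27}{8} \approx 3.4
```

So the 2 cm voxel covers roughly 30% of the volume of the 3 cm voxel.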

      (4) Basically, this study contains data on SI, BDT, GABA in MT+ and V1, and Glu in MT+ and V1, for a total of 6 measurements. There should be 6x5/2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and Supplementary Figures 1-3. I understand that it is not necessary to include all figures, but I suggest reporting all values in one table.

      We thank the reviewer for this good suggestion. We plan to add a correlation matrix reporting all pairwise values.
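A correlation matrix of this kind can be assembled from all unordered pairs of measures, as in the sketch below. The measure names and toy values are hypothetical placeholders, and no correction for multiple comparisons is applied.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(measures):
    """measures: dict name -> per-participant values (same participant order).

    Returns {(name_a, name_b): r} for every unordered pair of measures.
    """
    return {(a, b): pearson_r(measures[a], measures[b])
            for a, b in combinations(measures, 2)}


# Toy data for four participants; six measures give 6 * 5 / 2 = 15 pairs.
measures = {
    "SI": [1, 2, 3, 4],
    "BDT": [2, 4, 6, 8.5],
    "GABA_MT": [4, 3, 2, 1],
    "GABA_V1": [1, 3, 2, 4],
    "Glu_MT": [2, 1, 4, 3],
    "Glu_V1": [1, 1, 2, 3],
}
pairs = pairwise_correlations(measures)
```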

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings which indicated high correlations between Surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well-established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perception Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perception Reasoning—the Block Design Test—which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI was the left hMT+, and we wanted the ROIs to be consistent across the different models. We targeted left MT/V5 because studies have shown that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011). In addition, we will check the results of our localizer to confirm whether similar findings are consistently replicated.

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for this good advice. Including these results will help readers compare our findings with those of Melnick et al. (2013). We plan to add this figure in the revised version.
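For these analyses, one commonly used definition of the suppression index is the difference between the log10 duration thresholds for large and small stimuli, as sketched below. We present it as an assumption for illustration rather than as the manuscript's exact formula; thresholds are in ms and the function name is hypothetical. Each threshold and SI would then be correlated separately with the BDT scores.

```python
import math


def suppression_index(threshold_small_ms, threshold_large_ms):
    """Surround suppression index from duration thresholds.

    Defined here as log10(large) - log10(small): positive values mean the
    large stimulus needs a longer duration, i.e. stronger spatial suppression.
    """
    return math.log10(threshold_large_ms) - math.log10(threshold_small_ms)


si_example = suppression_index(40.0, 80.0)  # doubling the threshold gives log10(2)
```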

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically, could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al. (2013); however, those authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection is necessary, including how it fits with the core/extended MD networks.

      We appreciate the reviewer’s comments. We focused on hMT+ for the following reasons. According to Melnick et al. (2013), motion surround suppression (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. (2013), the averaged MD activity region appears to overlap with hMT+. Based on these findings, we hypothesized that hMT+ could be a core node of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree. Because our results only address visuo-spatial intelligence, they are not sufficient to establish hMT+ as the core node of the MD system. We will revise the explanatory logic of the article to emphasize the role of hMT+ in de-redundancy and in improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing its significance in the MD system.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack a fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the reviewer's enthusiasm.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavior model. The Suppression Index is essential for characterizing the local inhibitory mechanisms within the behavior model. However, we acknowledge that the introduction may not have clearly articulated our hypothesis, which may have led to misunderstanding. We plan to revise the introduction to clarify these connections and to make the relevance of the Suppression Index clear.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that the manuscript may have presented this construct inconsistently across sections, causing confusion. To address this, we will describe 3D visuo-spatial intelligence consistently in the Introduction and Discussion. However, we would like to keep the term 'Block Design task score' in the Results section so that readers know which subtest we used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We plan to revise Figure 1a to make it less abstract and more logically structured. Figure 6 is a schematic of our theoretical framework for how hMT+ contributes to 3D visuo-spatial intelligence; we believe the elements within this framework are grounded in related theories and supported by the evidence presented in our Results and Discussion sections, which makes them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      Thank you for pointing this out. Reviewer #1 also asked this question; we reproduce our answer here: “The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perceptual processing (including 3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activity. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      Thank you for pointing this out. We have carefully reread the corresponding references, considered the relevant theories, and agree with these comments. Because our results address only the claim that “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not yet sufficient to establish that hMT+ is a core node of the MD system. We will therefore adjust the explanatory logic of the article, emphasizing the role of hMT+ in reducing redundancy and improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing the significance of hMT+ in MD systems. Regarding the potential central role of hMT+ in the MD system, we agree with the reviewer's view that research on hMT+ as a multisensory integration hub has mainly focused on developmental processes. Meanwhile, in adults, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports a role for hMT+ in multitask, multisensory systems (Gu et al., J. Neurosci. 26(1), 73–85, 2006; Fetsch et al., Nat. Neurosci. 15, 146–154, 2012). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are facilitated by the features of hMT+.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge that citing Born & Bradley (2005), which focuses solely on the structure and function of visual area MT, was inappropriate here. However, we believe that choosing hMT+ as the domain for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (Annu. Rev. Neurosci. 24, 203–238, 2001) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in extrastriate cortex (especially MT) than in primary visual cortex (V1). This supports our choice and emphasizes the relevance of hMT+ in our study. We will correct this citation in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for this suggestion. The correlation result is currently in the Supplementary Material; we will move it back into the main text.

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for this suggestion. We plan to plot the MRS scanning area for the V1 ROI and use a visual-area template to check whether it includes V2/V3. If it does, we will refer to it as early visual cortex rather than specifically V1 in our reporting.

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the authors justify why V1 serves as the control region only in the MRS analysis but not in the FC (Figure 4) or mediation analyses (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+.

      We thank the reviewer for this suggestion. We plan to run the V1 FC-behavior analysis as a control. As for the mediation analysis, because V1 GABA/Glu levels are not correlated with the BDT score, a mediation analysis for V1 is not warranted.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We will further interpret the difference between panels a and b in the revised version. Panel a shows the hMT+-region FC that correlates with BDT score, which clearly involves the frontal cortex, whereas panel b shows the hMT+-region FC that correlates with SI, which involves fewer regions. The overlapping regions are of primary interest, because they help explain how local inhibitory mechanisms contribute to 3D visuo-spatial intelligence. We will also revise Figure 4 to highlight the overlapping regions.

      (5) SI is not relevant to the authors' a priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem to be discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in the response to reviewer 3, section (1). SI plays a crucial role in explicating how local inhibitory mechanisms function within the context of the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing how the effects from the frontal cortex (BA46) on the Block Design Task are fully mediated by SI. This further underscores the significance of SI in our model.

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results V1 GABA was not significantly correlated with BDT score, suggesting that efficient processing in V1 might not sufficiently account for individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by introducing it in the Introduction to better frame our argument.

      Transparency Issues:

      (1) I don't think it's acceptable to claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of the data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We realize that this statement could cause confusion, and we will delete it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We will provide the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We will revise it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We agree and will use "hMT+" consistently, in line with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We will revise it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We will revise it.

    1. Author Response:

      We appreciate the thorough comments from the reviewers. Before revising the manuscript, we would like to briefly reply to the main concerns raised:

      • Is pupil size a reliable proxy of effort? A vast amount of work demonstrates that pupil size scales sensitively with fluctuations in effort: for instance, the pupil dilates when load increases in working memory or multiple object tracking tasks, and such pupillary effects robustly explain individual differences in cognitive ability and fluctuations in performance across trials.1–4 This extends to the planning of movements, as pupil dilations are observed prior to the execution of (eye) movements.5 As reviewed previously,6–12 each review drawing on an extensive literature, any increase in effort is associated with an increase in pupil size. We inadvertently phrased this as if the link between effort and pupil size had been established via shared neural correlates. However, this is not the case: the link between effort and pupil size was established well before the underlying neural circuitry of this relationship was investigated in detail. During the revision, we plan to rewrite this section to clarify that pupil size indexes effort and to distinguish clearly between this link and the putative neural underpinnings of such effort-linked modulations.

      • Is saccade latency an alternative explanation for the link between effort and saccade selection? Longer saccade latencies may imply more complex oculomotor programming (e.g. saccades with larger amplitudes require longer latencies for non-microsaccades,13 and latencies increase when distractors are presented14), and latencies are indeed known to differ across directions.15,16 As suggested, it is possible that saccade latencies also predict saccade preferences. However, even if this is the case, it would not constitute an alternative explanation. As saccade latency may index oculomotor programming complexity, it can potentially be considered an alternative outcome measure of effort, albeit restricted to the context of saccades. Therefore, if saccade latencies predict saccade preferences, this would not affect our conclusion; rather, it would constitute converging evidence that effort drives saccade selection.

      A related question is why one would use pupil size as a measure of effort, given the methodological care that pupillometry requires. Several points make pupil size a sensible and promising measure in comparison with saccade latencies. In contrast to saccade latencies, pupil size can capture the effort of different effector systems (e.g. head or hand movements), and potentially even the effort associated with covert shifts of attention. Moreover, pupil size is a temporally rich and continuous measure that makes it possible to isolate processes unfolding prior to (eye) movement onset (e.g. oculomotor programming). Together, this makes pupil size a powerful tool for studying the costs of visual selection more broadly. In the revision, we will add analyses incorporating latencies and other saccade metrics. We will also discuss the differences between pupil size and saccade latencies in capturing saccade costs and effort.

      • Are the current results causal or correlational? Most of the currently reported results are indeed correlational in nature. In our first tasks, we correlated pupil size during saccade planning with saccade preferences in a subsequent task. Although the link across tasks was correlational, the observed relationship clearly followed our previously specified hypothesis.17 Moreover, experiments 1 and 2 of the visual search data replicated and extended this relationship. We also directly manipulated cognitive demand in the second visual search experiment. In line with the hypothesis that effort affects saccade selection, participants executed fewer saccades overall when performing a (primary) auditory dual task, and cut costly saccades the most. Whilst our findings are mostly correlational, we do not know of a more fitting and parsimonious explanation for them than effort predicting saccade selection. We will address causality in the Discussion for transparency and point more clearly to the second visual search experiment as causal evidence.

      References

      (1) Alnæs, D. et al. Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. J. Vis. 14, 1 (2014).

      (2) Koevoet, D., Strauch, C., Van der Stigchel, S., Mathôt, S. & Naber, M. Revealing visual working memory operations with pupillometry: Encoding, maintenance, and prioritization. WIREs Cogn. Sci. e1668 (2023) doi:10.1002/wcs.1668.

      (3) Robison, M. K. & Unsworth, N. Pupillometry tracks fluctuations in working memory performance. Atten. Percept. Psychophys. 81, 407–419 (2019).

      (4) Unsworth, N. & Miller, A. L. Individual Differences in the Intensity and Consistency of Attention. Curr. Dir. Psychol. Sci. 30, 391–400 (2021).

      (5) Richer, F. & Beatty, J. Pupillary Dilations in Movement Preparation and Execution. Psychophysiology 22, 204–207 (1985).

      (6) Bumke, O. Die Pupillenstörungen Bei Geistes-Und Nervenkrankheiten. (Fischer, 1911).

      (7) Kahneman, D. Attention and Effort. (Prentice-Hall, 1973).

      (8) van der Wel, P. & van Steenbergen, H. Pupil dilation as an index of effort in cognitive control tasks: A review. Psychon. Bull. Rev. 25, 2005–2015 (2018).

      (9) Loewenfeld, I. E. Mechanisms of reflex dilatation of the pupil. Doc. Ophthalmol. 12, 185–448 (1958).

      (10) Mathôt, S. Pupillometry: Psychology, Physiology, and Function. J. Cogn. 1, 16 (2018).

      (11) Sirois, S. & Brisson, J. Pupillometry. WIREs Cogn. Sci. 5, 679–692 (2014).

      (12) Strauch, C., Wang, C.-A., Einhäuser, W., Van der Stigchel, S. & Naber, M. Pupillometry as an integrated readout of distinct attentional networks. Trends Neurosci. 45, 635–647 (2022).

      (13) Kalesnykas, R. P. & Hallett, P. E. Retinal eccentricity and the latency of eye saccades. Vision Res. 34, 517–531 (1994).

      (14) Walker, R., Deubel, H., Schneider, W. X. & Findlay, J. M. Effect of Remote Distractors on Saccade Programming: Evidence for an Extended Fixation Zone. J. Neurophysiol. 78, 1108–1119 (1997).

      (15) Hanning, N. M., Himmelberg, M. M. & Carrasco, M. Presaccadic attention enhances contrast sensitivity, but not at the upper vertical meridian. iScience 25, 103851 (2022).

      (16) Hanning, N. M., Himmelberg, M. M. & Carrasco, M. Presaccadic Attention Depends on Eye Movement Direction and Is Related to V1 Cortical Magnification. J. Neurosci. 44 (2024).

      (17) Koevoet, D., Strauch, C., Naber, M. & Van der Stigchel, S. The Costs of Paying Overt and Covert Attention Assessed With Pupillometry. Psychol. Sci. 34, 887–898 (2023).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      We thank the reviewers for the detailed assessment of our work as well as their praise and constructive feedback which helped us to significantly improve our manuscript.

      Reviewer #1 (Public Review):

      The inferior colliculus (IC) is the central auditory system's major hub. It integrates ascending brainstem signals to provide acoustic information to the auditory thalamus. The superficial layers of the IC ("shell" IC regions as defined in the current manuscript) also receive a massive descending projection from the auditory cortex. This auditory cortico-collicular pathway has long fascinated the hearing field, as it may provide a route to funnel "high-level" cortical signals and impart behavioral salience upon an otherwise behaviorally agnostic midbrain circuit.

      Accordingly, IC neurons can respond differently to the same sound depending on whether animals engage in a behavioral task (Ryan and Miller 1977; Ryan et al., 1984; Slee & David, 2015; Saderi et al., 2021; De Franceschi & Barkat, 2021). Many studies also report a rich variety of non-auditory responses in the IC, far beyond the simple acoustic responses one expects to find in a "low-level" region (Sakurai, 1990; Metzger et al., 2006; Porter et al., 2007). A tacit assumption is that the behaviorally relevant activity of IC neurons is inherited from the auditory cortico-collicular pathway. However, this assumption has never been tested, owing to two main limitations of past studies:

      (1) Prior studies could not confirm if data were obtained from IC neurons that receive monosynaptic input from the auditory cortex.

      (2) Many studies have tested how auditory cortical inactivation impacts IC neuron activity; the consequence of cortical silencing is sometimes quite modest. However, all prior inactivation studies were conducted in anesthetized or passively listening animals. These conditions may not fully engage the auditory cortico-collicular pathway. Moreover, the extent of cortical inactivation in prior studies was sometimes ambiguous, which complicates interpreting modest or negative results.

      Here, the authors' goal is to directly test whether auditory cortex is necessary for behaviorally relevant activity in IC neurons. They conclude that, surprisingly, task-relevant activity in cortico-recipient IC neurons persists in the absence of auditory cortico-collicular transmission. A major strength of the paper is that, to this end, the authors combine a sound-detection behavior with clever approaches that unambiguously overcome the limitations of past studies.

      First, the authors inject a transsynaptic virus into the auditory cortex, thereby expressing a genetically encoded calcium indicator in the auditory cortex's postsynaptic targets in the IC. This powerful approach enables 2-photon Ca2+ imaging from IC neurons that unambiguously receive monosynaptic input from auditory cortex. Thus, any effect of cortical silencing should be maximally observable in this neuronal population. Second, they abrogate auditory cortico-collicular transmission using lesions of auditory cortex. This "sledgehammer" approach is arguably the most direct test of whether cortico-recipient IC neurons will continue to encode task-relevant information in absence of descending feedback. Indeed, their method circumvents the known limitations of more modern optogenetic or chemogenetic silencing, e.g. variable efficacy.

      I also see three weaknesses which limit what we can learn from the authors' hard work, at least in the current form. I want to emphasize that these issues do not reflect any fatal flaw of the approach. Rather, I believe that their datasets likely contain the treasure-trove of knowledge required to completely support their claims.

      (1) The conclusion of this paper requires the following assumption to be true: that the difference in neural activity between Hit and Miss trials reflects "information beyond the physical attributes of sound." The data presentation complicates asserting this assumption. Specifically, they average fluorescence transients of all Hit and all Miss trials in their detection task. Yet, Figure 3B shows that mice's d' depends on sound level, and since this is a detection task the smaller d' at low SPLs presumably reflects lower Hit rates (and thus higher Miss rates). As currently written, it is not clear whether fluorescence traces for Hits arise from trials where the sound cue was played at a higher sound level than on Miss trials. Thus, the difference in neural activity on Hit and Miss trials could indeed reflect mice's behavior (licking or not licking), but it could in principle also be explained by higher sound-evoked spike rates on Hit compared to Miss trials, simply due to louder click sounds. Indeed, the amplitude and decay tau of their indicator GCaMP6f depend non-linearly on the number and rate of spikes (Chen et al., 2013), so this isn't an unreasonable concern.

      (2) The authors' central claim effectively rests upon two analyses in Figures 5 and 6. The spectral clustering algorithm of Figure 5 identifies 10 separate activity patterns in IC neurons of control and lesioned mice; most of these clusters show distinct activity on averaged Hit and Miss trials. They conclude that although the proportions of neurons from control and lesioned mice in certain clusters deviate from an expected 50/50 split, neurons from lesioned mice are still represented in all clusters. A significant issue here is that, in addition to averaging all Hit and Miss trials together, the authors lump the data from control and lesioned mice together for the clustering. There is no direct comparison of neural activity between the two groups, so the reader must rely on interpreting a row of pie charts to assess the conclusion. It's unclear how similar task-relevant activity is between control and lesioned mice; we don't even have a ballpark estimate of how much auditory cortex does or does not contribute to task-relevant activity. Although ideally the authors would have approached this by repeatedly imaging the same IC neurons before and after lesioning auditory cortex, this within-subjects design may be infeasible if lesions interfere with task retention. Nevertheless, they have recordings from hundreds to thousands of neurons across two groups, so even a small effect should be observable in a between-groups comparison.

      (3) In Figure 6, the authors show that logistic regression models predict whether the trial is a Hit or Miss from their fluorescence data. Classification accuracy peaks rapidly following sound presentation, implying substantial information regarding mice's actions. The authors further show that classification accuracy is reduced, but still above chance in mice with auditory cortical lesions. The authors conclude from this analysis task relevant activity persists in absence of auditory cortex. In principle I do not disagree with their conclusion.

      The weakness here is in the details. First, the reduction in classification accuracy of lesioned mice suggests that auditory cortex does nevertheless transmit some task relevant information, however minor it may be. I feel that as written, their narrative does not adequately highlight this finding. Rather one could argue that their results suggest redundant sources of task-relevant activity converging in the IC. Secondly, the authors conclude that decoding accuracy is impaired more in partially compared to fully lesioned mice. They admit that this conclusion is at face value counterintuitive, and provide compelling mechanistic arguments in the Discussion. However, aside from shaded 95% CIs, we have no estimate of variance in decoding accuracy across sessions or subjects for either control or lesioned mice. Thus we don't know if the small sample sizes of partial (n = 3) and full lesion (n = 4) groups adequately sample from the underlying population. Their result of Figure 6B may reflect spurious sampling from tail ends of the distributions, rather than a true non-monotonic effect of lesion size on task relevant activity in IC.
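      Point (1) rests on the standard signal-detection relation between d', hit rate and false-alarm rate. The short Python sketch below is an editorial illustration, not the authors' analysis: it computes d' = z(H) - z(FA), where z is the inverse standard-normal CDF, and assumes (this is not stated in the review) that a single false-alarm rate from no-sound trials applies to every sound level. Under that assumption, a lower d' at a given level can only come from a lower hit rate, i.e. more Miss trials at that level. The `d_prime` function name and all rates are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate).

    Rates must lie strictly between 0 and 1; inv_cdf raises for 0 or 1, so
    real data usually need a correction (e.g. a log-linear correction).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical values: one false-alarm rate (no-sound trials) shared by every level,
# so differences in d' across levels come only from differences in hit rate.
shared_fa_rate = 0.2
hit_rates = {30: 0.35, 50: 0.70, 70: 0.95}  # sound level (dB SPL) -> hit rate, made up
for level, hit_rate in hit_rates.items():
    print(f"{level} dB SPL: hit rate {hit_rate:.2f}, miss rate {1 - hit_rate:.2f}, "
          f"d' {d_prime(hit_rate, shared_fa_rate):.2f}")
```

      Because Miss trials would then cluster at the quieter levels, averaging all Hits against all Misses can mix the animal's response with stimulus intensity, which is the confound the reviewer describes.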

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. Besides supplying key information that was accidentally omitted from the original manuscript, namely that our original analysis minimized any potential impact of differences in sound-level distributions by limiting the trials used for decoding to a subset of sound levels, we have now carried out several additional analyses.

      We would like to highlight one of these because it supplements both the clustering and decoding analysis that we conducted to compare hit and miss trial activity, and directly addresses what the reviewer identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions) and the request for an analysis that operates at the level of single units rather than the population level. Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.
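      The level-balanced, per-neuron comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the response does not name the statistical test, so a permutation test that shuffles Hit/Miss labels only within each sound level is swapped in here, and the function names, the trial format (sound level, outcome, response magnitude) and the parameters `n_perm` and `seed` are all hypothetical. Within each sound level, the larger of the Hit and Miss groups is randomly subsampled to the size of the smaller one, so both outcomes share the same sound-level distribution.

```python
import random
from collections import defaultdict
from statistics import mean


def balanced_hit_miss_test(trials, n_perm=2000, seed=0):
    """Compare one neuron's Hit vs Miss responses with trials balanced per sound level.

    Hypothetical helper. trials: iterable of (sound_level, outcome, response),
    where outcome is "hit" or "miss".
    Returns (observed Hit-minus-Miss mean difference, two-sided permutation p-value).
    """
    rng = random.Random(seed)
    by_level = defaultdict(lambda: {"hit": [], "miss": []})
    for level, outcome, response in trials:
        by_level[level][outcome].append(response)

    # Subsample so each sound level contributes equal numbers of Hits and Misses.
    strata = []
    for groups in by_level.values():
        n = min(len(groups["hit"]), len(groups["miss"]))
        if n > 0:  # levels with only Hits or only Misses cannot be balanced
            strata.append((rng.sample(groups["hit"], n), rng.sample(groups["miss"], n)))
    if not strata:
        raise ValueError("no sound level has both Hit and Miss trials")

    def hit_minus_miss(pairs):
        hits = [r for hit_group, _ in pairs for r in hit_group]
        misses = [r for _, miss_group in pairs for r in miss_group]
        return mean(hits) - mean(misses)

    observed = hit_minus_miss(strata)

    # Null distribution: shuffle outcome labels within each sound level only,
    # so the sound-level composition of both groups is preserved.
    extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for hit_group, miss_group in strata:
            pooled = hit_group + miss_group
            rng.shuffle(pooled)
            k = len(hit_group)
            shuffled.append((pooled[:k], pooled[k:]))
        if abs(hit_minus_miss(shuffled)) >= abs(observed):
            extreme += 1
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


def make_trials(hit_effect):
    """Synthetic neuron (made-up data): response equals sound level, plus hit_effect on Hits."""
    trials = []
    for level, n_hit, n_miss in [(30, 4, 12), (50, 10, 10), (70, 14, 3)]:
        trials += [(level, "hit", level + hit_effect) for _ in range(n_hit)]
        trials += [(level, "miss", level) for _ in range(n_miss)]
    return trials


print(balanced_hit_miss_test(make_trials(hit_effect=0)))
print(balanced_hit_miss_test(make_trials(hit_effect=5)))
```

      In the synthetic example, where response magnitude depends only on sound level, the balanced Hit-minus-Miss difference is exactly zero, whereas naively averaging all 28 Hit trials against all 25 Miss trials would give a spurious difference of roughly 14 units, because Hits are concentrated at the louder levels. A genuine 5-unit Hit effect is recovered as a 5-unit difference with a small permutation p-value.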

      Reviewer #2 (Public Review):

      Summary:

      This study takes a new approach to studying the role of corticofugal projections from auditory cortex to inferior colliculus. The authors performed two-photon imaging of cortico-recipient IC neurons during a click detection task in mice with and without lesions of auditory cortex. In both groups of animals, they observed similar task performance and relatively small differences in the encoding of task-response variables in the IC population. They conclude that non-cortical inputs to the IC can provide substantial task-related modulation, at least when AC is absent.

      Strengths:

      This study provides valuable new insight into big and challenging questions around top-down modulation of activity in the IC. The approach here is novel and appears to have been executed thoughtfully. Thus, it should be of interest to the community.

      Weaknesses:

      There are, however, substantial concerns about the interpretation of the findings and limitations to the current analysis. In particular, analysis of single-unit activity is absent, which makes the population clusters and decoding results harder to interpret. These concerns should be addressed to make sure that the results can be interpreted clearly in an active field that already contains a number of confusing and possibly contradictory findings.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. Several additional analyses have now been carried out including ones that operate at the level of single units rather than the population level, as requested by the reviewer. We would like to briefly highlight one here because it supplements both the clustering and decoding analysis that we conducted to compare hit and miss trial activity and directly addresses what the other reviewers identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions). Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #3 (Public Review):

      Summary:

      This study aims to demonstrate that cortical feedback is not necessary to signal behavioral outcome to shell neurons of the inferior colliculus during a sound detection task. The demonstration is achieved by the observation of the activity of cortico-recipient neurons in animals which have received lesions of the auditory cortex. The experiment shows that neither behavior performance nor neuronal responses are significantly impacted by cortical lesions except for the case of partial lesions which seem to have a disruptive effect on behavioral outcome signaling.

      Strengths:

      The experimental procedure is based on state-of-the-art methods. There is an in-depth discussion of the different effects of auditory cortical lesions on sound detection behavior.

      Weaknesses:

      The analysis is not documented enough to be correctly evaluated. Have the authors pooled together trials with different sound levels for the key hit vs miss decoding/clustering analysis? If so, the conclusions are not well supported, as there are more misses for low sound levels, which would completely bias the outcome of the analysis. It is possible that the classification of hits versus misses actually only reflects a decoding of sound level based on sensory responses in the colliculus, and it would then not be surprising that, in the presence or absence of cortical feedback, some neurons respond more to higher sound levels (hits) and less to lower sound levels (misses). It is important that the authors clarify this and in any case perform an analysis in which the classification of hits vs misses is done only for the same sound levels. The description of feedback signals could be more detailed, although it is difficult to achieve good temporal resolution with the calcium imaging technique necessary for targeting cortico-recipient neurons.

      Our responses to the ‘recommendations for the authors’ below lay out in detail how we addressed each comment and concern. First, we have filled in key information that was accidentally omitted from the original manuscript: our original analysis was designed to minimize any potential impact of differences in sound level distributions by limiting the trials used for decoding to a subset of sound levels. Second, we have carried out several additional analyses to directly address what the reviewer identified as our work’s main weakness (a possible confound between animal behavior and sound level distributions). These include an analysis in which we were able to demonstrate, for one imaging session with a sufficiently large number of trials, that limiting the trials entered into the decoding analysis to those from a single sound level did not meaningfully impact decoding accuracy. We would like to highlight another new analysis here because it supplements both the clustering and decoding analyses that we conducted to compare hit and miss trial activity and addresses the other reviewers’ request for an analysis that operates at the level of single units rather than the population level. Specifically, we assessed, separately for each recorded neuron, whether there was a statistically significant difference in the magnitude of neural activity between hit and miss trials. This approach allowed us to fully balance the numbers of hit and miss trials at each sound level that were entered into the analysis. The results revealed that a large proportion (close to 50%) of units were task-modulated, i.e. had significantly different response magnitudes between hit and miss trials, and that this proportion was not significantly different between lesioned and non-lesioned mice. We hope that this, together with the rest of our responses, convincingly demonstrates that the shell of the IC encodes mouse sound detection behavior even when top-down input from the auditory cortex is absent.

      Reviewer #1 (Recommendations For The Authors):

      Thank you for the opportunity to read your paper. I think the conclusion is exciting. Indeed, you indicate that perhaps contrary to many of our (untested) assumptions, task-relevant activity in the IC may persist in absence of auditory cortex.

      As mentioned in my public review: Despite my interest in the work, I also think that there are several opportunities to significantly strengthen your conclusions. I feel this point is important because your work will likely guide the efforts of future students and post-docs working on this topic. The data can serve as a beacon to move the field away from the (somewhat naïve) idea that the evolved forebrain imparts behavioral relevance upon an otherwise uncivilized midbrain. This knowledge will inspire a search for alternative explanations. Indeed, although you don't highlight it in your narrative, your results dovetail nicely with several studies showing task-relevant activity in more ventral midbrain areas that project to the IC (e.g., pedunculopontine nuclei; see work from Hikosaka in monkeys, and more recently in mice from Karel Svoboda's lab).

      Thanks for the kind words.

      These studies, in particular the work by Inagaki et al. (2022) outlining how the transformation of an auditory go signal into movement could be mediated via a circuit involving the PPN/MRN (which might rely on the NLL for auditory input) and the motor thalamus, are indeed highly relevant.

      We made the following changes to the manuscript text.

      Line 472: “...or that the auditory midbrain, thalamus and cortex are bypassed entirely if simple acousticomotor transformations, such as licking a spout in response to a sound, are handled by circuits linking the auditory brainstem and motor thalamus via pedunculopontine and midbrain reticular nuclei (Inagaki et al., 2022).”

      The beauty of the eLife experiment is that you are free to incorporate or ignore these suggestions. After all, it's your paper, not mine. Nevertheless, I hope you find my comments useful.

      First, a few suggestions to address my three comments in the public review.

      Suggestion for public comment #1: An easy way to address this issue is to average the neural activity separately for each trial outcome at each sound level. That way you can measure if fluorescence amplitude (or integral) varies as a function of mice's action rather than sound level. This approach to data organization would also open the door to the additional analyses for addressing comment #2, such as directly comparing auditory and putatively non-auditory activity in neurons recorded from control and lesioned mice.

      We have carried out additional analyses for distinguishing between the two alternative explanations of the data put forward by the reviewer: That the difference in neural activity between hit and miss trials reflects a) behavior or b) sound level (more precisely: differences in response magnitude arising from a higher proportion of high-sound-level trials in the hit trial group than in the miss trial group). If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for each sound level. The new Figure 4 - figure supplement 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      Changes to manuscript.

      Line 214: “While averaging across all neurons cannot capture the diversity of responses, the averaged response profiles suggest that it is mostly trial outcome rather than the acoustic stimulus and neuronal sensitivity to sound level that shapes those responses (Figure 4 – figure supplement 1).”

      Additionally, we assessed for each neuron separately whether there was a significant difference between hit and miss trial activity and therefore whether the activity of the neuron could be considered “task-modulated”. To achieve this, we used equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions and thus rule out any potential confound between sound level distributions and trial outcome. This analysis revealed that the proportion of task-modulated neurons was very high (close to 50%) and not significantly different between lesioned and non-lesioned mice (Figure 6 - figure supplement 3).
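      The balancing step described above can be sketched in Python as follows. This is a minimal illustration under assumed names, not the code used for the analysis: each trial is represented as a hypothetical (sound level, outcome, response magnitude) tuple, and at each sound level the larger outcome group is randomly subsampled without replacement down to the size of the smaller one, so that hit and miss trials end up with identical sound level distributions.

```python
import random
from collections import defaultdict


def balance_trials(trials, seed=0):
    """Return hit and miss response magnitudes with equal counts per sound level.

    `trials` is a list of (sound_level_db, outcome, magnitude) tuples, where
    outcome is "hit" or "miss" (a hypothetical data layout for illustration).
    """
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_level = defaultdict(lambda: {"hit": [], "miss": []})
    for level, outcome, magnitude in trials:
        by_level[level][outcome].append(magnitude)

    hits, misses = [], []
    for level in sorted(by_level):
        level_hits = by_level[level]["hit"]
        level_misses = by_level[level]["miss"]
        n = min(len(level_hits), len(level_misses))  # smaller group sets the size
        # Subsample both groups without replacement; a level with no misses
        # (or no hits) contributes nothing.
        hits.extend(rng.sample(level_hits, n))
        misses.extend(rng.sample(level_misses, n))
    return hits, misses
```

      Because the result depends on the random draw, a fixed seed keeps it reproducible; repeating the draw with different seeds would indicate how stable the per-neuron outcome is.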

      Changes to the manuscript.

      Line 217: “Indeed, close to half (1272 / 2649) of all neurons showed a statistically significant difference in response magnitude between hit and miss trials…”

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Differences in the distributions of sound levels in the different trial types could also potentially confound the decoding into hit and miss trials. Our original analysis was actually designed to take this into account but, unfortunately, we failed to include sufficient details in the methods section.

      Changes to the manuscript.

      Line 710: “Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d’ of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis.”
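      The level-selection rule can be made concrete with a short sketch. It assumes, as is conventional but not stated above, that d’ is computed per sound level as Z(hit rate) minus Z(false alarm rate), with the false alarm rate taken from catch trials, and that rates of exactly 0 or 1 are clipped so the inverse normal CDF stays finite. The function names and the clipping value are illustrative assumptions.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, eps=1e-3):
    """Sensitivity index d' = Z(hit rate) - Z(false alarm rate).

    Rates are clipped away from 0 and 1 (the clipping value is an assumption).
    """
    z = NormalDist().inv_cdf

    def clip(p):
        return min(max(p, eps), 1 - eps)

    return z(clip(hit_rate)) - z(clip(fa_rate))


def select_levels(hit_rates, fa_rate, threshold=1.5, flank=2):
    """Return the lowest level whose d' exceeds `threshold`, plus up to
    `flank` levels below and above it, in ascending order.

    `hit_rates` maps sound level (dB SPL) to the hit rate at that level.
    """
    levels = sorted(hit_rates)
    for i, level in enumerate(levels):
        if d_prime(hit_rates[level], fa_rate) > threshold:
            return levels[max(i - flank, 0):i + flank + 1]
    return []  # no level reached the threshold
```

      With invented hit rates of 0.2, 0.3, 0.4, 0.55, 0.8, 0.9, 0.95 and 0.97 at 53 to 74 dB SPL in 3-dB steps and a false alarm rate of 0.1, the first level to exceed d’ = 1.5 is 65 dB SPL, so the selected window is 59 to 71 dB SPL.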

      In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity predominantly occurs immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in Figure 4 – figure supplement 1), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.

      Finally, we carried out an additional decoding analysis for one imaging session in which we had a sufficient number of trials to perform the analysis not only over the five (59, 62, 65, 68, 71 dB SPL) original sound levels, but also over a reduced range of three (62, 65, 68 dB SPL) sound levels, as well as a single (65 dB SPL) sound level (Figure 6 - figure supplement 1). The mean sound level differences between the hit trial distributions and miss trial distributions for these three conditions were 3.08, 1.01 and 0 dB, respectively. This analysis suggests that decoding performance is not meaningfully impacted by changing the range of sound levels (and sound level distributions), other than that including fewer sound levels means fewer trials and thus noisier decoding.
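      The “mean sound level difference” between hit and miss trial distributions can be illustrated with a few lines of Python. The trials below are invented toy values, not data from the study; the sketch only shows why narrowing the range of included levels shrinks the hit/miss mismatch and why it is exactly zero when a single level is used.

```python
from statistics import mean


def mean_level_difference(trials, levels):
    """Mean sound level of hit trials minus that of miss trials, using only
    trials whose level is in `levels`. Trials are (level_db, outcome) pairs."""
    hits = [lvl for lvl, outcome in trials if outcome == "hit" and lvl in levels]
    misses = [lvl for lvl, outcome in trials if outcome == "miss" and lvl in levels]
    return mean(hits) - mean(misses)


# Hypothetical toy trials (not data from the study).
trials = [(59, "miss"), (59, "miss"), (62, "hit"), (62, "miss"),
          (65, "hit"), (65, "miss"), (68, "hit"), (71, "hit")]

for levels in ({59, 62, 65, 68, 71}, {62, 65, 68}, {65}):
    print(sorted(levels), mean_level_difference(trials, levels))
```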

      Changes to manuscript.

      Line 287: “...and was not meaningfully affected by differences in sound level distributions between hit and miss trials (Figure 6 – figure supplement 1).”

      Suggestion for public comment #2: Perhaps a solution would be to display example neuron activity in each cluster, recorded in control and lesioned mice. The reader could then visually compare example data from the two groups, and immediately grasp the conclusion that task relevant activity remains in absence of auditory cortex. Additionally, one possibility might be to calculate the difference in neural activity between Hit and Miss trials for each task-modulated neuron. Then, you could compare these values for neurons recorded in control and lesion mice. I feel like this information would greatly add to our understanding of cortico-collicular processing.

      I would also argue that it's perhaps more informative to show one (or a few) example recordings rather than averaging across all cells in a cluster. Example cells would give the reader a better handle on the quality of the imaging, and this approach is more standard in the field. Finally, it would be useful to show the y axis calibration for each example trace (e.g. Figure 5 supp 1). That is also pretty standard so we can immediately grasp the magnitude of the recorded signal.

      We agree that while the information we provided shows that neurons from lesioned and non-lesioned groups are roughly equally represented across the clusters, it does not allow the reader to appreciate how similar the activity profiles of neurons are from each of the two groups. However, picking examples can be highly subjective and thus potentially open to bias. We therefore opted instead to display, separately for lesioned and non-lesioned mice, the peristimulus time histograms of all neurons in each cluster, as well as the cluster averages of the response profiles (Figure 5 - figure supplement 3). This, we believe, convincingly illustrates the close correspondence between neural activity in lesioned and non-lesioned mice across different clusters. All our existing and new figures indicate the response magnitude either on the figures’ y-axis or via scale/color bars.

      Changes to manuscript.

      Line 254: “Furthermore, there was a close correspondence between the cluster averages of lesioned and non-lesioned mice (Figure 5 – figure supplement 3).”

      Furthermore, we’ve now included a video of the imaging data which, we believe, gives the reader a much better handle on the data quality than further example response profiles would.

      Changes to manuscript.

      Line 197: “...using two-photon microscopy (Figure 4B, Video 1).”

      Suggestion for public comment #3: In absence of laborious and costly follow-up experiments to boost the sample size of partial and complete lesion groups, it may be more prudent to simply tone down the claims that lesion size differentially impacts decoding accuracy. The results of this analysis are not necessary for your main claims.

      Our new results on the proportions of ‘task-modulated’ neurons (Figure 6 - figure supplement 3) across different experimental groups show that there is no difference between non-lesioned and lesioned mice as a whole, but mice with partial lesions have a smaller proportion of task-modulated neurons than the other two groups. While this corroborates the results of the decoding analysis, we certainly agree that the small sample size is a caveat that needs to be acknowledged.

      Changes to manuscript.

      Line 477: “Some differences were observed for mice with only partial lesions of the auditory cortex. Those mice had a lower proportion of neurons with distinct response magnitudes in hit and miss trials than mice with (near-)complete lesions. Furthermore, trial outcomes could be read out with lower accuracy from these mice. While this finding is somewhat counterintuitive and is based on only three mice with partial lesions, it has been observed before that smaller lesions…”

      A few more suggestions unrelated to public review:

      Figure 1: This is somewhat of an oddball in this manuscript, and its inclusion is not necessary for the main point. Indeed, the major conclusion of Fig 1 is that acute silencing of auditory cortex impairs task performance, and thus optogenetic methods are not suitable to test your hypothesis. However, this conclusion is also easily supported from decades of prior work, and thus citations might suffice.

      We do not agree that these data can easily be substituted with citations of prior published work. While previous studies (Talwar et al., 2001, Li et al., 2017) have demonstrated the impact of acute pharmacological silencing on sound detection in rodents, pharmacological and optogenetic silencing are not equivalent. Furthermore, we are aware of only one published study (Kato et al., 2015) that investigated the impact of optogenetically perturbing auditory cortex on sound detection (others have investigated its impact on discrimination tasks). Kato et al. (2015) examined the effect of acute optogenetic silencing of auditory cortex on the ability of mice to detect the offsets of very long (5-9 seconds) sounds, which is not easily comparable to the click detection task employed by us. Furthermore, when presenting our work at a recent meeting and leaving out the optogenetics results due to time constraints, audience members immediately enquired whether we had tried an optogenetic manipulation instead of lesions. Therefore, we believe that these data represent a valuable piece of information that will be appreciated by many readers and have decided not to remove them from the manuscript.

      A worst-case scenario is that Figure 1 will detract from the reader's assessment of experimental rigor. The data of 1C are pooled from multiple sessions in three mice. It is not clear if the signed-rank test compares performance across n = 3 mice or n = 13 sessions. If the latter, a stats nitpicker could argue that the significance might not hold up with a nested analysis considering that some datapoints are not independent of one another. Finally, the experiment does not include a control group, i.e. gad2-cre mice injected with an EYFP virus. So as presented, the data are equally compatible with the pessimistic conclusion that shining light into the brain impairs mice's licking. My suggestion is to simply remove Figure 1 from the paper. Starting off with Figure 3 would be stronger, as the rest of the study hinges upon the knowledge that control and lesion mice's behavior is similar.

      Instead of reporting the results session-wise and doing stats on the d’ values, we now report results per mouse and perform stats on the proportions of hits and false alarms separately for each mouse. The results are statistically significant for each mouse and suggest that the differences in d’ are primarily caused by higher false alarm rates during the optogenetic perturbation than in the control condition.

      Changes to manuscript.

      New Figure 1.

      We agree that including control mice not expressing ChR2 would be important for fully characterizing the optogenetic manipulation and that the lack of this control group should be acknowledged. However, in the context of this study, the outcome of performing this additional experiment would be inconsequential. We originally considered using an optogenetic approach to explore the contribution of cortical activity to IC responses, but found that this altered the animals’ sound detection behavior. Whether that change in behavior is due to activation of the opsin or simply due to light being shone on the brain has no bearing on the conclusion that this type of manipulation is unsuitable for determining whether auditory cortex is required for the choice-related activity that we recorded in the IC.

      Changes to manuscript.

      Line 106: “Although a control group in which the auditory cortex was injected with an EYFP virus lacking ChR2 would be required to confirm that the altered behavior results from an opsin-dependent perturbation of cortical activity, this result shows that this manipulation is also unsuitable…”

      Figure 2, comment #1: The micrograph of panel B shows the densest fluorescence in the central IC. You interpret this as evidence of retrograde labeling of central IC neurons that project to the shell IC. This is a nice finding, but perhaps a more relevant micrograph would be to show the actual injection site in the shell layers. The rest of Figure 2 documents the non-auditory cortical sources of forebrain feedback. Since non-auditory cortical neurons may or may not target distinct shell IC sub-circuits, it's important to know where the retrograde virus was injected. Stylistic comment: The flow of the panels is somewhat unorthodox. Panel A and B follow horizontally, then C and D follow vertically, followed by E-H in a separate column. Consider sequencing either horizontally or vertically to maximize the reader's experience.

      Figure 2, comment #2: It would also be useful to show more rostral sections from these mice, perhaps as a figure supplement, if you have the data. I think there is a lot of value here given a recent paper (Olthof et al., 2019, J Neurosci) arguing that the IC receives corticofugal input from areas more rostral to the auditory cortex. So it would be beneficial for the field to know if these other cortical sources do or do not represent likely candidates for behavioral modulation in absence of auditory cortex.

      Figure 2, comment #3: You have a striking cluster of retrogradely labeled PPC neurons, and I'm not sure PPC has been consistently reported as targeting the IC. It would be good to confirm that this is a "true" IC projection as opposed to viral leakage into the SC. Indeed, Figure 2, supplement 2 also shows some visual cortex neurons that are retrogradely labeled. This has bearing on the interpretations, because choice-related activity is rampant in PPC, and thus could be a potential source of the task relevant activity that persists in your recordings. This could be addressed as the point above, by showing the SC sections from these same mice.

      All IC injections were made under visual guidance with the surface of the IC and adjacent brain areas fully exposed after removal of the imaging window. Targeting the IC and steering clear of surrounding structures, including the SC, was therefore relatively straightforward.

      We typically observed strong retrograde labeling in the central nucleus after viral injections into the dorsal IC and, given the moderate injection volume (~50 nL at each of up to three sites), it was also typical to see spatially fairly confined labeling at the injection sites. For the mouse shown in Figure 2, we do not have further images of the IC. This was one of the earliest mice to be included in the study and we did not have access to an automatic slide scanner at the time. We had to acquire confocal images in a ‘manual’ and very time-consuming manner and therefore did not take further IC images for this mouse. We have now included, however, a set of images spanning the whole IC and the adjacent SC sections for the mouse for which we already show sections in Figure 2 - figure supplement 2. These were added as Figure 2 - figure supplement 3A to the manuscript. These images show that the injections were located in the caudal half of the IC and that there was no spillover into the SC - close inspection of those sections did not reveal any labeled cell bodies in the SC. Furthermore, we include as Figure 2 - figure supplement 3B a dozen additional rostral cortical sections of the same mouse illustrating corticocollicular neurons in regions spanning visual, parietal, somatosensory and motor cortex. Given the inclusion of the IC micrographs in the new supplementary figure, we removed panel B from Figure 2. This should also make it easier for the reader to follow the sequencing of the remaining panels.

      Changes to manuscript.

      New Figure 2 - figure supplement 3.

      Line 159: “After the experiments, we injected a retrogradely-transported viral tracer (rAAV2-retro-tdTomato) into the right IC to determine whether any corticocollicular neurons remained after the auditory cortex lesions (Figure 2, Figure 2 – figure supplement 2, Figure 2 – figure supplement 3). The presence of retrogradely-labeled corticocollicular neurons in non-temporal cortical areas (Figure 2) was not the result of viral leakage from the dorsal IC injection sites into the superior colliculus (Figure 2 – figure supplement 3).”

      Line 495: “...projections to the IC, such as those originating from somatosensory cortical areas (Lohse et al., 2021; Lesicko et al., 2016) and parietal cortex may have contributed to the response profiles that we observed.”

      Figure 5 (see also public review point #2): I am not convinced that this unsupervised method yields particularly meaningful clusters; a grain of salt should be provided to the reader. For example, Clusters 2, 5, 6, and 7 contain neurons that pretty clearly respond with either short latency excitation or inhibition following the click sound on Hits. I would argue that neurons with such diametrically opposite responses should not be "classified" together. You can see the same issue in some of Namboodiri/Stuber's clustering (their Figure 1). It might be useful to make it clear to the reader that these clusters can reflect idiosyncrasies of the algorithm, the behavior task structure, or both.

      We agree.

      Changes to manuscript.

      Line 666: “While clustering is a useful approach for organizing and visualizing the activity of large and heterogeneous populations of neurons, we need to be mindful that, given continuous distributions of response properties, the locations of cluster boundaries can be somewhat arbitrary and/or reflect idiosyncrasies of the chosen method and thus vary from one algorithm to another. We employed an approach very similar to that described in Namboodiri et al. (2019) because it is thought to produce stable results in high-dimensional neural data (Hirokawa et al. 2019).”

      Methods:

      How was a "false alarm" defined? Is it any lick happening during the entire catch trial, or only during the time period corresponding to the response window on stimulus trials?

      The response window was identical for catch and stimulus trials and a false alarm was defined as licking during the response window of a catch trial.

      Changes to manuscript.

      Line 598: “During catch trials, neither licking (‘false alarm’) during the 1.5-second response window …”

      L597 and so forth: What's the denominator in the conversion from the raw fluorescence traces into DF/F? Did you take the median or mode fluorescence across a chunk of time? Baseline subtract average fluorescence prior to click onset? Similarly, please provide some more clarification as to how neuropil subtraction was achieved. This information will help us understand how the classifier can decode trial outcome from data prior to sound onset.

      Signal processing did not involve the subtraction of a pre-stimulus period.

      Changes to manuscript.

      Line 629: “Neuropil extraction was performed using default suite2p parameters (https://suite2p.readthedocs.io/en/latest/settings.html), neuropil correction was done using a coefficient of 0.7, and calcium ΔF/F signals were obtained by using the median over the entire fluorescence trace as F0. To remove slow fluctuations in the signal, a baseline of each neuron’s entire trace was calculated by Gaussian filtering in addition to minimum and maximum filtering using default suite2p parameters. This baseline was then subtracted from the signal.”
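      For readers less familiar with this preprocessing, the first two steps (neuropil correction with a coefficient of 0.7, then ΔF/F with the median of the whole trace as F0) can be sketched as below. This is an illustrative re-implementation of the steps as described, not suite2p’s own code; the function name is hypothetical, and the slow-baseline removal (Gaussian plus minimum/maximum filtering) is deliberately left out.

```python
from statistics import median


def delta_f_over_f(f_roi, f_neuropil, neuropil_coeff=0.7):
    """Neuropil-corrected dF/F, using the median of the corrected trace as F0.

    f_roi and f_neuropil are equal-length lists of raw fluorescence values for
    one ROI and its surrounding neuropil. Slow-baseline removal is omitted.
    """
    # Subtract a scaled copy of the neuropil signal from the ROI signal.
    corrected = [f - neuropil_coeff * n for f, n in zip(f_roi, f_neuropil)]
    f0 = median(corrected)  # F0 = median over the entire trace
    return [(c - f0) / f0 for c in corrected]
```

      Because F0 is the median of the entire trace rather than a pre-stimulus mean, ΔF/F values before sound onset are not forced to zero, which is consistent with the authors’ statement that no pre-stimulus period was subtracted.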

      Was the experimenter blinded to the treatment group during the behavior experiments? If not, were there issues that precluded blinding (limited staffing owing to lab capacity restrictions during the pandemic)? This is important to clarify for the sake of rigor and reproducibility.

      Changes to manuscript.

      Line 574: “The experimenters were not blinded to the treatment group, i.e. lesioned or non-lesioned, but they were blind to the lesion size both during the behavior experiments and most of the data processing.”

      Minor:

      L127-128: "In order to test...lesioned the auditory cortex bilaterally in 7 out of 16 animals". I would clarify this by changing the word animals to "mice" and 7 out of 16 by stating n = 9 and n = 7 are control and lesion groups, respectively.

      Agreed.

      Changes to manuscript.

      Line 129: “...compared the performance of mice with bilateral lesions of the auditory cortex (n = 7) with non-lesioned controls (n = 9)”

      L225-226: You rule out self-generated sounds as a likely source of behavioral modulation by citing Nate Sawtell's paper in the DCN. However, Stephen David's lab suggested that in marmosets, post sound activity in central IC may in fact reflect self-generated sounds during licking. I suggest addressing this with a nod to SVD's work (Singla et al., 2017; but see Shaheen et al., 2021).

      Agreed.

      Changes to manuscript.

      Line 243: “(Singla et al., 2017; but see Shaheen et al., 2021)”

      Line 238 - 239: You state that proportions only deviate greater than 10% for one of the four statistically significant clusters. Something must be unclear here because I don't understand: The delta between the groups in the significant clusters of Fig 5C is (from left to right) 20%, 20%, 38%, and 12%. Please clarify.

      Our wording was meant to convey that a deviation “from a 50/50 split” of 10% means that each side deviates from 50 by 10% resulting in a 40/60 (or 60/40) split. We agree that that has the potential to confuse readers and is not as clear as it could be and have therefore dropped the ambiguous wording.

      Changes to manuscript.

      Line 253: “...the difference between the groups was greater than 20% for only one of them.”

      L445: I looked at the cited Allen experiment; I'd be cautious with the interpretation here. A monosynaptic IC->striatum projection is news to me. I think Allen Institute used an AAV1-EGFP virus for these experiments, no? As you know, AAV1 is quite transsynaptic. The labeled fibers in striatum of that experiment may reflect disynaptic labeling of MGB neurons (which do project to striatum).

      Agreed. We deleted the reference to this Allen experiment.

      L650: Please define "network activity". Is this the fluorescence value for each ROI on each frame of each trial? Averaged fluorescence of each ROI per frame? Total frame fluorescence including neuropil? Depending on who you ask, each of these measures provides some meaningful readout of network activity, so clarification would be useful.

      Changes to manuscript.

      Line 707: “Logistic regression models were trained on the network activity of each session, i.e., the ΔF/F values of all ROIs in each session, to classify hit vs miss trials. This was done on a frame-by-frame basis, meaning that each time point (frame) of each session was trained separately.”
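      To illustrate what training “on a frame-by-frame basis” means, here is a minimal pure-Python sketch in which one independent classifier is fit per frame and scored on held-out trials. It is not the authors’ pipeline: the plain gradient-descent solver, the absence of regularization, the single train/test split and the `activity[trial][frame][roi]` layout are all assumptions made for illustration. In practice, a library implementation such as scikit-learn’s `LogisticRegression` with cross-validation would be the more usual choice.

```python
import math


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient-descent logistic regression without regularization."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def decode_per_frame(activity, labels, train_idx, test_idx):
    """Fit one classifier per frame and return held-out accuracy per frame.

    activity[trial][frame][roi] holds dF/F values; labels[trial] is 1 for a
    hit and 0 for a miss. Each frame's model ignores all other frames.
    """
    accuracies = []
    for frame in range(len(activity[0])):
        X_train = [activity[t][frame] for t in train_idx]
        y_train = [labels[t] for t in train_idx]
        w, b = fit_logistic(X_train, y_train)
        correct = 0
        for t in test_idx:
            z = sum(wj * xj for wj, xj in zip(w, activity[t][frame])) + b
            correct += int((z > 0) == (labels[t] == 1))
        accuracies.append(correct / len(test_idx))
    return accuracies
```

      Because every frame gets its own model, a high score late in the trial cannot be inherited from early, possibly sound-driven frames, which is the property the authors rely on in their argument about residual sound level bias.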

      Figure 3 narrative or legend: Listing the F values for the anova would be useful. There is pretty clearly a main effect of training session for hits, but what about for the false alarms? That information is important to solidify the result, and would help more specialized readers interpret the d-prime plot in this figure.

      Agreed. There were significant main effects of training day for both hit rates and false alarm rates (as well as d’).

      Changes to manuscript.

      Line 165: “The ability of the mice to learn and perform the click detection task was evident in increasing hit rates and decreasing false alarm rates across training days (Figure 3A, p < 0.01, mixed-design ANOVAs).”

      In summary, thank you for undertaking this work. Your conclusions are provocative, and thus will likely influence the field's direction for years to come.

      Thank you for those kind words and valuable and constructive feedback, which has certainly improved the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) (Fig. 5) What fraction of individual neurons actually encode task-related information in each animal group? How many neurons respond to sound? The clustering and decoding analyses are interesting, but they obscure these simple questions, which get more directly at the main questions of the study. Suggested approach: For a direct comparison of AC-lesioned and -non-lesioned animals, why not simply compare the mean difference between PSTH responses for each neuron individually? To test for trial outcome effects, compare Hit and Miss trials (same stimulus, different behavior) and for sound response effects, compare Hit and False alarm trials (same behavior, different stimulus). How do you align for time in the latter case when there's no stimulus? Align to the first lick event. The authors should include this analysis or explain why their approach of jumping right to analysis of clusters is justified.

      We have now calculated the fraction of neurons that encode trial outcome by comparing hit and miss trial activity. That fraction does not differ between non-lesioned animals and lesioned animals as a whole, but is significantly smaller in mice with partial lesions. The reviewer’s suggestion of comparing hit and false alarm trial activity to assess sound responsiveness is problematic because hit trials involve reward delivery and consumption. Consequently, they are behaviorally very different from false alarm trials (not least because hit trials tend to contain much more licking). Therefore, we calculated the fraction of neurons that respond to the acoustic stimulus by comparing activity before and after stimulus onset in miss trials. We found no significant difference between the non-lesioned and lesioned mice or between subgroups.

      We have addressed these points with the following changes to the manuscript:

      Line 217: “Indeed, close to half (1272 / 2649) of all neurons showed a statistically significant difference in response magnitude between hit and miss trials, while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Line 648: “Analysis of task-modulated and sound-driven neurons. To identify individual neurons that produced significantly different response magnitudes in hit and miss trials, we calculated the mean activity for each stimulus trial by taking the mean activity over the 5 seconds following stimulus presentation and subtracting the mean activity over the 2 seconds preceding the stimulus during that same trial. A Mann-Whitney U test was then performed to assess whether a neuron showed a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) in response magnitude between hit and miss trials. The analysis was performed using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions. If, for a given sound level, there were more hit than miss trials, we randomly selected a sample of hit trials (without replacement) to match the sample size for the miss trials and vice versa. Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation.”
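      As an illustration of the response-magnitude and trial-balancing steps described in the quoted methods, the following is a minimal Python sketch. It assumes each trial's activity trace is available as a list sampled at a known frame rate, that every stimulus onset lies at least 2 s into the trace, and that trials are tagged with their sound level; the frame rate and the names `response_magnitude` and `balance_by_level` are hypothetical and not taken from the manuscript. The per-neuron Mann-Whitney U test could then be run on the two balanced samples, for example with `scipy.stats.mannwhitneyu`.

```python
import random
from collections import defaultdict
from statistics import mean


def response_magnitude(trace, stim_frame, frame_rate, pre_s=2.0, post_s=5.0):
    """Mean activity over post_s seconds after stimulus onset minus the mean
    over pre_s seconds before it, for one trial (assumes stim_frame >= pre window)."""
    n_pre = int(pre_s * frame_rate)
    n_post = int(post_s * frame_rate)
    pre = trace[stim_frame - n_pre:stim_frame]
    post = trace[stim_frame:stim_frame + n_post]
    return mean(post) - mean(pre)


def balance_by_level(hits, misses, seed=0):
    """Subsample hit and miss trials to equal counts at every sound level.

    hits, misses: lists of (sound_level, response_magnitude) tuples.
    Sampling is without replacement; levels present in only one trial
    type contribute no trials.
    """
    rng = random.Random(seed)
    hit_by_level = defaultdict(list)
    miss_by_level = defaultdict(list)
    for level, value in hits:
        hit_by_level[level].append(value)
    for level, value in misses:
        miss_by_level[level].append(value)

    hit_sample, miss_sample = [], []
    for level in sorted(set(hit_by_level) & set(miss_by_level)):
        # Downsample whichever trial type is more numerous at this level.
        n = min(len(hit_by_level[level]), len(miss_by_level[level]))
        hit_sample.extend(rng.sample(hit_by_level[level], n))
        miss_sample.extend(rng.sample(miss_by_level[level], n))
    return hit_sample, miss_sample
```

      Balancing per level, rather than overall, is what keeps the hit and miss samples matched in their sound level distributions.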

      Some more specific concerns about focusing only on cluster-level and population decoding analysis are included below.

      (2) (L 234) "larger field of view". Do task-related or lesion-dependent effects depend on the subregion of IC imaged? Some anatomists would argue that the IC shell is not a uniform structure, and concomitantly, task-related effects may differ between fields. Did coverage of IC subregions differ between experimental groups? Is there any difference in task-related effects between subregions of IC? Or was all this work carried out only in the dorsal area? The differences between lesioned and non-lesioned animals are relatively small, so this may not have a huge impact, but a more nuanced discussion that accounts for observed or potential (if not tested) differences between regions of the IC would be helpful.

      The specific subregion coverage could also impact the decoding analysis (Fig 6), and if possible it might be worth considering an interaction between field of view and lesion size on decoding.

      Each day we chose a new imaging location to avoid recording the same neurons more than once and aimed to sample widely across the optically accessible surface of the IC. We typically stopped the experiment only when there were no more new areas to record from. In terms of the depth of the imaged neurons, we were limited by the fact that corticorecipient neurons become sparser with depth and that the signal available from the GCaMP6f labeling of the Ai95 mice becomes rapidly weaker with increasing distance from the surface. This meant that we recorded no deeper than 150 µm from the surface of the IC. Consequently, while the average rostrocaudal and mediolateral positions of imaging locations may have varied somewhat from animal to animal, owing to differences between mice in how much of the IC surface was visible, in cranial window positioning and in neuronal labeling, our dataset is anatomically uniform in that all recorded neurons receive input from the auditory cortex and are located within 150 µm of the surface of the IC. Therefore, we think it highly unlikely that small sampling differences across animals could have a meaningful impact on the results.

      Given that there is no consensus as to where the border between the dorsal and external/lateral cortices of the IC is located, and that it is typically difficult to find reliable anatomical reference points, we made no attempt to assign our recordings from corticorecipient neurons to specific subdivisions of the IC. The borders between the IC and surrounding structures are not always obvious during imaging: for example, a transition from a labeled area to a dark area near the edge of the cranial window could indicate a border with another structure, but it could equally reflect the IC surface sloping away from the window or simply an unlabeled area within the IC.

      Changes to manuscript.

      Line 195: “We then proceeded to record the activity of corticorecipient neurons within about 150 µm of the dorsal surface of the IC using two-photon microscopy (Figure 4B, Video 1).”

      Line 375: “We imaged across the optically accessible dorsal surface of the IC down to a depth of about 150 µm below the surface. Consequently, the neurons we recorded were located predominantly in the dorsal cortex. However, identifying the borders between different subdivisions of the IC is not straightforward and we cannot rule out the possibility that some were located in the lateral cortex.”

      (3) (L 482-483) "auditory cortex is not required for the task-related activity recording in IC neurons of mice performing a sound detection task". Most places in the text are clearer, but this statement is confusing. Yes, animals with lesions can have a "normal"-looking IC, but does that mean that AC does not strongly modulate IC during this behavior in normal animals? The authors have shown convincingly that subcortical areas can both shape behavior and modulate IC normally, but AC may still be required for IC modulation in non-lesioned animals. Given the complexity of this system, the authors should make sure they summarize their results consistently and clearly throughout the manuscript.

      The reviewer raises an important point. What we have shown is that corticorecipient dorsal IC neurons in mice without auditory cortex show neural activity during a sound detection task that is largely indistinguishable from the activity of mice with an intact auditory cortex. In lesioned mice, the auditory cortex is thus not required. Whether the IC activity of the non-lesioned group can be shaped by input from the auditory cortex in a meaningful way in other contexts, such as during learning, is a question that our data cannot answer.

      Changes to manuscript.

      Line 508: "While modulation of IC activity by this descending projection has been implicated in various functions, most notably in the plasticity of auditory processing, we have shown in mice performing a sound detection task that IC neurons show task-related activity in the absence of auditory cortical input."

      LESSER CONCERNS

      (L. 106-107) "Optogenetic suppression of cortical activity is thus also unsuitable..." It appears that behavior is not completely abolished by the suppression. One could also imagine using a lower dose of muscimol for partial inactivation of AC feedback. When some behavior persists, it does seem possible to measure task-related changes in the IC. This may not be necessary for the current study, but the authors should consider how these transient methods could be applied usefully in the Discussion. What about inactivation of cortical terminals in the IC? Is that feasible?

      Our argument is not that acute manipulations are unsuitable because they completely abolish the behavior, but because they significantly alter the behavior. Although it would not be trivial to precisely measure the extent of pharmacological cortical silencing in behaving mice that have been fitted with a midbrain window, it should be possible to titrate the size of a muscimol injection to achieve partial silencing of the auditory cortex that does not fully abolish the ability to detect sounds. However, such an outcome would likely render the data uninterpretable. If no effect on IC activity was observed, it would not be possible to conclude whether this was due to the fact that the auditory cortex was only partially silenced or that projections from the auditory cortex have no influence on the recorded IC activity. Similarly, if IC activity was altered, it would not be possible to say whether this was due to altered descending modulation resulting from the (partially) silenced auditory cortex or to the change in behavior, which would likely be reflected in the choice-related activity measured in the IC.

      Silencing of corticocollicular axons in the IC is potentially a more promising approach and we did devote a considerable amount of time and effort to establishing a method that would allow us to simultaneously image IC neurons while silencing corticocollicular axons, trying both eNpHR3.0 and Jaws with different viral labeling approaches and mouse lines. However, we ultimately abandoned those attempts because we were not convinced that we had achieved sufficient silencing or that we would be able to convincingly verify this. Furthermore, axonal silencing comes with its own pitfalls and the interpretation of its consequences is not straightforward. Given that our discussion already contains a section (line 421) on axonal silencing, we do not feel there would be any benefit in adding to that.

      (Figure 1). Can the authors break down the performance for FA and HR, as they do in Fig. 3? It would be helpful to know what aspect of behavior is impaired by the transient inactivation.

      Good point. Figure 1 has been updated to show the results separately for hit rates, false alarm rates and d’. The new figure indicates that the change in d’ is primarily a consequence of altered false alarm rates. Please also see our response to a related comment by reviewer #1.

      Changes to manuscript.

      New figure 1.

      (Figure 4 legend). Minor: Please clarify, what is time 0 in panel C? Time of click presentation?

      Yes, that is correct.

      Changes to manuscript.

      Line 209: “Vertical line at time 0 s indicates time of click presentation.”

      (L. 228-229). There has been a report of lick and other motor related activity in the IC - e.g., see Shaheen, Slee et al. (J Neurosci 2021), the timing of which suggests that some of it may be acoustically driven.

      Thanks for pointing this out. Shaheen et al., 2021 should certainly have been cited by us in this context as well as in other parts of the manuscript.

      Changes to manuscript.

      Line 243: “(Singla et al., 2017; but see Shaheen et al., 2021)”

      Also, have the authors considered measuring a peri-lick response? The difference between hit and miss trials could be perceptual or it could reflect differences in motor activity. This may be hard to tease apart, but one could, for example, test whether activity is stronger on trials with many licks than on trials with few licks.

      (L. 261) "Behavior can be decoded..." similar or alternative to the previous question of evoked activity, can you decode lick events from the population activity?

      The difference between hit and miss trial activity almost certainly partially reflects motor activity associated with licking. This was stated in the Discussion, but to make that point more explicitly, we now include a plot of average false alarm trial activity, i.e. trials without sound (catch trials) in which animals licked (but did not receive a reward).

      Given a sufficient number of catch trials, it should be possible to decode false alarm and correct rejection trials. However, our experiment was not designed with that in mind and contains a much smaller number of catch trials than stimulus trials (approximately one tenth the number of stimulus trials), so we have not attempted this.

      Changes to manuscript.

      New Figure 4 - figure supplement 1.

      (L. 315) "Pre-stimulus activity..." Given reports of changes in activity related to pupil-indexed arousal in the auditory system, do the authors by any chance have information about pupil size in these datasets?

      Given that all recordings were performed in the dark, fluctuations in pupil diameter were relatively small. Therefore, we have not made any attempt to relate pupil diameter to any of the variables assessed in this manuscript.

      (L. 412) "abolishes sound detection". While not exactly the same task, the authors might comment on Gimenez et al (J Neurophys 2015) which argued that temporary or permanent lesioning of AC did not impair tone discrimination. More generally, there seems to be some disagreement about what effects AC lesions have on auditory behavior.

      Thank you for this suggestion. Gimenez et al. (2015) investigated the ability of freely moving rats to discriminate sounds (and, in addition, how they adapt to changes in the discrimination boundary). Broadly consistent with later reports by Ceballo et al. (2019) (mild impairment) and O’Sullivan et al. (2019) (no impairment), Gimenez et al. (2015) reported that discrimination performance is mildly impaired after lesioning auditory cortex. Where the results of Gimenez et al. (2015) stand out is in the comparatively mild impairments that were seen in their task when they used muscimol injections, which contrast with the (much) larger impairments reported by others (e.g. Talwar et al., 2001; Li et al., 2017; Jaramillo and Zador, 2014).

      Changes to manuscript.

      Line 433: “However, transient pharmacological silencing of the auditory cortex in freely moving rats (Talwar et al., 2001), as well as head-fixed mice (Li et al., 2017), completely abolishes sound detection (but see Gimenez et al., 2015).”

      (L. 649) "... were generally separable" Is the claim here that the clusters are really distinct from each other? This is unexpected, and it might be helpful if the authors could show this result in a figure.

      The half-sentence that this comment refers to has been removed from the methods section. Please also see a related comment by reviewer #1 which prompted us to add the following to the methods section.

      Changes to manuscript.

      Line 666: “While clustering is a useful approach for organizing and visualizing the activity of large and heterogeneous populations of neurons, we need to be mindful that, given continuous distributions of response properties, the locations of cluster boundaries can be somewhat arbitrary and/or reflect idiosyncrasies of the chosen method, and thus vary from one algorithm to another. We employed an approach very similar to that described in Namboodiri et al. (2019) because it is thought to produce stable results in high-dimensional neural data (Hirokawa et al., 2019).”

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors must absolutely clarify whether the hit versus miss decoding and clustering analyses are done for a single sound level or for multiple sound levels (what is the fraction of trials for each sound level?). If the authors did it for multiple sound levels, they should redo all analyses sound level by sound level, or for a single sound level if there is one that dominates. No doubt there is information about the trial outcome in IC, but it should not be overestimated by a confound with stimulus information.

      This is an important point. The original clustering analysis was carried out across different sound levels. We have now carried out additional analysis to distinguish between two alternative explanations of the data, which were also raised by reviewer #1: that the difference in neural activity between hit and miss trials could reflect a) the animals’ behavior or b) relatively more hit trials at higher sound levels, which would be expected to produce stronger responses. If the data favored b), we would expect no difference in activity between hit and miss trials when plotted separately for different sound levels. The new figure 4 - figure supplement 1 indicates that this is not the case. Hit and miss trial activity are clearly distinct even when plotted separately for different sound levels, confirming that this difference in activity reflects the animals’ behavior rather than sensory information.

      We made the following changes to manuscript.

      Line 214: “While averaging across all neurons cannot capture the diversity of responses, the averaged response profiles suggest that it is mostly trial outcome rather than the acoustic stimulus and neuronal sensitivity to sound level that shapes those responses (Figure 4 – figure supplement 1).”

      Differences in the distributions of sound levels in the different trial types could also potentially confound the decoding into hit and miss trials. Our analysis actually aimed to take this into account but, unfortunately, we failed to include sufficient details in the methods section.

      Changes to manuscript.

      Line 710: “Rather than including all the trials in a given session, only trials of intermediate difficulty were used for the decoding analysis. More specifically, we only included trials across five sound levels, comprising the lowest sound level that exceeded a d’ of 1.5 plus the two sound levels below and above that level. That ensured that differences in sound level distributions would be small, while still giving us a sufficient number of trials to perform the decoding analysis.”
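      The trial-selection rule in the quoted methods can be sketched as follows, assuming d’ is computed per sound level as Z(hit rate) - Z(false alarm rate), with a single false alarm rate taken from catch trials and rates clipped away from 0 and 1 so that Z stays finite. The clipping value, the function names and the example rates used to exercise the sketch are assumptions for illustration, not values from the study.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, eps=1e-3):
    """Sensitivity index d' = Z(hit rate) - Z(false alarm rate).

    Rates are clipped to [eps, 1 - eps] (an assumed convention) so that the
    inverse normal CDF stays finite.
    """
    z = NormalDist().inv_cdf

    def clip(p):
        return min(max(p, eps), 1 - eps)

    return z(clip(hit_rate)) - z(clip(fa_rate))


def select_intermediate_levels(levels, hit_rates, fa_rate, threshold=1.5, n_side=2):
    """Return the lowest level whose d' exceeds threshold, plus up to n_side
    tested levels below and above it (fewer at the edges of the tested range)."""
    pairs = sorted(zip(levels, hit_rates))
    for i, (_, hr) in enumerate(pairs):
        if d_prime(hr, fa_rate) > threshold:
            lo = max(0, i - n_side)
            return [lvl for lvl, _ in pairs[lo:i + n_side + 1]]
    return []  # no sound level reached the threshold
```

      With hypothetical levels of 53 to 71 dB SPL in 3 dB steps, hit rates of 0.1, 0.2, 0.3, 0.5, 0.7, 0.9 and 0.95, and a false alarm rate of 0.1, the first level to exceed d’ = 1.5 is 65 dB SPL, so the sketch returns 59, 62, 65, 68 and 71 dB SPL.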

      In this context, it is worth bearing in mind that a) the decoding analysis was done on a frame-by-frame basis, meaning that the decoding score achieved early in the trial has no impact on the decoding score at later time points in the trial, b) sound-driven activity predominantly occurs immediately after stimulus onset and is largely over about 1 s into the trial (see cluster 3, for instance, or average miss trial activity in figure 4 - figure supplement 1), c) decoding performance of the behavioral outcome starts to plateau 500-1000 ms into the trial and remains high until it very gradually begins to decline after about 2 s into the trial. In other words, decoding performance remains high far longer than the stimulus would be expected to have an impact on the neurons’ activity. Therefore, we would expect any residual bias due to differences in the sound level distribution that our approach did not control for to be restricted to the very beginning of the trial and not to meaningfully impact the conclusions derived from the decoding analysis.
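      Point a) rests on the decoder being trained and tested separately at every imaging frame. The classifier is not specified in this passage, so the sketch below swaps in a simple leave-one-out nearest-centroid decoder (a hypothetical stand-in, not the authors' method) purely to show that each frame's accuracy is computed from that frame's population vectors alone, so early frames cannot influence the scores at later frames.

```python
from statistics import mean


def centroid(vectors):
    """Element-wise mean of a list of equal-length activity vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two activity vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def decode_frame(X, y):
    """Leave-one-out nearest-centroid accuracy for a single frame.

    X: one population vector (activity of each neuron) per trial.
    y: trial labels, 1 = hit, 0 = miss. Each class needs at least 2 trials.
    Ties are assigned to class 0 (miss).
    """
    correct = 0
    for i, (x_test, label) in enumerate(zip(X, y)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        c_hit = centroid([x for x, lab in train if lab == 1])
        c_miss = centroid([x for x, lab in train if lab == 0])
        pred = 1 if sq_dist(x_test, c_hit) < sq_dist(x_test, c_miss) else 0
        correct += int(pred == label)
    return correct / len(X)


def decode_per_frame(data, y):
    """data[trial][frame][neuron] -> list with one accuracy per frame.

    Every frame is decoded independently of all other frames.
    """
    n_frames = len(data[0])
    return [decode_frame([trial[f] for trial in data], y) for f in range(n_frames)]
```

      A frame in which hit and miss trials have identical population activity yields chance-level accuracy for balanced classes regardless of how well neighboring frames decode.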

      Furthermore, we carried out an additional decoding analysis for one imaging session in which we had a sufficient number of trials to perform the analysis not only over the original five sound levels (59, 62, 65, 68 and 71 dB SPL), but also over a reduced range of three sound levels (62, 65 and 68 dB SPL), as well as over a single sound level (65 dB SPL) (Figure 6 - figure supplement 1). The mean sound level difference between the hit trial and miss trial distributions for these three conditions was 3.08, 1.01 and 0 dB, respectively. This analysis suggests that decoding performance is not meaningfully affected by changing the range of sound levels (and sound level distributions), other than that including fewer sound levels means fewer trials and therefore noisier decoding.
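      The mean sound level difference reported above is the mean sound level across hit trials minus that across miss trials. The sketch below, exercised with made-up trial levels rather than data from the session in question, shows why restricting the analysis to a single level necessarily drives that difference to 0 dB; the function name is hypothetical.

```python
from statistics import mean


def mean_level_difference(hit_levels, miss_levels, allowed=None):
    """Mean sound level (dB SPL) of hit trials minus that of miss trials,
    optionally restricted to a subset of allowed sound levels."""
    if allowed is not None:
        hit_levels = [lvl for lvl in hit_levels if lvl in allowed]
        miss_levels = [lvl for lvl in miss_levels if lvl in allowed]
    return mean(hit_levels) - mean(miss_levels)
```

      For hypothetical hit trials at 62, 65, 68 and 68 dB SPL and miss trials at 62, 62, 65 and 68 dB SPL, the difference is 1.5 dB across all levels and 0 dB when only 65 dB SPL is allowed.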

      Changes to manuscript.

      Line 287: “...and was not meaningfully affected by differences in sound level distributions between hit and miss trials (Figure 6 – figure supplement 1).”

      Finally, in order to supplement the decoding analysis, we determined for each individual neuron whether there was a significant difference between the average hit and average miss trial activity. Note that this was done using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions and to rule out any potential confound of sound level. This revealed that the proportion of neurons containing “information about trial outcome” was generally very high, close to 50% on average, and not significantly different between lesioned and non-lesioned mice.

      Changes to manuscript.

      Line 307: “Although the proportion of individual neurons with distinct response magnitudes in hit and miss trials in lesioned mice did not differ from that in non-lesioned mice, it was significantly lower when separating out mice with partial lesions (Figure 6 – figure supplement 3).”

      Line 648: “Analysis of task-modulated and sound-driven neurons. To identify individual neurons that produced significantly different response magnitudes in hit and miss trials, we calculated the mean activity for each stimulus trial by taking the mean activity over the 5 seconds following stimulus presentation and subtracting the mean activity over the 2 seconds preceding the stimulus during that same trial. A Mann-Whitney U test was then performed to assess whether a neuron showed a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) in response magnitude between hit and miss trials. The analysis was performed using equal numbers of hit and miss trials at each sound level to ensure balanced sound level distributions. If, for a given sound level, there were more hit than miss trials, we randomly selected a sample of hit trials (without replacement) to match the sample size for the miss trials and vice versa.”

      (2) I have the feeling that the authors do not fully exploit the functional data recorded with two-photon imaging. They identify several clusters but do not describe their functional differences. For example, cluster 3 is obviously mainly sensory driven, as it is not modulated by outcome. This could be mentioned. This could also be used to rule out that trial outcome is the result of insufficient sensory input. Could this cluster be used to predict trial outcome at the onset response? Could it be used to predict the presence of the sound, and with what accuracy? The authors discuss the different cluster types a little, but in a rather vague manner. I recognize that one should be careful with the use of signal analysis methods in calcium imaging, but a simple linear deconvolution of the calcium dynamics would help to illustrate the conclusions that the authors propose based on peak responses. It would also be very interesting to align the (deconvolved) cluster responses to the timing of licking and reward events to check whether some clusters do not fire when mice lick before the sound comes. This would help clarify whether the behavioral signals described here require both the presence of the sound and the behavioral action or are just a reflection of the motor command. As noted by the authors, some clusters have late peak responses (2 and 5). However, 2 and 5 are not equivalent, and a deconvolution would show this much better: 2 has late-onset firing, whereas 5 has early-onset but prolonged firing.

      We agree with the reviewer’s statement that “cluster 3 is obviously mainly sensory driven”. In the Discussion we refer to cluster 3 as having a “largely behaviorally invariant response profile to the auditory stimulus” (line X), which is consistent with the statement of the reviewer. With regard to the reviewer’s suggestion to describe the “functional differences” between the clusters, we would like to refer to the subsequent three sentences of the same paragraph in which we speculate on the cognitive and behavioral variables that may underlie the response profiles of different clusters. Given the limitations imposed by the task structure, we do not think it is justified to expand on this.

      We have added an additional analysis in order to explicitly address the question of which neurons are sound responsive (please also see response to point 3 below and to point 1 of reviewer #2). That trial outcome could be predicted on the basis of only the sound-responsive neurons’ activity during the initial period of the trial (“predict trial outcome at the onset response”) is unlikely given their small number (only 97 of 2649 neurons show a statistically significant sound-evoked response) and given that only a minority (42/98) of those sound-driven neurons are also modulated by trial outcome within that initial trial period (i.e. 0-1s after stimulus onset; data not shown).

      Changes to manuscript.

      Line 219: “..., while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 658: “Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation. This analysis was performed using miss trials with click intensities from 53 dB SPL to 65 dB SPL (many sessions contained very few or no miss trials at higher sound levels).”
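      For readers less familiar with the Benjamini-Hochberg procedure used in both per-neuron tests, the following is a minimal pure-Python sketch of the step-up adjustment. It is a generic textbook implementation, not the authors' code; in practice, `scipy.stats.false_discovery_control` (SciPy 1.11 or later) or `statsmodels.stats.multitest.multipletests` with `method="fdr_bh"` provide the same adjustment.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value of rank k (ascending) among m tests is scaled by m / k;
    monotonicity is then enforced by taking the running minimum from the
    largest rank downwards, and values are capped at 1.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      A neuron would then count as significant when its adjusted p-value is at or below 0.05; for example, raw p-values of 0.01, 0.04, 0.03 and 0.5 adjust to about 0.04, 0.053, 0.053 and 0.5, so only the first survives.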

      While calcium traces represent an indirect measure of neural activity, deconvolution does not necessarily provide an accurate picture of the spiking underlying those traces and has the potential to introduce additional problems. For instance, deconvolution algorithms tend to perform poorly at inferring the spiking of inhibited neurons (Vanwalleghem et al., 2021). Given that suppression is such a prominent feature of IC activity and is evident both in our calcium data and in the electrophysiology data of others (Franceschi and Barkat, 2021), we decided against using deconvolved spikes in our analyses. See also the side-by-side comparison below of the hit and miss trial activity of one example neuron based on either the calcium trace (left) or deconvolved spikes (right), extracted using the OASIS algorithm (Friedrich et al., 2017) as incorporated into suite2p (Pachitariu et al., 2016).

      Author response image 1.

      (3) Along the same lines, the very small proportion of truly sensory-driven neurons (cluster 3) is not discussed. Is this what one would expect in typical shell or core IC neurons?

      As requested by reviewer #2 and mentioned in response to the previous point, we have now quantified the number of neurons in the dataset that produced significant responses to sound (97 / 2649). For a given imaging area, the fraction of neurons that show a statistically significant change in neural activity following presentation of a click of between 53 dB SPL and 65 dB SPL rarely exceeded ten percent. While that number is low, it is not necessarily surprising given the moderate intensity and very short duration of the stimuli. For comparison: Using the same transgenics, labeling approach and imaging setup and presenting 200-ms long pure tones at 60 dB SPL with frequencies between 2 kHz and 64 kHz, we typically find that between a quarter and a third of neurons in a given imaging area exhibit a statistically significant response (data not shown).

      Changes to manuscript.

      Line 219: “..., while only a small fraction (97 / 2649) exhibited a significant response to the sound.”

      Line 658: “Sound-driven neurons were identified by comparing the mean miss trial activity before and after stimulus presentation. Specifically, we performed a Mann-Whitney U test to assess whether there was a statistically significant difference (Benjamini-Hochberg adjusted p-value of 0.05) between the mean activity over the 2 seconds preceding the stimulus and the mean activity over the 1 second period following stimulus presentation. This analysis was performed using miss trials with click intensities from 53 dB SPL to 65 dB SPL (many sessions contained very few or no miss trials at higher sound levels).”

      Line 220: “While the number of sound-responsive neurons is low, it is not necessarily surprising given the moderate intensity and very short duration of the stimuli. For comparison: Using the same transgenics, labeling approach and imaging setup and presenting 200-ms long pure tones at 60 dB SPL with frequencies between 2 kHz and 64 kHz, we typically find that between a quarter and a third of neurons in a given imaging area exhibit a statistically significant response (data not shown).”

      (4) In the discussion, the interpretation of different transient and permanent cortical inactivation experiment is very interesting and well balanced given the complexity of the issue. There is nevertheless a comment that is difficult to follow. The authors state:

      If cortical lesioning results in a greater weight being placed on the activity in spared subcortical circuits for perceptual judgements, we would expect the accuracy with which trial-by-trial outcomes could be read out from IC neurons to be greater in mice without auditory cortex. However, that was not the case.

      However, there is no indication that the activity they observe in shell IC is causal to the behavioral decision and likely it is not. There is also no indication that the behavioral signals seen by the authors reflect the weight put on the subcortical pathway for behavior. I find this argument handwavy and would remove it.

      While we are happy to amend this section, we would not wish to remove it because a) we believe that the point we are trying to make here is an important and reasonable one and b) it is consistent with the reviewer’s comment. Hopefully, the following will make this clearer: In order for the mouse to make a perceptual judgment and act upon it (in the context of our task, hearing a sound and then licking a spout), auditory information needs to be read out and converted into a motor command. If the auditory cortex normally plays a key role in such perceptual judgments, cortical lesions would require the animal to base its decisions on the information available from the remaining auditory structures, potentially including the auditory midbrain. This might result in a greater correspondence between the mouse’s behavior and the neural activity in those structures. That we did not observe this outcome for the IC could mean that the auditory cortex did not contribute to the relevant perceptual judgments (sound detection) in the first place, in which case no reweighting of signals from the other structures would be necessary. Alternatively, greater weight might be placed exclusively on structures other than the auditory midbrain, e.g. the thalamus. The latter would imply that the contribution of the IC remains the same. This includes the possibility that the IC shell does not play a causal role in the behavioral decision – in either control mice or mice with cortical lesions – as suggested by the reviewer.

      Changes to manuscript.

      Line 471: “This could imply that, following cortical lesions, greater weight is placed on structures other than the IC, with the thalamus being the most likely candidate, ..”

      (5) In Fig. 5 the two colors used in B and C are the same although they describe different categories.

      The dark green and ‘deep orange’ we used to distinguish between non-lesioned and lesioned in Figure 5C are slightly lighter than the colors used to distinguish between these two categories in other figures and therefore might be more easily confused with the blue and red in Figure 5B. This has been changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have made revisions accordingly. The following is a list of the changes we have made in this revised Version of Record:

      (1) We have added three more panels to Figure 1-figure supplement 1, showing that lipopolysaccharide-induced severe lung injury also generates some ectopic tuft cells expressing both Dclk1 and Gα-gustducin, a G protein α subunit expressed in taste bud cells and many tuft cells.

      (2) We have added a new supplemental figure, Figure 2-figure supplement 1, showing the reanalysis data of the single-cell RNAseq dataset (GSE197163) indicating the numbers of Trpm5-GFP+ ectopic tuft cells expressing Tas2r108, Tas2r105, Tas2r138, Tas2r137 and other Tas2rs, respectively. And the original “Figure 2-figure supplement 1” in the previous version has been changed to “Figure 2-figure supplement 2”.

      (3) We have added another new supplemental figure, Figure 3-figure supplement 1, showing that the H1N1 infection-damaged lung tissue volumes in Gng13-cKO mice are significantly greater than those in WT or Trpm5-/- mice, in agreement with the injured lung surface areas measured for these three genotypes (Figure 3C and D). The original “Figure 3-figure supplement 1” of the previous version has been renumbered “Figure 3-figure supplement 2”.

      (4) We have added two new panels, I and J, to the new Figure 3-figure supplement 2, showing from a reanalysis of the single-cell RNAseq dataset (GSE197163) that about 57% of Trpm5-GFP+ ectopic tuft cells express Gγ13, some of which also express Alox5, a key enzyme in the biosynthesis of pro-resolving mediators.

      (5) We have added one reference on Sytox and another on Alox5.

      (6) We have corrected two labeling errors in Figure 3G and M, as well as some other typos in the article. We have also removed the “Present address” notes attached to some authors, as they were not needed.

      Attached below is our point-by-point reply to the comments and suggestions made by the reviewers. We hope that you and the reviewers will find all concerns satisfactorily addressed.

      Responses to public reviews:

      Reviewer #1:

      Li et al. report here on the expression of a G-protein subunit Gng13 in ectopic tuft cells that develop after severe pulmonary injury in mice. By deleting this gene in ectopic tuft cells as they arise, the authors observed worsened lung injury and greater inflammation after influenza infection, as well as a decrease in the overall number of ectopic tuft cells. This was in stark contrast to the deletion of Trpm5, a cation channel generally thought to be required for all functional gustatory signaling in tuft cells, where no phenotype is observed. Strengths here include a thorough assessment of lung injury via a number of different techniques. Weaknesses are notable: confusingly, these findings are at odds with reports from other groups demonstrating no obvious phenotype upon influenza infection in mice lacking the transcription factor Pou2f3, which is essential for all tuft cell specification and development. The authors speculate that heterogeneity within nascent tuft cell populations, specifically the presence of pro- and anti-inflammatory tuft cells, may explain this difference, but they do not provide any data to support this idea.

      We thank the reviewer for pointing out the strengths of this work. The phenotypes of Gng13 conditional knockout mice after severe pulmonary injury appear more severe than those of Trpm5 knockout or Pou2f3 knockout mice, which we would attribute to functionally specific tuft cell subtypes. In the intestines, tuft cells are known to promote type II innate immune responses. The ectopic pulmonary tuft cells emerge at 12 days post-infection and may not be involved in the initial immune response to the infection; instead, some of them may contribute to inflammation resolution and functional recovery. Reanalysis of the previously published single tuft cell RNAseq dataset indeed showed that Gng13 is expressed in a subset of these ectopic pulmonary tuft cells, and that anti-inflammatory genes such as Alox5 are also found in some of these cells (please see the newly added Figure 3-figure supplement 2I and J). Together, these data suggest that while some of these tuft cells may still play a pro-inflammatory role, as in the intestines, other Gγ13-expressing tuft cells contribute to inflammation resolution, and disrupting the function of the latter results in more severe phenotypes.

      Reviewer #2:

      The study by Li et al. aimed to demonstrate the role of the Gγ13-mediated signal transduction pathway in tuft cell-driven inflammation resolution and repairing injured lung tissue. The authors showed a reduced number of tuft cells in the parenchyma of Gγ13 null lungs following viral infection. Mice with a Gγ13 null mutation showed increased lung damage and heightened macrophage infiltration when exposed to the H1N1 virus. Their further findings suggested that lung inflammation resolution, epithelial barrier, and fibrosis were worsened in Gγ13 null mutants.

      Strengths:

      The beautiful immunostaining findings do suggest that the number of tuft cells is decreased in Gγ13 null mutants.

      Weaknesses:

      The description of phenotypes, and the approaches used to measure the phenotypes are problematic. Rigorous investigation of the mouse lung phenotypes is needed to draw meaningful conclusions.

      We thank the reviewer for pointing out the major findings and strengths of our work. Regarding the approaches used to measure the phenotypes: first, we performed double immunostaining with an antibody to Gα-gustducin, a G protein α subunit commonly expressed in taste buds and tuft cells, and validated that the lipopolysaccharide-induced DCLK1+ cells are indeed ectopic pulmonary tuft cells. Second, in addition to measuring the injured lung surface areas, we determined the injured lung tissue volumes by slicing the injured lungs into a series of tissue sections, quantifying the injured area in each section and then reconstructing the injured volumes. Third, we reanalyzed the previously published single-tuft-cell RNAseq dataset and found that a subset (~57%) of Trpm5-GFP+ tuft cells express Gng13, some of which express anti-inflammatory genes such as Alox5. These additional data further support our finding that a subset of Gγ13-expressing ectopic tuft cells may contribute to inflammation resolution while others may play a proinflammatory role.

      Reply to the recommendations of Reviewer #1:

      (1) A major issue with this study is the fact that Chat-Cre mediated knockout of Gng13 leads to reduced tuft cells and impaired recovery, yet global TRPM5 deletion (this study) and global Pou2f3 deletion (Barr et al.) exhibit no apparent phenotype. One can imagine a Trpm5-independent role of Gng13 in tuft cells, but it is much harder to reconcile with the fact that Pou2f3 KO mice, which lack tuft cells entirely, exhibit no apparent phenotype. This was examined in some detail in Barr et al., demonstrating no apparent change in weight loss, dysplastic expansion (Krt5+ cells), or goblet cell metaplasia. The most parsimonious explanation is that Gng13 deletion in another Chat+ cell type, probably neurons of some sort, is leading to this phenotype. The authors really need to investigate this in some detail as the data does not really support a role of tuft cells in the phenotype they observe. Better yet, identification of the other Chat+ cell type in which Gng13 deletion promotes impaired lung recovery would be very interesting. While neurons seem likely, perhaps there is another Chat+ cell type expressing Gng13 in the respiratory tract that could be playing a role as well. In either case, the discrepancy between Pou2f3 KO (no phenotype) and Chat-Cre / Gng13 KO (impaired recovery) is difficult to reconcile.

      We agree with the reviewer, and it took us some time to make sense of these data as well. The difference in phenotypes between Trpm5-knockout and Gng13 conditional knockout (Gng13-cKO) mice could be explained by the fact that Gγ13 is part of the Gβγ moiety of a heterotrimeric G protein (Gαβγ), which is known to act on many effector enzymes and ion channels, whereas Trpm5 largely mediates the influx of monovalent cations, depolarizing the plasma membrane. It is therefore understandable that loss of Gng13 may have a more profound effect on cell physiology, and consequently on phenotype, than loss of Trpm5; similar differential effects have also been found in the intestines (Frontiers in Immunology, 2023, DOI 10.3389/fimmu.2023.1259521).

      Data from several research groups indicate that there are subtypes of tuft cells, each displaying unique gene expression patterns as well as input and output signal profiles. It is not yet well understood how each subtype contributes to inflammatory responses or inflammation resolution. Comparative analyses of our data from Gng13-cKO mice versus those from Pou2f3-KO mice suggest that Gng13-expressing tuft cells may have a role in inflammation resolution, while other ectopic tuft cells may help maintain inflammation at a certain level, impairing subsequent tissue repair and recovery. The exact molecular and cellular mechanisms remain to be revealed in our future studies.

      The central nervous system may also play a role in the impaired lung recovery. However, our detailed immunochemical studies did not identify a significant number of lung-innervating neurons co-expressing ChAT and Gng13, suggesting that these neurons are unlikely to directly regulate pulmonary inflammation resolution or functional recovery.

      Together, our data suggest the importance of tuft cell subtype-specific functions, which may help us further understand the role of these rare tuft cells.

      (2) Figures showing alternative injury models inducing the generation of ectopic tuft cells are not convincing and not quantified. DCLK1 can be a bit promiscuous, so verifying tuft cell expansion in these other models with other markers (especially for LPS and HDM which have not been reported elsewhere) is important.

      We agree with the reviewer that DCLK1 is not a very specific marker for tuft cells. We have also observed that chemical induction of these ectopic tuft cells with bleomycin, HDM or LPS is not as effective as H1N1 infection. To verify that these rare DCLK1-positive cells are indeed tuft cells, we performed double immunostaining with antibodies to DCLK1 and to Gα-gustducin, another tuft cell marker. Some of these spindle-shaped DCLK1-positive cells also expressed Gα-gustducin (see the newly added panels in Figure 1-figure supplement 1), indicating that they are most likely chemically induced ectopic tuft cells. We also agree with the reviewer that it would be important to further investigate the possible roles of these cells during chemically induced injury, inflammation resolution and functional recovery.

      (3) Calcium responses in isolated post-flu tuft cells are interesting but difficult to interpret as presented. Can higher-power images be shown? Also, no statistical analysis is presented to provide any confidence in that data.

      We thank the reviewer for the suggestions. As in taste buds, only a subset of these ectopic tuft cells expresses Tas2rs, and each of these cells may express a few of the 35 murine Tas2rs. A particular bitter compound can therefore activate only a few tuft cells, so we had to use low magnification to include more responsive cells in a single imaging field. We agree with the reviewer that it would be interesting to statistically correlate the response profile to bitter substances with each cell’s Tas2r expression pattern, which we have done with sperm cells before (Molecular Human Reproduction, 2013, doi:10.1093/molehr/gas040). However, the main focus of this work is the effect of Gng13-cKO in a subset of these ectopic tuft cells on recovery. We plan to investigate these interesting cells in more detail in the future.

      (4) I am unaware of Sytox being a specific dye for pyroptotic cells. Can the authors please provide a reference or otherwise justify this?

      Sytox is a dye that stains dead cells and has previously been used in studies of gasdermin-mediated lytic cell death (Xi et al., Up-regulation of gasdermin C in mouse small intestine is associated with lytic cell death in enterocytes in worm-induced type 2 immunity. PNAS 2021 118(30) e2026307118 https://doi.org/10.1073/pnas.2026307118). We used the dye for the same assay in our work.

      (5) The authors perform qPCR for various taste receptor genes pre- and post-flu, but do not show that these genes are specifically induced in tuft cells. Since single-cell data and bulk RNA-Seq are available from Barr et al., the authors should validate the expression of these Tas2r genes specifically in post-flu tuft cells.

      We thank the reviewer for the suggestion. We have analyzed the single-cell RNAseq dataset (GSE197163, Barr et al. 2022) and found that, among 613 Trpm5-GFP+ tuft cells, Tas2r108 was expressed in the greatest number of cells (67), followed by Tas2r105, Tas2r138, Tas2r137, Tas2r118 and Tas2r102, which were detected in 11, 10, 10, 5 and 4 cells, respectively (see the newly added Figure 2-figure supplement 1). This ranking by number of expressing cells agrees closely with the relative Tas2r expression levels obtained by qPCR (Figure 2A), indicating that these Tas2rs are likely expressed in the ectopic tuft cells. We will further validate these data by analyzing the bulk RNA-Seq dataset once it becomes accessible to us.
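      The counting step described above (number of cells expressing each Tas2r, then ranking genes by that number) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes per-gene lists of raw counts across the Trpm5-GFP+ cells and treats a cell as "expressing" a gene when it has at least one count. The threshold, the function names and the toy counts are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers: a cell "expresses" a gene if its count >= min_count.
# The default threshold of 1 is an assumption, not the authors' stated criterion.


def cells_expressing(counts_by_gene: Dict[str, Sequence[float]],
                     min_count: float = 1.0) -> Dict[str, int]:
    """Return, for each gene, the number of cells with at least `min_count` counts."""
    return {gene: sum(1 for c in counts if c >= min_count)
            for gene, counts in counts_by_gene.items()}


def rank_by_expressing_cells(counts_by_gene: Dict[str, Sequence[float]],
                             min_count: float = 1.0) -> List[Tuple[str, int]]:
    """Sort genes by number of expressing cells, highest first (ties by name)."""
    n = cells_expressing(counts_by_gene, min_count)
    return sorted(n.items(), key=lambda kv: (-kv[1], kv[0]))


def fraction_expressing(counts: Sequence[float], min_count: float = 1.0) -> float:
    """Fraction of cells in `counts` that express the gene."""
    if not counts:
        raise ValueError("no cells")
    return sum(1 for c in counts if c >= min_count) / len(counts)


if __name__ == "__main__":
    # Toy counts for five cells (illustrative only, not GSE197163 data).
    toy = {
        "Tas2r108": [3, 0, 1, 2, 0],
        "Tas2r105": [0, 1, 0, 0, 0],
        "Tas2r138": [0, 0, 0, 1, 1],
    }
    print(rank_by_expressing_cells(toy))
    # [('Tas2r108', 3), ('Tas2r138', 2), ('Tas2r105', 1)]
    print(fraction_expressing(toy["Tas2r108"]))  # 0.6
```

      The same fraction calculation underlies the Gng13 figure given elsewhere in this response: 350 expressing cells out of 613 gives 350/613 ≈ 0.571, i.e. about 57%.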

      (6) Some general editing of language throughout would be helpful to increase readability.

      Thank you for pointing this out. We have carefully checked the manuscript, corrected typos and revised several sentences to improve readability.

      (7) For the fibrosis analysis, trichrome staining is very heterogenous, which is reflected by the large error bars in Fig. 8B. A more quantitative, "whole lung" analysis such as hydroxyproline content or western blotting for Col1a1 would be ideal.

      Masson’s trichrome staining combined with qRT-PCR assays of fibrotic gene expression has previously been used to quantify fibrosis (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). We agree with the reviewer that the error bars in Fig. 8B are large and that a hydroxyproline content assay or western blotting for Col1a1 would be ideal. However, our qRT-PCR was performed on RNA extracted from whole lungs, so these data also reflect the overall extent of lung fibrosis.

      (8) The authors claim that only a subset of tuft cells express Gng13, but this is supported only by a single IF image in Fig. 3 supplement 1G. The authors could download the single-cell dataset from Barr et al. to confirm the heterogeneity of Gng13 expression and get a better sense of the fraction of total ectopic tuft cells that express this, as it is a critical point in their model.

      We thank the reviewer for the suggestion. We have downloaded and reanalyzed the single-cell RNAseq dataset (GSE197163) and found that, of 613 Trpm5-GFP+ tuft cells, 350 (57%) expressed Gng13 (Figure 3-figure supplement 2I). This result, together with our immunohistochemical data (Figure 3-figure supplement 2G and H), indicates that Gγ13 is expressed in a subset of these ectopic tuft cells. More comprehensive studies are needed to characterize these tuft cell subtypes and elucidate their subtype-selective functions.

      Reply to the recommendations of Reviewer #2:

      The study needs more rigorous examinations of the phenotypes. For example, quantification of the injury area in Fig3C is problematic. Similarly, fibrotic phenotype and quantification in Fig 8C also have problems. This study heavily used qRT-PCR analysis to quantitate the level change of bitter/other receptors in a minor population of tuft cells which are also minor in a whole lung. Given the limited number of cells, it is difficult to appreciate that qRT-PCR can pick up the difference. In addition, how would the findings in this study reconcile with the finding by Huang (PMID: 36129169) where pou2f3 null mutants (without tuft cells) were used? Huang et al. did not observe more severe phenotypes in the mice without tuft cells than controls.

      We thank the reviewer for the recommendations. Regarding Fig 3C, please see our reply to point (2) of the revisions for clarity below.

      Fig 8B and C use Masson’s trichrome staining to quantify fibrosis, an approach that has also been used by other groups (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). Our qRT-PCR data on fibrotic gene expression (Figure 8A) further support the Masson’s trichrome staining results.

      We recognize that tuft cells make up only a minor population in the lungs. We therefore performed qRT-PCR assays on RNA isolated mostly from the injured tissues, using the corresponding tissues from uninjured lungs as controls. To validate our qRT-PCR data, we reanalyzed the previously published single ectopic tuft cell RNAseq dataset (GSE197163) and found that Tas2r108, the most abundantly expressed Tas2r by qRT-PCR, was also expressed in the greatest number of tuft cells; the order of expression levels of the other Tas2rs also agreed well between the qRT-PCR and single-cell RNAseq data (Figure 2A, Figure 2-figure supplement 1), cross-validating the results of these two very different approaches.

      We have carefully studied the findings of Huang et al. (PMID: 36129169). Our data suggest that there are subtypes of ectopic tuft cells, some of which contribute to inflammation resolution while others play a proinflammatory role. Indeed, reanalysis of the aforementioned single tuft cell RNAseq dataset found that about 57% of Trpm5-GFP+ ectopic tuft cells expressed Gng13, some of which also expressed Alox5, a key enzyme in the biosynthesis of pro-resolving mediators. Thus, because both pro- and anti-inflammatory tuft cells are ablated in Pou2f3-knockout mice, their opposing effects may cancel out, making significant phenotypes hard to observe. When the function of the Gγ13-expressing subset is disrupted, the anti-inflammatory contribution of these cells is lost, resulting in more severe phenotypes. More studies are needed to further understand the subtype-specific functions of these fascinating tuft cells.

      Do Gγ13 null mutants show similar phenotypes in bleomycin injury model?

      Bleomycin and other chemically induced injury models generate far fewer ectopic pulmonary tuft cells. It is therefore more difficult to test the effect of the Gng13 mutation, given the small number of Gng13-expressing tuft cells in either WT or mutant lungs.

      What is the cell fate of lineage-labeled tuft cells in the lungs of Chat-Cre:Ai9:Gng13flox/flox mice following viral infection at the different time points examined? The numbers were decreased at different time points post-injury based on the data. Did these cells undergo apoptosis?

      It is an excellent idea to look into the fate of these cells in ChAT-Cre:Ai9:Gng13flox/flox mice. We believe that they would share the fate of other ectopic tuft cells, probably undergoing apoptosis. However, our data suggest that the Gng13 mutation suppresses the expansion of ectopic tuft cells, or of a particular subtype of these cells. Further studies are needed to elucidate how Gγ13-mediated signal transduction pathways regulate the proliferation of a subset of ectopic tuft cells.

      Here are the revisions for clarity and coherence to the figures:

      (1) Fig 2: For the functional assessment, using tracheal tuft cells from the same ChAT-Cre:Ai9 mice would be a suitable positive control in the calcium response traces experiment. These specific cells could also serve as a control in Fig2a.

      We agree with the reviewer that tracheal tuft cells from the same ChAT-Cre:Ai9 mice would be an ideal positive control in both the calcium response experiment and the qRT-PCR assay. However, we have established reliable methods for calcium imaging of primary cells expressing taste receptors and for quantifying their RNA expression levels, which were used in our previous publications, e.g., (1) Functional characterization of bitter taste receptors expressed in mammalian testis. Molecular Human Reproduction, 2013, doi:10.1093/molehr/gas040; (2) Infection by the parasitic helminth Trichinella spiralis activates a Tas2r-mediated signaling pathway in intestinal tuft cells. PNAS 2019, www.pnas.org/cgi/doi/10.1073/pnas.1812901116. We thank the reviewer for the excellent suggestion.

      (2) Fig 3C: It is not clear whether the depicted areas really represent the injured area. To provide a more comprehensive view, the authors should also provide histological analysis and quantification of the injured lung. A 3D representation of the injury area would offer a more accurate presentation.

      We thank the reviewer for this point. The areas depicted in Fig 3C are indeed the injured surface areas of the lungs. Following the reviewer’s suggestion, we carried out a histological analysis to determine the injured tissue volumes of the lungs. We fixed the lungs and sliced them into 12 μm-thick sections, which were imaged under a microscope. The injured area in each section was identified and quantified using ImageJ, and the injured volume for that section was obtained by multiplying the area by the section thickness (12 μm). Statistical analyses indicate that the injured volume of Gng13-cKO lungs is significantly greater than that of WT or Trpm5-KO lungs. These data have been included in Figure 3-figure supplement 1 and agree with the injured surface-area data (Fig 3C).
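      The per-section calculation described above (injured area × 12 μm) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: it assumes every consecutive section is measured and that the per-section volumes are summed to give the total injured volume. The function name and the optional `sampling_interval` parameter (for the case where only every n-th section is imaged, a Cavalieri-style estimate) are hypothetical additions.

```python
from typing import Iterable

UM3_PER_MM3 = 1e9  # 1 mm^3 = (1000 um)^3


def injured_volume_mm3(injured_areas_um2: Iterable[float],
                       section_thickness_um: float = 12.0,
                       sampling_interval: int = 1) -> float:
    """Estimate total injured volume (mm^3) from per-section injured areas (um^2).

    Each measured section contributes area x thickness. If only every
    `sampling_interval`-th section was measured (hypothetical option),
    each measured section stands in for that many consecutive sections.
    """
    if section_thickness_um <= 0 or sampling_interval < 1:
        raise ValueError("thickness must be > 0 and interval >= 1")
    total_um3 = sum(area * section_thickness_um * sampling_interval
                    for area in injured_areas_um2)
    return total_um3 / UM3_PER_MM3


if __name__ == "__main__":
    # Hypothetical injured areas (um^2) measured on three consecutive sections.
    areas = [1.0e6, 2.5e6, 0.5e6]
    print(injured_volume_mm3(areas))  # about 0.048 mm^3
```

      Only the summation across sections and the unit conversion are added here; the per-section term is exactly the area × thickness product described in the response.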

      (3) Fig 3 G/I/K/M: There seems to be an inconsistency in the time points. There's no indication for 14 dpi, yet two for 25 dpi. Additionally, a color legend for each sample would be helpful.

      We thank the reviewer for pointing this out. There were two typos, which have been corrected: the time points should be 14, 20, 25 and 50 dpi. A color legend has also been added.

      (4) Fig 4A: Using CD64 co-stained with Krt5 might better highlight the immune cells in the damaged region. Additionally, could you clarify the choice of the neutrophil marker CD64 over CD45 for staining the injured lung?

      We agree with the reviewer that Krt5 antibody staining can help define the damaged region. We sectioned the lung tissues with special attention to the damaged areas, but found that the adjacent, seemingly healthy areas also contained extra immune cells. We therefore counted all CD64+ cells in both the damaged and the surrounding areas. We used CD64 instead of CD45 because we found that CD64 better labels the immune cells that differ between WT and Gng13-cKO mice following H1N1 infection. Furthermore, CD64-labeled cells could be readily related to the Gsdmd/Gsdme-expressing, F4/80-labeled immune cells shown in Figure 5 and its supplemental figures.

      (5) Fig 5 and Supplemental Fig 5: It appears that the F4/80 staining exhibits notable background staining.

      Yes, there is some background staining. This antibody was the best we could find, although its quality could be further improved. We also think that some cellular debris may have been stained by this antibody. At higher magnification, however, we could still identify individual cells co-expressing IL-1β.

      (6) Fig 8C: The depicted area does not seem to adequately represent the fibrosis in the injured lung.

      Masson’s trichrome staining has previously been used to quantify fibrosis (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). Our qRT-PCR assays of fibrotic gene expression (Figure 8A) were performed on RNA extracted from whole lungs, so the resulting data reflect the extent of fibrosis across the lungs, although we agree with the reviewer that additional data would make the conclusion more convincing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We sincerely appreciate your insightful comments and constructive suggestions, and we are pleased to submit the revised version of our manuscript. Over the past months, we have carefully considered all the feedback provided by the three anonymous reviewers and made substantial revisions accordingly. We have also rephrased the Results section following your guidance to strengthen the rigor of the manuscript. The specific changes implemented in the revised manuscript are outlined below:

      - Revised the title of the manuscript.

      - Revised the description of early mitotic and meiotic chromosome structure in the scc3 mutant (Lines 167-274).

      - Added the BiFC results illustrating the interaction between SCC3 and other cohesin proteins in Figure S10.

      - Added detail to the figure legends, particularly for Figures 2 and 4.

      - Refined and rephrased the language of the manuscript.

      We hope these positive revisions have substantially strengthened the manuscript. Once again, we extend our heartfelt gratitude for your invaluable input.

      eLife assessment

      This important study elucidates the function of the cohesin subunit SCC3 in impeding DNA repair between sister chromatids in rice. The observation of sterility in the weak SCC3 mutant prompted an investigation of abnormal chromosome behavior during anaphase I through karyotype analysis. While the evidence presented is largely solid, the strength of support could be substantially improved in some aspects, leaving room for further investigation. This research contributes to our understanding of meiosis in rice and will interest cell biologists, reproductive biologists, and plant geneticists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript describes the identification and characterization of rice SCC3, including the generation and characterization of plants containing apparently lethal null mutations in SCC3 as well as mutant plants containing a C-terminal frame-shift mutation. The weak scc3 mutants showed both vegetative and reproductive defects. Specifically, mitotic chromosomes appeared to partially separate during prometaphase, while meiotic chromosomes were diffuse during early meiosis and showed alterations in sister chromatid cohesion, homologous chromosome pairing, and recombination. The authors suggest that SCC3 acts as a cohesin subunit in mitosis and meiosis, but also has functions beyond cohesion.

      Reviewer #2 (Public Review):

      This manuscript shows detailed evidence of the role of cohesin regulators in rice meiosis and mitosis.

      Reviewer #3 (Public Review):

      Prior research on SCC3, a cohesin subunit protein, in yeast and Arabidopsis has underscored its vital role in cell division. This study investigated the specific functions of SCC3 in rice mitosis and meiosis. In a weak scc3 mutant, sister chromatid separation was observed in anaphase I, resulting in 24 univalents and subsequent sterility. The authors meticulously documented SCC3's loading and degradation dynamics on chromosomes, noting its impact on DNA replication. Despite the loss of homologous chromosome pairing and synapsis in the mutant, chromosomes retained double-strand breaks without fragmenting. The authors therefore inferred that in the scc3 mutant, DNA repair relies on sister chromatids as templates more frequently than in the wild type.

      We extend our sincere gratitude to the Editors and the Reviewers for their highly constructive and insightful suggestions. We deeply appreciate receiving both positive feedback and constructive criticism on our manuscript. In light of the reviewers’ comments, we have diligently undertaken substantial revisions to improve the manuscript. The revised version comprehensively addresses all the points raised by the reviewers.

      Below, we provide a detailed point-by-point response to the reviewers’ comments:

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 170- looking at pollen formation does not specifically evaluate whether SCC3 is involved in meiosis.

      Thank you very much for this advice. We fully agree that defects in pollen formation indicate only a problem in gametogenesis. We apologize for the inaccurate wording of this sentence, which has been revised in the manuscript (Lines 167-176).

      (2) Lines 203-205- this seems more like discussion and is pure speculation. Another possibility described above is that the truncated SCC3 protein is partially functional and what they see is due to this partial functionality. Have the authors considered the possibility that a partially functional version of SCC3 is produced that alters its function or the function of the cohesin complex? How much of the protein epitope remains in the truncated protein?

      We are grateful for these insightful suggestions. We agree that a partially functional SCC3 protein may indeed be produced, contributing to the survival of the mutant. Notably, the truncated protein retains approximately 60% to 70% of the epitope, which presumably preserves residual function in the weak scc3 mutant. The C-terminal region of SCC3 lost in this mutant (aa 910-1116) contains a specific epitope and secondary-structure elements whose loss may alter the protein’s folding and, in turn, its roles within the cohesin complex.

      In this study, we encountered difficulties in obtaining viable null scc3 mutants in rice using the CRISPR-Cas9 system. It is therefore plausible that scc3-1 and scc3-2 are null alleles of SCC3 that cause embryonic lethality. We reasoned that identifying weak alleles was essential for obtaining plants that survive, and that selecting weak mutants, particularly those with the most pronounced phenotypes, would be advantageous for further research. Our findings indicate that the weak scc3 mutant lacks only a segment of the C-terminus, yet this is sufficient to permit plant survival while severely impairing meiosis. We cannot exclude the possibility that the observed defects are attributable to the specific truncated protein. We thank you once again.

      (3) Lines 212- I question whether what the authors see in Figure 2 is chromosome fragmentation. It could just as well be alterations in chromosome structure. Likewise, the authors provide little to no evidence that the mutation affects the replication process. Rather, the presence of replicated chromosomes later in mitosis and meiosis would argue that replication is not disrupted.

      We thank the reviewer for raising this critical question. As you observed, the presence of normal sister chromatids during prometaphase argues against chromosome fragmentation and indicates that the replication process is not disrupted. In line with your comments, we performed an extensive series of full-length fluorescence in situ hybridization (FISH) experiments to investigate why the distance between sister chromatids increases, particularly during interphase. Most of our findings support the interpretation that the chromosomes exhibit structural alterations, as shown in Figure 2A. Intriguingly, our data suggest that cohesin, through interaction with other chromatin-bound proteins, may facilitate loop extrusion and anchor itself in a manner that could alter chromosome architecture. These alterations in chromosome structure, and the resulting defects in genome folding and cohesion establishment, appear to depend on SCC3. Following your suggestions, we have carefully revised the relevant sections of the manuscript. We sincerely thank you for your insightful comments.

      (4) Line 230: what does the sentence "SCC3 may enhance the interaction with DNA" mean? Does it refer to the interaction of the cohesin complex?

      We apologize for the ambiguity in our original description. We meant that SCC3 plays a pivotal role in enhancing the interaction between the cohesin complex and DNA. We observed an increase in SCC3 signal intensity as cells progress from interphase to prophase (Figure 2B). This increase coincides with the defects observed in scc3 mutants during prophase, suggesting that SCC3 is particularly important at this stage of the cell cycle. We have revised the manuscript accordingly and sincerely thank the reviewer for this guidance.

      (5) Oddly, and unexplainably the authors present data indicating that SCC3 interacts with RAD21.1, but not SMC1, SMC3, or REC8. The fact that the authors report that SCC3 only interacts with RAD21.1 but no other cohesin proteins is quite hard to explain.

      As the reviewer points out, our data do not provide compelling evidence for an interaction between SCC3 and the other cohesin proteins. We repeated the yeast two-hybrid (Y2H) experiments and obtained consistent results, which initially surprised us as well. In the revised manuscript, we have added bimolecular fluorescence complementation (BiFC) assays between SCC3 and the other cohesin proteins in rice protoplasts (Figure S10). These data confirm that SCC3 interacts with RAD21.1 but not with the other cohesin proteins tested. Although the absence of these interactions is puzzling, we have been unable to detect any binding between SCC3 and the other cohesin proteins.

      A weak interaction between SCC3 and REC8 has been reported in Arabidopsis (Kuttig et al. bioRxiv https://doi.org/10.1101/2022.06.20.496767). We speculate that either these proteins do not interact or the yeast two-hybrid assay is inadequate for detecting their interaction, since several factors can impede interactions in a heterologous system. In Figure 7, we detected only the interaction between SCC3 and RAD21.1, in both Y2H and BiFC experiments. This may reflect altered protein folding or conformation, or the involvement of additional regulatory factors that modulate the interaction between SCC3 and the other cohesin proteins. Notably, consistent with RAD21.1 being a core component of the cohesin complex, our additional experiments show that SMC1/3 interact with RAD21.1 (data not shown). Our current data therefore support a model in which RAD21.1 and SMC1/3 form a ring, with SCC3 positioned on the outer periphery of the ring and associating specifically with RAD21.1 (Figure 8A).

      Reviewer #2:

      The authors did not consider creating heterozygous mutants for the replication fork. Moderate English language editing may be required.

      We thank the reviewer for these valuable suggestions. We had not initially explored a potential relationship between SCC3 and the replication fork. Cohesin is understood to associate with DNA before DNA replication, and sister chromatid co-entrapment arises as replication forks pass through cohesin rings, a process closely linked to replication dynamics. In this study, we observed aberrant chromosome structures in the scc3 mutant only during interphase (Figure 2). We conjecture that these anomalies stem from alterations in chromosome structure, such as changes in genome folding and loop extrusion, rather than from a direct effect on the replication fork. However, the precise nature of these interphase structural aberrations in the scc3 mutant remains unclear and requires further investigation. We have also edited the language of the manuscript as the reviewer suggested. We sincerely thank the reviewer again for these suggestions.

      Reviewer #3:

      While the paper's conclusions are generally well-supported, further substantiation is needed for the claim that SCC3 inhibits template choice for sister chromatids. To bolster this conclusion, I recommend that the authors perform whole-genome sequencing on parental and F1 individuals from two rice variants, subsequently calculating the allele frequencies at heterozygous sites in the F1 individuals. If SCC3 indeed inhibits inter-sister chromatid repair in the wild type, we would anticipate a higher frequency of inter-homologous chromosome repair (i.e., gene conversion). This should be manifested as a bias away from the Mendelian inheritance ratio (50:50) in the offspring of the wild type compared to the offspring of the scc3+/- mutant.

      We sincerely thank the reviewer for this insightful suggestion, and we have initiated this experiment. Because preparing the plant materials and analyzing the sequencing data will take considerable time, we hope that the ongoing sequencing work will provide important evidence bearing on this hypothesis. As we have not yet obtained direct evidence that SCC3 is involved in sister chromatid repair, we changed the title to “SCC3 is an axial element essential for homologous chromosome pairing and synapsis”. Once again, we are grateful for this invaluable suggestion.
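      The analysis the reviewer proposes amounts to testing, at each heterozygous site, whether the observed allele ratio departs from the Mendelian 50:50 expectation. A minimal Python sketch is given below, assuming per-site allele counts (sequencing reads or genotyped offspring) are already available; the function names are hypothetical, the exact two-sided binomial test is our choice of statistic, and this is not the pipeline used for the ongoing sequencing work.

```python
from math import comb


def allele_fraction(ref_count: int, alt_count: int) -> float:
    """Fraction of counts supporting the reference allele at one heterozygous site."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("site has no informative counts")
    return ref_count / total


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test of k successes in n trials against probability p.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one (the minimum-likelihood definition of a two-sided p-value).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps tied probabilities from being dropped by rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts at one heterozygous site in an F1 individual.
ref, alt = 70, 30
print(f"reference allele fraction = {allele_fraction(ref, alt):.2f}")
print(f"two-sided p vs 50:50 = {binomial_two_sided_p(ref, ref + alt):.2e}")
```

      In practice, many sites would be tested and corrected for multiple comparisons; comparing the distribution of per-site fractions between wild-type and scc3+/- offspring would address the reviewer's prediction directly.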

      A point that warrants consideration is the placement of the protein interaction experiments involving SCC3 within the paper. It is presented relatively late in the manuscript. If the authors possess information regarding the interaction between RAD21.1 and SCC3 and how it relates to the functional study of RAD21.1, it could contribute to a more comprehensive analysis. However, if this information is unrelated to the current study, it might be advisable to omit it, as it appears to diverge from the main focus of this work.

      We sincerely thank the reviewer for these valuable suggestions. In yeast, the interaction between SCC3 and SCC1 has been shown to be indispensable for efficient cohesin loading. In our study, we sought to elucidate the relationships among the cohesin subunits. We found that RAD21.1 serves as a core subunit of the cohesin complex, interacting with both SMC1/3 and SCC3 (data not shown). Our findings further indicate that the interaction between RAD21.1 and SCC3 is required to maintain the stability of the cohesin ring and its association with DNA (data not shown). This interaction is therefore central to our subsequent analyses and will be an important focus of future investigations.

      It's worth noting that while the title of the study claims that "SCC3 inhibits inter-sister chromatids repair during rice meiosis," the last sentence of the abstract weakens this conclusion by using the word "seems." A study's title should ideally reflect the most definitive and conclusive findings.

      We sincerely appreciate your valuable suggestions. In response, we have revised the description in our manuscript to enhance its rigor.

      In Figure 8C, it appears that cohesin is depicted between two DNA strands.

      Figure 8C illustrated sister chromatid repair during meiosis in the scc3 mutant: the two gray lines and the two blue lines represent the four sister chromatids of the two homologous chromosomes. In the wild type, cohesin tethers the two sister chromatids together. As the reviewer pointed out, cohesin should encircle the two sister chromatids, as depicted in Figure 8B. After careful consideration, and to avoid potential confusion, we have deleted Figure 8C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We appreciate your comments and suggestions on our manuscript.

      In particular, we measured the affinity between the middle tail domain of myosin-5a (Myo5a-MTD) and the actin-binding domain of melanophilin (Mlph-ABD) using microscale thermophoresis and obtained a Kd of ~0.56 µM, which is similar to the Kd of the globular tail domain of myosin-5a (Myo5a-GTD) for the GTD-binding motif of melanophilin (Mlph-GTBM). Moreover, we performed Western blots of lysates from transfected cells, showing that the dominant negative construct and the negative control were expressed at similar levels without noticeable degradation.

      We appreciate the editors’ and reviewers’ comments on how melanophilin binding to exon-G of myosin-5a and to actin filaments might be regulated. Phosphorylation of melanophilin by protein kinase A is one possible mechanism, which we will investigate in a future study.

      We also took this opportunity to correct several minor errors in the manuscript. Textual changes can be viewed in the “tracked change” version of the manuscript. Below are the comments from the editors and the two reviewers, together with our point-by-point responses.

      eLife assessment

      This study represents a useful description of a third interaction site between melanophilin and myosin-5a which is important in regulating the distribution of pigment granules in melanocytes. While much of the data forms a solid case for this interaction, the inclusion of important controls for the cellular studies and measurement of interaction affinities would have been helpful.

      Public Reviews:

      Reviewer #1 (Public Review):

      Interactions known to be important for melanosome transport include exon F and the globular tail domain (GTD) of MyoVa with Mlph. Motivated by a discrepancy between in vitro and cell culture results regarding necessary interactions for MyoVa to be recruited to the melanosome, the authors used a series of pull-down and pelleting assays experiments to identify an additional interaction that occurs between exon G of MyoVa and Mlph. This interaction is independent of and synergistic with the interaction of Mlph with exon F. However, the interaction of the actin-binding domain of Mlph can occur either with exon G or with the actin filament, but not both simultaneously. These data lead to a modified recruitment model where both exon F and exon G enhance the binding of Mlph to auto-inhibited MyoVa, and then via an unidentified switch (PKA?) the actin-binding domain of Mlph dissociates from MyoVa and interacts with the actin filament to enhance MyoVa processivity.

      The only weakness noted is that the authors could have had a more complete story if they pursued whether PKA phosphorylation/dephosphorylation of Mlph is indeed the switch for the actin-binding domain of Mlph to interact with exon G versus the actin filament.

      We thank Reviewer #1 for the careful reading of the manuscript and for appreciating the study. We agree that it is important to understand how the actin-binding domain of Mlph switches its interaction between exon-G of Myo5a and the actin filament, and we plan to pursue this direction in future research.

      Reviewer #2 (Public Review):

      The authors identify a third component in the interaction between myosin Va and melanophilin- an interaction between a 32-residue sequence encoded by exon-g in myosin Va and melanophilin's actin-binding domain. This interaction has implications for how melanosome motility may be regulated.

      While this work is largely well done and certainly publishable following needed revisions (e.g. some affinity measurements, necessary controls for the dominant negative experiments), I believe that additional work would be required to make a more compelling case. First, the study provides just one more piece to a well-developed story (the role of exon-F and the GTD in myosin Va: melanophilin (Mlph) interaction), much of which was published 20 years ago by several labs. Second, the study does not demonstrate a physiological significance for their findings other than that exon-G plays an auxiliary role in the binding of myosin Va to Mlph. For example, what dictates the choice between Mlph's actin binding domain (ABD) binding to actin or to exon-G? Is it a PTM or local actin concentration? It is unlikely to be alternative splicing as exon-G is present in all spliced isoforms of myosin Va. And what changes re melanosome dynamics in cells between these two alternatives? Similarly, the paper does not provide any in vitro evidence that binding to exon-G instead of actin affects the processivity of a Rab27a/Myosin Va/Mlph transport complex. For example, if the ABD sticks to exon-G instead of actin, does that block Mlph's ability to promote processivity through its interaction with the actin filament during transport? In summary, given that the authors did not directly test their model either in vitro or in cells, I do not think this story represents a significant conceptual advance.

      We thank Reviewer #2 for the careful reading of the manuscript and the suggestions for improving it. As suggested, we measured the affinity between the middle tail domain of Myo5a (Myo5a-MTD) and Mlph-ABD (Kd ~0.562 µM), which is similar to that between the globular tail domain of Myo5a (Myo5a-GTD) and the GTBM of Mlph. In addition, we performed experiments showing the integrity and expression levels of the dominant negative constructs in transfected cells.

      We believe more extensive experiments are required to address the other questions raised by the reviewer. For example, what dictates whether Mlph's actin-binding domain (ABD) binds to actin or to exon-G remains an open question; as we proposed, phosphorylation by protein kinase A is only one possible mechanism. We plan to pursue these questions in future research.

      Recommendations for the authors:

      The reviewing editor feels strongly that addressing some of the points raised by the reviewers would make this a more compelling manuscript. In particular, a measurement of the affinity of the relevant fragments from melanophilin and myosin-5a would indicate that the interaction might be physiologically relevant. Concerning the dominant negative experiments, the lack of effect of an expressed fragment could be that the expressed fragments were simply degraded or expressed at too low of a level to be competing. The reviewer gives guidelines on how to address this. Reviewer #2 made a point that it would be compelling if the effect of phosphorylation as suggested in the model was tested, but we all agree that this could well be the subject of a later study. In addition, the authors make a very interesting proposal for how protein kinase A could be involved in this regulation as has been suggested previously. Perhaps the use of phosphomimetic mutations could give some insight into this. Such experiments, if consistent with the proposed model would certainly raise the impact of this study. Finally, a very clear periodicity in hydrophobic amino acids is apparent in the interacting sequences of both Myo5 (yrisLykrMidLmeqLekqdktVrkLkkqLkvFakkIgeLevgqmen) and Mlph (tdeeLseMedrVamtAseVqqAeseIsdIesrIaaLra). This is strongly suggesting a leucine-zipper-like coiled coil, rather than an interaction mediated solely by charge. Recent, easily accessible software such as AlphaFold-Multimer might yield important structural insight into the binding configuration and might help rationalize the effect of the mutations herein.

      We thank the editors and the reviewers for their suggestions for improving the manuscript. We have performed several essential experiments to address the reviewers' concerns.

      (1) Regarding the affinity of the relevant fragments from melanophilin and myosin-5a. We have measured the affinity between Mlph-ABD and Myo5a-MTD using MST (Kd ~562 nM) (see revised Figure 3A).

      (2) Regarding the concerns about the dominant negative experiments. We examined the molecular sizes and expression levels of the Mlph and Myo5a constructs by Western blot. First, all constructs have the correct molecular size in transfected cells (see revised Figure 6C and 7D), indicating that the inability of the Myo5a or Mlph truncations to generate dilute-like phenotypes was not due to intracellular degradation of the EGFP fusion proteins. Second, after correcting for the percentage of transfected cells, the overall expression levels of the wild-type construct and the mutants are roughly equal. Third, we categorized expression levels as high or low and calculated the percentage of cells with the DN phenotype in each category. The results are consistent with the percentage of the DN phenotype among all EGFP-positive cells.

      (3) Regarding the suggestion to investigate the effect of protein kinase A phosphorylation on the interaction of Mlph-ABD with Myo5a and actin filaments. We agree that it is important to elucidate how the actin-binding domain of Mlph switches its interaction between exon-G of Myo5a and the actin filament. However, phosphorylation by protein kinase A is only one possible mechanism, and more extensive experiments are required to address this question. We therefore plan to pursue it in future research.

      (4) Regarding the suggestion to predict the interaction between exon-G of myosin-5a and Mlph-ABD using AlphaFold. We used AlphaFold-Multimer to predict the Myo5a-MTD/Mlph-ABD interaction. Remarkably, AlphaFold predicted that Myo5a-MTD binds Mlph-ABD via an antiparallel coiled-coil formed by Myo5a (residues 1430-1467) and Mlph (residues 450-481), as the editors anticipated. This prediction is also consistent with our finding that exon-G of Myo5a interacts with Mlph-ABD. However, the predicted model cannot explain our mutagenesis results, and we will pursue this point in future research. Nevertheless, we are grateful to the editors for bringing this idea to our attention, as it will help us design experiments to investigate the nature of the Myo5a-exon-G/Mlph-ABD interaction.

      Reviewer #1 (Recommendations For The Authors):

      Specific minor comments

      Q1: In figs 6-7 an overlay between DAPI and EGFP would be helpful for the reader to see perinuclear distribution.

      As suggested, we have added the merged images of DAPI and EGFP in the revised Figure 6 and 7.

      Q2: The delta symbol in the pdf text was corrupted.

      The corrupted delta symbol has been fixed in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Q1: Please explain in detail early in the text what exon-G is - length, position in the tail, and evidence that it is a coiled coil (CC). Of note, is it only long enough for about 4 heptad repeats? Has it been shown biochemically to form a CC? Is the CC irreversible? What would be the consequence of removing the exon-G CC on the ability of surrounding regions to bind Mlph (exon-F and the GTD)?

      We thank the reviewer for this suggestion. In the revision, we added a new paragraph (the first paragraph in the results section) and revised Figure 1A to introduce the middle tail domain and alternatively spliced exons of Myo5a.

      Exon-G is 32 amino acids long and is located at the C-terminal end of the middle tail domain, immediately before the globular tail domain. The exon-G region was predicted to form a short coiled-coil using online tools (such as Paircoil), but this prediction has not been tested biochemically. Moreover, we do not know whether the exon-G coiled-coil is reversible.

      We have not examined the effect of removing the whole exon-G on the interaction between the GTD and Mlph-GTBM. Exon-G (residues 1436-1467) and the GTD core (residues 1498-1877) are separated by a long loop of 31 residues. We therefore expect that removing exon-G would not affect the GTD/Mlph-GTBM interaction.

      Physically, exon-F is immediately followed by exon-G, so the two regions might interfere with each other. In our preliminary study, we found that removing the whole exon-G abolished the interaction between exon-F and Mlph-EFBD. In contrast, removing the C-terminal half of exon-G (residues 1454-1467) had little effect on the interaction between exon-F and Mlph-EFBD (see Figure 2C). In this work, we intentionally selected the latter construct for functional analysis of the exon-G/Mlph-ABD interaction, because removing the C-terminal half of exon-G abolishes the interaction with Mlph-ABD but does not affect the exon-F/Mlph-EFBD interaction.

      Q2: Figures 1-3. While the pulldown experiments demonstrating an interaction between Mlph-ABD residues 446-571 and Myo5a-MTD are a good start, one would like to see affinity measurements to gauge the likelihood that this interaction is physiologically relevant. The same goes for the pulldown experiments demonstrating an interaction between (i) the C-terminal half of exon-G (residues 1453-1467) and the Mlph-ABD, (ii) between residues 1411-1467 (a short peptide containing exon-F and exon-G) and the Mlph-ABD, and (iii) between residues 1436-1467 (a short peptide containing exon-G) and the Mlph-ABD. This would also apply to the pulldowns in 3C-3E where versions of the proteins with charge residue changes were tested.

      We agree with the reviewer that determining the affinities between Mlph-ABD and Myo5a-MTD and their variants would help in understanding the physiological relevance of the exon-G/Mlph-ABD interaction. However, the extensive experiments suggested by the reviewer require many high-quality purified proteins, which are not trivial to obtain.

      Nevertheless, we consider it important to know the affinity between wild-type Myo5a-MTD and Mlph-ABD, as this parameter allows comparison of the three interactions between Myo5a and Mlph. We therefore measured this affinity using microscale thermophoresis (MST). The dissociation constant (Kd) of Myo5a-MTD for Mlph-ABD is 0.562±0.169 µM, which is similar to that between Myo5a-GTD and Mlph-GTBM (~1 µM) (Geething & Spudich (2007) JBC 282:21518). Consistent with the GST pulldown results, MST shows that deletion of the C-terminal half of exon-G (1453-1467) greatly decreases the MST signal (see revised Figure 3A).
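      For readers less familiar with MST, a Kd of this kind is typically obtained by fitting the titration curve to a quadratic (ligand-depletion) binding model. The Python sketch below illustrates that fit under stated assumptions: concentrations are in nM, the labeled partner is held at a fixed hypothetical concentration, the signal is assumed to vary linearly with the fraction bound, and Kd is found by a simple log-spaced grid search rather than a full nonlinear least-squares fit. The data are synthetic; this is an illustration, not the analysis used for Figure 3A.

```python
import math


def fraction_bound(ligand_nM: float, target_nM: float, kd_nM: float) -> float:
    """Fraction of the labeled target in complex, from the quadratic binding equation."""
    s = target_nM + ligand_nM + kd_nM
    complex_nM = (s - math.sqrt(s * s - 4.0 * target_nM * ligand_nM)) / 2.0
    return complex_nM / target_nM


def fit_kd(ligand_nM, signal, target_nM, kd_grid=None):
    """Return (kd, unbound_signal, amplitude) minimizing the sum of squared residuals.

    For each candidate Kd the model is linear in the fraction bound, so the
    unbound baseline and the bound-minus-unbound amplitude are solved in
    closed form by ordinary least squares.
    """
    if kd_grid is None:
        # Log-spaced candidates from 1 nM to 100 µM (100 steps per decade).
        kd_grid = [10 ** (i / 100) for i in range(0, 501)]
    n = len(signal)
    mean_y = sum(signal) / n
    best = None
    for kd in kd_grid:
        x = [fraction_bound(l, target_nM, kd) for l in ligand_nM]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue
        amplitude = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signal)) / sxx
        baseline = mean_y - amplitude * mean_x
        sse = sum((yi - baseline - amplitude * xi) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, baseline, amplitude)
    if best is None:
        raise ValueError("no usable Kd candidate")
    return best[1], best[2], best[3]


# Synthetic, noise-free titration: 16-point 2-fold dilution series from 20 µM.
target = 20.0  # nM of labeled partner (hypothetical)
ligand = [20000.0 / 2**i for i in range(16)]
signal = [800.0 + 20.0 * fraction_bound(l, target, 562.0) for l in ligand]
kd, baseline, amplitude = fit_kd(ligand, signal, target)
print(f"fitted Kd ≈ {kd:.0f} nM")
```

      Real MST data are noisy, so a fit would normally also report an uncertainty (the response gives ±0.169 µM); a nonlinear least-squares routine such as `scipy.optimize.curve_fit` is a common way to obtain one.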

      Q3: While the dominant negative (DN) approach to testing functional significance is OK, rescuing dilute/myosin Va null melanocytes with full-length myosin Va containing the various deletions would have been more convincing. Also, the authors must show (i) that the DN constructs are the correct size in transfected cells (i.e. are not degraded), and (ii) that they are expressed at roughly equal levels (either by doing Westerns and correcting for the percent of transfected cells, or by measuring total cellular fluorescence in transfected cells). Without this information, it remains possible that constructs not exhibiting a DN effect are simply degraded or poorly expressed. This applies to all the DN data in Figures 6 and 7.

      We agree with the reviewer that Myo5a-null melanocytes would be ideal for investigating exon-G function. Unfortunately, we do not have Myo5a-null melanocytes derived from dilute mice.

      To confirm the integrity of the overexpressed proteins in transfected cells, we performed Western blots of these proteins, including EGFP-Mlph-RBD (wild-type and two mutants) and Myo5a-Tail (wild-type and G mutant), in lysates of the transfected cells. The Western blots show that all of these proteins have the correct molecular masses, indicating that the overexpressed proteins are not degraded (see revised Figure 6C and 7C). Moreover, after correcting for the percentage of transfected cells, the overall per-cell expression levels of the wild-type construct and the mutants are roughly equal. This information is included in the revised manuscript (Lines 222-225 and 237-241).
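      The correction for transfection efficiency described above is simple arithmetic: the band intensity from a whole-population lysate is divided by the fraction of cells that were transfected, giving a per-transfected-cell value that can be compared across constructs. A minimal Python sketch follows; normalizing to a loading control is our added assumption (the response does not state which normalization was used), the function names are hypothetical, and all numbers are illustrative.

```python
def per_transfected_cell_signal(band: float, loading_control: float,
                                fraction_transfected: float) -> float:
    """Loading-normalized band intensity divided by the fraction of transfected cells."""
    if loading_control <= 0:
        raise ValueError("loading control must be positive")
    if not 0 < fraction_transfected <= 1:
        raise ValueError("fraction_transfected must be in (0, 1]")
    return (band / loading_control) / fraction_transfected


def relative_to_wild_type(samples: dict, wt_key: str = "WT") -> dict:
    """Express each construct's per-cell signal relative to the wild-type construct."""
    per_cell = {name: per_transfected_cell_signal(*vals) for name, vals in samples.items()}
    wt = per_cell[wt_key]
    return {name: value / wt for name, value in per_cell.items()}


# Hypothetical densitometry: (band, loading control, fraction of EGFP-positive cells).
samples = {
    "WT": (1.20, 1.00, 0.40),
    "mutant A": (0.58, 0.95, 0.20),
}
print(relative_to_wild_type(samples))
```

      A ratio near 1 for a mutant indicates per-cell expression comparable to the wild-type construct, which is the comparison the reviewer requested.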

      Q4: The authors scored the DN phenotype as yes/no but it most likely varies depending on the degree of over-expression. Showing that the degree of melanosome centralization scales with the degree of overexpression, and that the correlation between expression level and phenotype varies depending on the construct would strengthen the results.

      We agree with the reviewer that the degree of the DN phenotype should depend on the over-expression level. On analyzing the EGFP signals of transfected cells, we found very few cells with medium expression levels. We therefore categorized expression levels as high or low and calculated the percentage of cells with the DN phenotype in each category, as shown in the table below. These results are consistent with the expectation that the degree of the DN phenotype depends on the over-expression level of the transfected constructs.

      Author response table 1.

      Percentage of the EGFP-expressing cells with perinuclear aggregation of melanosomes
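      The high/low stratification described in the response to Q4 can be expressed as a short calculation: classify each EGFP-positive cell by its EGFP intensity against a cutoff, then compute the percentage of cells with perinuclear melanosome aggregation in each class. The Python sketch below assumes a single hypothetical intensity threshold and per-cell phenotype calls; the class and function names are illustrative, and it does not reproduce the values in Author response table 1.

```python
from dataclasses import dataclass
import math


@dataclass
class Cell:
    egfp_intensity: float  # integrated EGFP signal (arbitrary units)
    perinuclear: bool      # True if melanosomes aggregate around the nucleus (DN phenotype)


def dn_percentage_by_expression(cells, threshold: float) -> dict:
    """Percentage of cells with the DN phenotype in low- and high-expression groups."""
    groups = {"low": [], "high": []}
    for cell in cells:
        key = "high" if cell.egfp_intensity >= threshold else "low"
        groups[key].append(cell)
    result = {}
    for name, members in groups.items():
        if members:
            result[name] = 100.0 * sum(c.perinuclear for c in members) / len(members)
        else:
            result[name] = math.nan  # no cells fell into this class
    return result


# Hypothetical cells scored for one construct.
cells = [Cell(10, False), Cell(20, True), Cell(200, True), Cell(300, True), Cell(250, False)]
print(dn_percentage_by_expression(cells, threshold=100))
```

      Because very few cells showed intermediate expression, a single cutoff suffices here; with a continuous spread of expression, relating phenotype to EGFP intensity directly would be a natural extension.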

      Q5: The conclusion from the data in Figure 8A- "the presence of both exon-F and exon-G is insufficient for binding to the Mlph occupied by Myo5a, but sufficient for binding to the unoccupied Mlph"- should be verified by also doing the experiment in myosin Va knockdown cells.

      We agree. Unfortunately, our RNAi knockdown of Myo5a in melanocytes is not ideal, and we do not have Myo5a knockout melanocytes. We will pursue this point in the future.

      Q6: Line 213 "three Mlph-binding regions, i.e., exon-F, exon-F, and GTD (Figure 7A)" has a typo.

      This typo has been corrected.

      Q7: The authors should provide high mag insets for the images in Figure 8.

      As suggested, we have revised Figure 8 by including high mag insets for the images.

    1. Author response

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths:

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration.

      We thank the reviewer for the interest in our work. We however want to clarify that the present manuscript does not report the generation of ECM with “superior quality”, but rather of modulated composition and thus function.

      Weaknesses:

      Most data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ.

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we therefore compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines with that of MSOD-B cells. Using qualitative (Safranin-O staining, collagen type II and collagen type X immunostaining) and quantitative (glycosaminoglycan assay) assays, we demonstrate that the generated tissues are cartilage grafts of similar quality to their MSOD-B counterparts. Of note, the Safranin-O staining was performed on lyophilized tissues, which can alter staining quality and intensity. We will therefore provide additional staining of the generated tissues before lyophilization.

      The rationale behind establishing VEGF-KO cell lines remains unclear. What specific outcomes did the authors anticipate from this modification?

      VEGF is a known master regulator of angiogenesis and a key mediator of endochondral ossification. It has also been used extensively in bone tissue engineering as a supplemented factor, primarily in the form of VEGFα, to increase vascularization and thereby improve bone formation by engineered grafts (https://www.nature.com/articles/s42003-020-01606-9, https://www.sciencedirect.com/science/article/pii/S8756328216301752). In our study, it was thus a natural candidate to demonstrate the possibility of generating VEGF-KO cartilage and subsequently to assess the functional impact on both the angiogenic and osteogenic potential of the resulting cartilage tissue.

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects. While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points.

      Using RUNX2-KO ECM, we aimed to demonstrate the impact on cartilage remodeling and bone formation. This was assessed ectopically but also in the rat osteochondral defect, a regenerative setting of higher clinical relevance. We agree with the reviewer that additional experimental groups and time points (both earlier and later ones) would offer a better mechanistic understanding of the ECM contribution to joint repair. However, as stated in our manuscript, this is a proof-of-concept study that successfully demonstrated the influence of cartilage ECM modification on in vivo skeletal regeneration. A follow-up study will be needed to complement the existing evidence and strengthen the relevance of our approach for cartilage repair.

      Reviewer #2 (Public Review):

      The manuscript submitted by Sujeethkumar et al. describes an alternative approach to skeletal tissue repair using extracellular matrix (ECM) deposited by genetically modified mesenchymal stromal/stem cells. Here, they generate loss-of-function mutations in VEGF or RUNX2 in a BMP2-overexpressing MSC line and define the differences in the resulting tissue-engineered constructs following seeding onto a type I collagen matrix in vitro, and following lyophilization and subcutaneous and orthotopic implantation into mice and rats. Some strengths of this manuscript are the establishment of a platform by which modifications in cell-derived ECM can be evaluated both in vitro and in vivo, the demonstration that genetic modification of cells results in complexity of in vitro cell-derived ECM that elicits quantifiable results, and the admirable goal to improve endogenous cartilage repair. However, I recommend the authors clarify their conclusions and add more information regarding reproducibility, which was one limitation of primary-cell-derived ECMs.

      We thank the reviewer for the positive evaluation of our work.

      Two limitations of native/autologous/allogeneic ECMs, namely achieving complete decellularization and reducing batch-to-batch variability, were not specifically addressed in the data provided herein. Regarding the maintenance of ECM organization and complexity following lyophilization, evidence of complete decellularization was not provided, but it could easily be evaluated using, for example, polarized light microscopy and quantification of human DNA in constructs before and after lyophilization.

      We will clarify the experiments and characterization performed with lyophilized tissues versus those performed with decellularized ones. We will also provide evidence of DNA removal in our decellularized ECMs.

      It would be ideal to see minimization of batch-to-batch variability using this approach, as relying on a single cell line is likely not sufficient (considering that Matrigel, derived from a single cell line, does exhibit batch-to-batch and manufacturer-to-manufacturer variability). I recommend adding details regarding experimental design and outcomes not initially considered. Inter- and intra-experimental reproducibility was not adequately addressed. The size of in vitro-derived cartilage pellets was not quantified, and it is not clear whether more than one independent differentiation was performed from each gene-edited MSC line to generate the in vitro replicates and the constructs that were implanted in vivo.

      We thank the Reviewer for the comment on variability and reproducibility. Using a cell line does confer higher robustness but indeed does not guarantee batch-to-batch consistency. We will temper our claims in the discussion and mention the need to regularly re-characterize cell line properties across passages.

      In our study, grafts were generated from several batches and tested in more than one independent experiment. This will be described further in the revised version of our manuscript. We will also include data on the size variability of the generated tissues.

      The use of descriptive language in stating conclusions may mislead the reader and should be modified accordingly throughout the manuscript. For example, although this reviewer agrees with the comparative statements made by the authors regarding parental and gene-edited MSC lines, non-quantifiable terms such as 'frank' and 'superior' (for example, line 242) are inappropriate and should instead be discussed in terms of statistical significance. Another example is 'rich-collagenous matrix,' which was not substantiated by uniform immunostaining for type II collagen (line 189).

      I have similar recommendations regarding conclusive statements from the rat implantation model, which was appropriately used to evaluate the response of native skeletal cells to the different cell-derived ECMs. Interpretations of these results should be described more accurately. For example, increased TRAP staining does not indicate reduced active bone formation (line 237). Many would not conclude from the histology that GAGs were retained in the subchondral region of the RUNX2-KO line graft. Quantification of % chondral regeneration using histology is not accurate, as it is greatly influenced by the location in the defect from which the section was taken. Chondral regeneration is usually semi-quantified from gross observations of the cartilage surface immediately following excision. The statements regarding integration (for example, line 290) are not supported by histological evidence, which should show the periphery of the graft adjacent to the native tissue at high magnification.

      We thank the Reviewer for the constructive suggestions. We will revise language accordingly throughout the manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors started with an immortalized human cell line and gene-edited it to decrease the levels of VEGF1 (to influence vascularization) and of Runx2 (to decrease chondro/osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed decreased vascularization when VEGF1 was reduced, and the results suggested an increase in cartilage formation.

      In another study, the matrix generated by these cells was subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the authors claim that remodeling of the matrices into bone was somewhat inhibited. Lastly, they used matrices generated by RUNX2-edited cells to regenerate osteochondral defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells.

      Strengths:

      -The notion of inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      -If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      We thank the Reviewer for the critical evaluation of our work and for highlighting its novelty.

      Weaknesses:

      -The authors have not generated histologically identifiable cartilage or bone in their transplants of the cells with a type I scaffold.

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines with that of MSOD-B. We demonstrate by qualitative (Safranin-O staining, Collagen type II and Collagen type X immunostainings) and quantitative (glycosaminoglycan assay) assays that the generated tissues consist of cartilage of similar quality to that of MSOD-B. However, the Safranin-O stainings were performed on lyophilized tissues, which can alter staining quality and intensity. We will thus provide additional stainings of the generated tissues before lyophilization.

      Regarding the contested formation of bone in vivo by our ECM grafts, we have provided compelling qualitative evidence via Masson's Trichrome stainings and quantification of mineralized volume by µCT. Both cortical bone and trabecular structures were identified ectopically. These are standard evaluation methods in the field; we would welcome additional suggestions from the Reviewer.

      -In many cases, they did not generate histologically identifiable cartilage with their cell-free-edited scaffold. They did generate small amounts of bone but this is most likely due to BMPs that were synthesized by the cells and trapped in the matrix.

      We appreciate that the Reviewer agrees on the successful formation of bone induced by our engineered grafts. However, we respectfully disagree with the "small amount of bone" statement, since our MSOD-B and MSOD-B VEGF-KO cartilage grafts led to the full generation of a mature ectopic bone organ (that is, one also composed of extensive marrow). This has been assessed qualitatively and quantitatively.

      We agree with the Reviewer on the key role of BMP-2 in the remodeling process into bone and bone marrow, which we have described extensively in our previous publication (Pigeot et al., Advanced Materials 2021). We previously demonstrated that the low amount of BMP-2 embedded in the matrix (tens of nanograms per tissue) is not sufficient per se to induce ectopic endochondral ossification. It is the combined presence of GAGs in the matrix (that is, of cartilage) that enables successful bone formation. Since we have already demonstrated in the present manuscript that the GAG content is the same in MSOD-B and MSOD-B edited ECMs, we will provide additional data demonstrating that BMP-2 content is maintained in all generated cartilage tissues.

      -There is a great deal of missing detail in the manuscript.

      We will provide additional information on the MSOD-B line and the overall methodology in our revised version.

      -The in vivo study is underpowered, the results are not well documented pictorially, and are not convincing.

      We will provide additional information and pictures related to our in vivo studies. We believe our group sizes are sufficient to support our conclusions, which were confirmed by statistical assessment.

      -Given the fact that they have genetically modified cells, they could have done analyses of ECM components to determine what was different between the lines, both at the transcriptome and the protein level. Consequently, the study is purely descriptive and does not provide any mechanistic understanding of what mixture of matrix components and growth factors works best for cartilage or bone. But this presupposes that they actually induced the formation of bona fide cartilage, at least.

      We thank the Reviewer for the suggestion. However, our study did not aim to determine which ECM graft composition works best for cartilage or bone regeneration. Instead, we propose exploiting our cellular tools to interrogate the function of key ECM constituents and their impact on skeletal regeneration. We reiterate that we generated lyophilized cartilage grafts; this will be more clearly supported by histological assessment before lyophilization.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chen and colleagues first compared the cartilage tissues collected from OA and HA patients using histology and immunostaining. Then, a genome-wide DNA methylation analysis was performed, which revealed changes in a novel gene, TNXB. IHC confirmed that TNXB has a lower expression level in HA cartilage than in OA. Next, the authors demonstrated that TNXB levels were reduced in the HA animal model, and intraarticular injection of AAV carrying TNXB siRNA induced cartilage degradation and promoted chondrocyte apoptosis. Based on KEGG enrichment, histopathological analysis, and western blot, the authors also showed the relationship between TNXB and AKT phosphorylation. Lastly, an AKT agonist, specifically SC79 in this study, was shown to partially rescue the changes in in vitro-cultured chondrocytes induced by Tnxb knockdown. Overall, this is an interesting study that provides sufficient data to support its conclusions.

      Strengths:

      (1) Both human and mouse samples were examined.

      (2) The HA model was used.

      (3) Genome-wide DNA methylation analysis was performed.

      Weaknesses:

      (1) In some experiments, the selection of the control groups was not ideal.

      Thank you for your comments. The reviewer raised concerns about using human OA cartilage as the control instead of healthy cartilage. This is an important detail we did not describe in the previous version. We have added our explanation to the revised Methods.

      (2) More details on analyzing methods and information on replicates need to be included.

      We greatly appreciate your careful review and helpful suggestions. We have added detailed information to our revised draft.

      (3) Discussion can be improved by comparing findings to other relevant studies.

      We thank the reviewer very much for the opportunity to improve our manuscript. We have improved the Discussion as the reviewer suggested (see our response to Recommendation 13).

      (4) The use of transgenic mice with conditional Tnxb depletion can further define the physiological roles of Tnxb.

      Thanks for this valuable comment. We agree that conditional Tnxb-KO mice would be very helpful for studying the biological roles of Tnxb; we plan to generate and use them in future studies.

      Recommendations For the Authors:

      (1) Please add more information about HA such as incidence to highlight the importance of the study.

      We greatly appreciate your careful review and helpful suggestions. We have provided more information about the importance of HA study in revised Introduction. Please see lines 90-93 and 103-112.

      (2) Please justify the use of OA cartilage, instead of normal tissues, as the control.

      Thanks for your suggestion. We certainly would have liked to use healthy cartilage as the control, but it was extremely difficult to obtain enough control samples from healthy individuals. Despite the mechanistic and phenotypic differences between HA and OA, OA is often used as a "disease" control to reveal the characteristics of HA 1,2. We therefore measured cartilage degeneration and DNA methylation differences between HA and OA patients. We have added this statement and the supporting evidence to the revised manuscript. Please see lines 144-145.

      (3) Please provide details of how to calculate the Cartilage wear area ratio in Figure 1D, and measure the positive staining area in Figure 1F.

      We apologize for omitting these details. Here, we describe how the positively stained areas were calculated. In Figure 1D, we obtained the cartilage area ratio by calculating the ratio of the blue-stained cartilage area to the whole tissue area using ImageJ software. In Figure 1F, the positively stained area was determined after secondary antibody treatment and color development with DAB chromogen (brown stain). We then obtained the positive staining area ratio by calculating the ratio of the positively stained area to the whole cartilage area using ImageJ software.
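      The area-ratio calculation described above can be sketched as follows. This is a minimal illustration, assuming the stained region and the reference region (whole tissue for Figure 1D, whole cartilage for Figure 1F) have already been thresholded into binary masks, for example in ImageJ; the function name area_ratio and the list-of-lists mask representation are hypothetical and are not the authors' actual pipeline.

```python
from typing import List

# Hypothetical representation: a 2-D binary mask, True = pixel belongs to the region.
Mask = List[List[bool]]


def area_ratio(positive: Mask, reference: Mask) -> float:
    """Return the fraction of reference-region pixels that are also positive.

    positive  -- thresholded stain mask (e.g. blue cartilage or brown DAB signal)
    reference -- denominator region (e.g. whole tissue or whole cartilage)
    Positive pixels lying outside the reference region are ignored.
    """
    if len(positive) != len(reference) or any(
        len(p_row) != len(r_row) for p_row, r_row in zip(positive, reference)
    ):
        raise ValueError("masks must have the same shape")

    ref_pixels = 0  # pixel count of the reference region (the denominator)
    overlap = 0     # positive pixels inside the reference region (the numerator)
    for p_row, r_row in zip(positive, reference):
        for p, r in zip(p_row, r_row):
            if r:
                ref_pixels += 1
                if p:
                    overlap += 1

    if ref_pixels == 0:
        raise ValueError("reference region is empty")
    return overlap / ref_pixels


# Usage: 3 stained pixels within an 8-pixel tissue region gives a ratio of 3/8.
tissue = [[True, True, True, True], [True, True, True, True]]
stain = [[True, True, False, False], [True, False, False, False]]
print(area_ratio(stain, tissue))  # 0.375
```

      Restricting the numerator to pixels inside the reference mask keeps the ratio between 0 and 1 even if thresholding picks up stain outside the tissue or cartilage boundary.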

      (4) Please label the location of hemorrhagic ferruginous deposits in Figure 1.

      Thank you for your valuable suggestion. We have used black arrows to indicate hemorrhagic ferruginous deposits in revised Figure 1A.

      (5) Please define the meaning of "n" in all figure legends, such as technical or biological replicates.

      Thanks for your suggestion. We have defined the meaning of "n" in all figure legends in the revised manuscript.

      (6) In Figure 3, please increase the font size of B, D, F, H, and J. The same applies to other figures.

      Thank you for your valuable suggestion. We have increased the font size of figures in our revised manuscript.

      (7) Line 327, "(Figure 1, F and G)" should be Figure 2F, G.

      Thank you for pointing this out. We have corrected it in the revision. Please see line 347.

      (8) Reduced TNXB levels in human HA cartilage are one of the major findings in this study. Currently, only semi-quantitative IHC was used to draw this conclusion. A second method, such as real-time PCR or western blot, is required.

      Thanks for your suggestion. Unfortunately, owing to severe erosion of the HA cartilage, we did not have enough human HA cartilage samples for qPCR and WB experiments. We have noted this limitation in the revised draft. Please see lines 445-448.

      (9) Figure 3 shows that reduced Tnxb was accompanied by the increased Dnmt1. In addition, this study is about methylation. Have the authors tested the change of Dnmt1 levels when Tnxb was knocked down?

      Thanks for your suggestion. As the reviewer suggested, we tested the expression of Dnmt1 in Tnxb-KD chondrocytes and observed no significant alteration. Please see the following figure.

      Author response image 1.

      Figure Legend: Representative IHC staining of Dnmt1 in articular cartilage from Tnxb-KD HA mice, with corresponding quantification of the proportion of Dnmt1-positive regions. Red arrows indicate positive cells. Scale bar: 100 μm. Data are presented as means ± SD; n = 5 in each group. ns = not significant by unpaired Student's t test.

      (10) Also, is there a causal relationship between Tnxb levels and the distribution of methylation levels? Any related study was performed?

      Following the valuable suggestion of the reviewer, we used two well-known DNA methyltransferase inhibitors (RG108 and 5-Aza-dC) 3 to examine whether DNA methylation regulates the transcriptional expression of TNXB. We found that both inhibitors significantly up-regulated Tnxb mRNA levels. We have added this result to the revised Supplementary Figure 4 and the draft (lines 292-296 and 369-374).

      (11) In Figure 6, what was the control for the "AKT agonist" group?

      Thank you for your suggestion. We apologize for this omission and have added the vehicle group as the control for the AKT agonist in Figure 6 of our revised manuscript.

      (12) Previous studies have reported the involvement of TNXB in TGF-β signaling. Have the authors examined the effect of TNXB on TGF-β signaling in chondrocytes?

      Thank you for your suggestion. We examined TGF-β signaling in Tnxb-KD chondrocytes and observed no significant changes. We have discussed this result in the revised draft (lines 475-479) and added it to the revised Supplementary Figure 7.

      (13) Discussion can be improved. For example, have previous studies reported the association between TNXB and methylation in other cells/tissues? In addition to apoptosis, are there other potential mechanisms underlying the protective role of TNXB in chondrocytes?

      Thank you for your valuable comments. Previous studies have shown differential DNA methylation of TNXB in whole blood from rheumatoid arthritis patients and in retinal pigment epithelium from patients with age-related macular degeneration 4,5. To our knowledge, this is the first report of an association between DNA methylation of TNXB and HA cartilage degeneration. Few published studies address the physiological function of TNXB, and most of these report its effect on extracellular matrix organization 6,7. In our work, we found that TNXB regulated the phosphorylation of AKT. Since previous reports showed that AKT controls the expression of Mmp13 8, we speculate that TNXB might also regulate chondrocyte extracellular matrix organization, in addition to its effect on apoptosis. We have discussed these points in the revised manuscript (lines 462-464 and 495-501).

      (14) The manuscript writing needs to be improved. Typos and grammar issues were noted.

      Thank you. We have revised and polished the language and hope the revised version is acceptable.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript mainly studied the biological effect of tenascin XB (TNXB) on hemophilic arthropathy (HA) progression. Using bioinformatic and histopathological approaches, the authors identified the novel candidate gene TNXB for HA. Next, the authors showed that TNXB knockdown leads to chondrocyte apoptosis, matrix degeneration, and subchondral bone loss in vivo and in vitro. Furthermore, AKT agonists promoted extracellular matrix synthesis and prevented apoptosis in TNXB-knockdown chondrocytes.

      Strengths:

      In general, this study significantly advances our understanding of HA pathogenesis. The authors utilize comprehensive experimental strategies to demonstrate the role of TNXB in cartilage degeneration associated with HA. The results are clearly presented, and the conclusions appear appropriate.

      Weaknesses:

      Additional clarification is required regarding the gender of the F8-/- mouse in the study. Is the mouse male or female?

      We apologize for not providing enough information about the sex of the F8-/- mice in the previous draft. We used male F8-/- mice as the study subjects for our experiments, because hemophilia A is X-linked and is therefore predominantly seen in males 9.

      Recommendations For The Authors:

      Some issues need to be addressed in the manuscript:

      (1) During the progression of HA, in addition to cartilage degeneration, synovial hypertrophy and inflammation are also significant symptoms. How is the expression of TNXB in HA synovium?

      Thank you for your valuable comments. As the reviewer suggested, we tested the expression of TNXB in the synovium and found no statistically significant difference in TNXB expression in the synovium (Supplementary Figure 2). Please see lines 347-349.

      (2) Lines 183-188. The methods of virus infection should be more detailed. What was the concentration of the AAVs injected? And how many doses were administrated?

      Thank you for your suggestion. We have added details of virus infection and the injected doses to the revised Methods section (lines 205-206).

      (3) Line 197-198. Could the author double-check the decalcification time for human cartilage samples? Is it for 3 months? Or for 3 weeks?

      Thank you for your suggestion. We have confirmed that the human cartilage samples were decalcified for 3 months.

      (4) Line 343-344 "Above results suggest that TNXB might be protective against HA and its cartilage suppression is closely related to HA development." The conclusion is inappropriate, please revise it.

      Thanks for your suggestion. We have revised this conclusion into “Above results suggest that the suppression of TNXB in cartilage promotes the HA development”. Please see lines 365-366.

      (5) Line 326-327, the IHC staining for human samples is shown in Figure 2, not Figure 1. Please double check and revise it.

      Thank you for pointing this out. We apologize for the error and have corrected it in the revision.

      (6) For Figure 1B, it shows the MRI images of knee joints. However, the method section lacks details regarding the MRI imaging scan and analysis. Could the author include this information in the method section?

      Thank you for your valuable comments. We have added the method of MRI imaging scan and analysis in revised Methods. Please see lines 154-163.

      (7) In Figure 5, The statistical result of Bcl-2 is inconsistent with its Western blot band. Please check.

      Thank you for pointing this out. We have corrected it in the revision.

      (8) Please read through the text carefully to check for language problems. For example, in Line 68 "Our" not "our".

      Thank you for pointing this out. We have corrected it in the revision. Please see line 68.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Dr. Chen et al. investigates the genes that are differentially methylated and associated with cartilage degeneration in hemophilia patients. The study demonstrates the functional mechanisms of the TNXB gene in chondrocytes and F8-/- mice. The authors first showed significant DNA methylation differences between hemophilic arthritis (HA) and osteoarthritis through genome-wide DNA methylation analysis. Subsequently, they showed a decreased expression of the differentially methylated TNXB gene in cartilage from HA patients and mice. By knocking down TNXB in vivo and in vitro, the results indicated that TNXB regulates extracellular matrix homeostasis and apoptosis by modulating p-AKT. The findings are novel and interesting, and the study presents valuable information in blood-induced arthritis research.

      Strengths:

      The authors adopted a comprehensive approach by combining genome-wide DNA methylation analysis, in vivo and in vitro experiments using human and mouse samples to illustrate the molecular mechanisms involved in HA progression, which is crucial for developing targeted therapeutic strategies. The study identifies Tenascin XB (TNXB) as a central mediator in cartilage matrix degradation. It provides mechanistic insights into how TNXB influences cartilage matrix degradation by regulating the activation of AKT. It opens avenues for future research and potential therapeutic interventions using AKT agonists for cartilage protection in hemophilic arthropathy. The conclusions drawn from the study are clear and directly tied to the findings.

      Weaknesses:

      (1) The study utilizes a small sample size (N=5 for both osteoarthritis and hemophilic arthropathy). A larger sample size would enhance the generalizability and statistical power of the findings.

      Thank you for pointing out this deficiency. Our sample size is indeed relatively small, although it was sufficient for the statistical analyses performed. We have added this limitation to the Discussion of the revised manuscript. Please see lines 445-448. Considering the small sample size, we subsequently performed a functional validation study of TNXB, one of the most significant genes, and demonstrated that TNXB has a critical impact on chondrocyte apoptosis in HA pathogenesis in vivo and in vitro.

      (2) The use of an animal model (F8-/- mouse) to investigate the role of TNXB may not fully capture the complexity of human hemophilic arthropathy. Differences in the biology between species may affect the translatability of the findings to human patients.

      Thank you for your valuable comments. We recognize that biological differences between species can affect the clinical translation of research findings. In our work, we sequenced human cartilage samples to identify the differentially methylated gene TNXB. We also demonstrated that TNXB protein expression was significantly down-regulated in both human HA cartilage and F8-/- transgenic mouse cartilage. The F8-/- transgenic mouse is a well-accepted model for the study of hemophilia: its phenotype resembles that of human patients, with spontaneous bleeding into the joints and soft tissues. Moreover, this model has been widely used in studies of hemophilia and hemophilic arthritis 9-11.

      (3) The study primarily focuses on TNXB as a central mediator, but it might overlook other potentially relevant factors contributing to cartilage degradation in hemophilic arthropathy. A more holistic exploration of genetic and molecular factors could provide a broader understanding of the condition.

      Thanks for your suggestion. Since our human sample size is relatively small, the differentially methylated genes should be interpreted cautiously. We therefore focused the functional study on the most significant gene, TNXB. In future studies, we will expand the sample size to explore the molecular mechanisms of HA more comprehensively.

      Recommendations For The Authors:

      The following are my suggestions:

      (1) Why do the authors choose to concentrate on the knee joint in the introduction when hemophilia, characterized by a deficiency in clotting factor F8, is recognized as a systemic disease?

      Thank you for your valuable comments. Although hemophilia is a systemic disease, approximately 80%-90% of bleeding episodes in patients with hemophilia occur within the musculoskeletal system, especially in the knee joint 12.

      (2) While Figure 1 illustrates distinct expressions of Dnmt1 and Dnmt3a, only Dnmt1 results are presented in HA mice models in Figure 3. To address this, it is suggested that the expression of Dnmt3a be explored in animal models.

      Thank you for your suggestion. As the reviewer suggested, we examined the expression of Dnmt3a in mouse articular cartilage and found that it was significantly up-regulated in both the 4W and 8W model groups compared with the control group (Figure 3). Please see line 364.

      (3) In Figure 3, the sample size for Dnmt1 is smaller than the other indicators; therefore, supplementing the sample count is recommended.

      Thank you for pointing this out. We have corrected it in the revision.

      (4) Regarding Figure 4G, a few apoptotic cells were observed in the AAV NC group. It is advised that this figure be reviewed for accuracy.

      Thanks for your suggestion. In Figure 5D, the AAV-NC group received a needle injection of AAV. It is therefore expected that some apoptotic cells appear in the cartilage layer.

      (5) The authors concluded that TNXB plays a role in apoptosis and AKT signaling. Providing expression data for Caspase9 would be valuable to strengthen this assertion, as PI3K/AKT signaling directly influences its activation during apoptosis.

      Thank you for your comments. We have examined the expression of Cleaved-Caspase9 protein and found that knockdown of TNXB up-regulated Cleaved-Caspase9 expression, which was reversed by the addition of SC79. This result has been added to the revised Figure 6 and manuscript. Please see line 414.

      (6) Quantitative analysis of the differences between the two groups in Supplemental Figures is necessary.

      Thank you for your suggestion. We have added the quantitative analysis of the differences between the two groups in Supplemental Figures.

      (7) Given that mammals have three major AKT isoforms (homologs), AKT1, AKT2, and AKT3, why did the authors specifically focus on AKT1?

      Thank you for your comments. Based on the results of the KEGG enrichment analysis of differentially methylated genes, we investigated the role of the PI3K/AKT pathway in the apoptosis of HA chondrocytes. AKT is universally acknowledged as a core factor in the PI3K/AKT pathway and plays critical roles in various cellular activities, such as cell proliferation, differentiation, apoptosis, and metabolism 13,14. More notably, several studies have demonstrated that, within the AKT family, Akt1 is primarily involved in the regulation of chondrocyte survival and proteoglycan synthesis 15. We therefore examined the phosphorylation of AKT1 in HA cartilage and TNXB-KD chondrocytes, and found that TNXB regulates chondrocyte ECM and apoptosis through AKT1.

      References:

      (1) Cooke, E.J., Zhou, J.Y., Wyseure, T., Joshi, S., Bhat, V., Durden, D.L., Mosnier, L.O., and von Drygalski, A. (2018). Vascular Permeability and Remodelling Coincide with Inflammatory and Reparative Processes after Joint Bleeding in Factor VIII-Deficient Mice. Thromb Haemost 118, 1036-1047. 10.1055/s-0038-1641755.

      (2) Kleiboer, B., Layer, M.A., Cafuir, L.A., Cuker, A., Escobar, M., Eyster, M.E., Kraut, E., Leavitt, A.D., Lentz, S.R., Quon, D., et al. (2022). Postoperative bleeding complications in patients with hemophilia undergoing major orthopedic surgery: A prospective multicenter observational study. J Thromb Haemost 20, 857-865. 10.1111/jth.15654.

      (3) Weiland, T., Weiller, M., Kunstle, G., and Wendel, A. (2009). Sensitization by 5-azacytidine toward death receptor-induced hepatic apoptosis. J Pharmacol Exp Ther 328, 107-115. 10.1124/jpet.108.143560.

      (4) Anaparti, V., Agarwal, P., Smolik, I., Mookherjee, N., and El-Gabalawy, H. (2020). Whole Blood Targeted Bisulfite Sequencing and Differential Methylation in the C6ORF10 Gene of Patients with Rheumatoid Arthritis. J Rheumatol 47, 1614-1623. 10.3899/jrheum.190376.

      (5) Porter, L.F., Saptarshi, N., Fang, Y., Rathi, S., den Hollander, A.I., de Jong, E.K., Clark, S.J., Bishop, P.N., Olsen, T.W., Liloglou, T., et al. (2019). Whole-genome methylation profiling of the retinal pigment epithelium of individuals with age-related macular degeneration reveals differential methylation of the SKI, GTF2H4, and TNXB genes. Clin Epigenetics 11, 6. 10.1186/s13148-019-0608-2.

      (6) Mao, J.R., Taylor, G., Dean, W.B., Wagner, D.R., Afzal, V., Lotz, J.C., Rubin, E.M., and Bristow, J. (2002). Tenascin-X deficiency mimics Ehlers-Danlos syndrome in mice through alteration of collagen deposition. Nat Genet 30, 421-425. 10.1038/ng850.

      (7) Zhang, K., Wang, X., Zeng, L.T., Yang, X., Cheng, X.F., Tian, H.J., Chen, C., Sun, X.J., Zhao, C.Q., Ma, H., and Zhao, J. (2023). Circular RNA PDK1 targets miR-4731-5p to enhance TNXB expression in ligamentum flavum hypertrophy. FASEB J 37, e22877. 10.1096/fj.202200022RR.

      (8) Guo, H., Yin, W., Zou, Z., Zhang, C., Sun, M., Min, L., Yang, L., and Kong, L. (2021). Quercitrin alleviates cartilage extracellular matrix degradation and delays ACLT rat osteoarthritis development: An in vivo and in vitro study. J Adv Res 28, 255-267. 10.1016/j.jare.2020.06.020.

      (9) Weitzmann, M.N., Roser-Page, S., Vikulina, T., Weiss, D., Hao, L., Baldwin, W.H., Yu, K., Del Mazo Arbona, N., McGee-Lawrence, M.E., Meeks, S.L., and Kempton, C.L. (2019). Reduced bone formation in males and increased bone resorption in females drive bone loss in hemophilia A mice. Blood Adv 3, 288-300. 10.1182/bloodadvances.2018027557.

      (10) Haxaire, C., Hakobyan, N., Pannellini, T., Carballo, C., McIlwain, D., Mak, T.W., Rodeo, S., Acharya, S., Li, D., Szymonifka, J., et al. (2018). Blood-induced bone loss in murine hemophilic arthropathy is prevented by blocking the iRhom2/ADAM17/TNF-alpha pathway. Blood 132, 1064-1074. 10.1182/blood-2017-12-820571.

      (11) Vols, K.K., Kjelgaard-Hansen, M., Ley, C.D., Hansen, A.K., and Petersen, M. (2019). Bleed volume of experimental knee haemarthrosis correlates with the subsequent degree of haemophilic arthropathy. Haemophilia 25, 324-333. 10.1111/hae.13672.

      (12) Lobet, S., Peerlinck, K., Hermans, C., Van Damme, A., Staes, F., and Deschamps, K. (2020). Acquired multi-segment foot kinematics in haemophilic children, adolescents and young adults with or without haemophilic ankle arthropathy. Haemophilia 26, 701-710. 10.1111/hae.14076.

      (13) Garcia, D., and Shaw, R.J. (2017). AMPK: Mechanisms of Cellular Energy Sensing and Restoration of Metabolic Balance. Mol Cell 66, 789-800. 10.1016/j.molcel.2017.05.032.

      (14) Johnson, J., Chow, Z., Lee, E., Weiss, H.L., Evers, B.M., and Rychahou, P. (2021). Role of AMPK and Akt in triple negative breast cancer lung colonization. Neoplasia 23, 429-438. 10.1016/j.neo.2021.03.005.

      (15) Rao, Z., Wang, S., and Wang, J. (2017). Peroxiredoxin 4 inhibits IL-1beta-induced chondrocyte apoptosis via PI3K/AKT signaling. Biomed Pharmacother 90, 414-420. 10.1016/j.biopha.2017.03.075.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their thorough review of and overall positive comments on our manuscript. We have revised the manuscript to address the one remaining concern raised by one of the reviewers. This is described below.

      Fig. 1B-C: Reporting a standard deviation from two data points is not statistically meaningful. In this case, it would be better to show the range (difference) of the two data points.

      We have modified the legend for Figure 1 to now read, “The average of two experiments is plotted with the bars representing the range of each time point.”
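      As a minimal illustration of the reviewer's point (the replicate values below are hypothetical, not data from the study), the sample standard deviation of two replicates is always |x1 - x2| / √2. It is therefore just a rescaled range, and bars spanning the two measured values convey the same information more transparently:

```python
import math
import statistics


def summarize_two_replicates(x1, x2):
    """Mean, half-range and sample SD for a pair of replicate measurements."""
    mean = (x1 + x2) / 2
    half_range = abs(x1 - x2) / 2        # mean ± half_range spans exactly the two points
    sd = statistics.stdev([x1, x2])      # sample SD with the n - 1 denominator
    return mean, half_range, sd


mean, half_range, sd = summarize_two_replicates(0.40, 0.60)  # hypothetical replicates

# For n = 2 the sample SD is the range divided by sqrt(2), so it adds no information.
sd_from_range = 2 * half_range / math.sqrt(2)
```

      Here `mean` is 0.5, the bars span 0.4 to 0.6, and `sd` equals `sd_from_range` for any pair of values.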

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.

      I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships, but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system, I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.

      We believe this study advances our previous work for two reasons, which are now described in new text within the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data, using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).

      I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?

      We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p. Hence, Supplementary Figure 1 is now included, which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with Figure 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data on the effect of miR-199a/b-5p inhibition on CAV1 (Figure 4A).

      I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.

      We agree with all points made here and have amended the manuscript accordingly. Figure 1A now shows pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table that previously showed the pathways enriched at each time point is now in the supplementary materials (Supplementary Table 1). Figures 2 and 4 now use color instead of shades of grey. Figure 3C has been moved to the supplementary materials (Supplementary Figure 2) and is referenced in the text.

      Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.

      Reviewer #2 (Public review):

      Summary:

      This study represents an ambitious endeavor to comprehensively analyze the role of miR-199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.

      Strengths:

      This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.

      Weaknesses:

      While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to cartilage homeostasis than to chondrogenesis itself. Their deficiency has been genetically proven to induce osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.

      We agree with the reviewer's comments. miR-455-null mice develop normally, but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition to the description to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate miR-199a/b-5p to be relevant in OA and have ongoing additional work on this, but this is beyond the scope of this manuscript.

      Recommendations to Authors:

      Reviewer #1 (Recommendations to authors):

      Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.

      (1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.

      We have now added statistical test information to the figure legends of figures 2 and 4.

      (2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes overlap between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes: chi-squared; expected/observed.

      A chi-squared test is now presented in the manuscript, which shows that the number of significant genes shared between miR-199a-5p knockdown and miR-199b-5p knockdown was significantly greater than expected for day 0 or day 1 of the experiments.
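      A minimal sketch of the kind of overlap test requested, assuming a Pearson chi-squared test (1 degree of freedom, no continuity correction) on a 2 × 2 table of DE versus non-DE genes for the two knockdowns. The function name and all counts below are hypothetical and are not the values reported in the manuscript; a hypergeometric (Fisher's exact) test is a common alternative.

```python
import math


def overlap_chi_squared(n_universe, n_a, n_b, n_overlap):
    """Pearson chi-squared test (1 df, no continuity correction) for the overlap
    between two gene sets drawn from the same universe of tested genes."""
    table = [
        [n_overlap, n_a - n_overlap],                          # DE in A: also DE in B / not DE in B
        [n_b - n_overlap, n_universe - n_a - n_b + n_overlap],  # not DE in A: DE in B / not DE in B
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_universe
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail p-value for 1 df: P(Z^2 > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    expected_overlap = n_a * n_b / n_universe
    return expected_overlap, chi2, p_value


# Hypothetical counts: 10,000 tested genes, 500 and 400 DE genes, 60 shared.
expected_overlap, chi2, p_value = overlap_chi_squared(10_000, 500, 400, 60)
```

      With these illustrative counts the expected overlap is 20 genes, the observed overlap is three times that, and χ² ≈ 87.7. Because the Pearson statistic is symmetric, the observed/expected ratio should be checked to confirm that a significant result reflects enrichment rather than depletion.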

      (3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?

      Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.

      (4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!

      Thank you for pointing this error out. It has been corrected.

      (5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.

      This error has been removed from Figure 5A.

      (6) On p. 20, references 22 and 27 should, I think, be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays, which you claim are the first evidence for this target relationship.

      We agree with this change and have corrected the manuscript.

      (7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.

      Thank you for pointing this error out – this has been corrected.

      Reviewer #2 (Recommendations to authors):

      (1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.

      We agree with this reviewer's comments. For many years, miR-140 and miR-455 have been studied experimentally, and their importance in OA research has become apparent. We have included additional references within the introduction to address this.

      (2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR199a/b-5p in chondrogenesis.

      We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.

      (3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.

      In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulate OA. However, developing a new mouse line to investigate this is beyond the scope of this manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors use an elegant set of single-molecule experiments to assess the transcriptional and post-transcriptional regulation of RecB. The question stems from a previous observation from the same lab, that RecB protein levels are low and not induced under DNA damage. The authors first show that recB transcript levels are low and have a short half-life. They further show that RecB levels are likely regulated via translational control. They provide evidence for low noise in RecB protein levels across cells and show that the translation of the mRNA increases under double-strand break conditions. The authors identify Hfq binding sites in the recbcd [recBCD] operon and show that Hfq regulates the levels of RecB protein without changing the mRNA levels. They suggest that RecB translation is directly controlled by Hfq binding to mRNA, as mutating one of the binding sites has a direct effect on RecB protein levels.

      Strengths:

      The implication of Hfq in regulation of RecB translation is important and suggests mechanisms of cellular response to DNA damage that are beyond the canonically studied mechanisms (such as transcriptional regulation by LexA). Data are clearly presented and the writing is direct and easy to follow. Overall, the study is well-designed and provides novel insights into the regulation of RecB, that is part of the complex required to process break ends.

      Weaknesses:

      Some key findings need additional support/clarification to strengthen the conclusions. These are suggested to the authors.

      Reviewer #2 (Public Review):

      Summary:

      The authors carry out a careful and rigorous quantitative analysis of RecB transcript and protein levels at baseline and in response to DNA damage. Using single-molecule FISH and Halo-tagging in order to achieve sensitive measurements, they provide evidence that enhanced RecB protein levels in response to DNA damage are achieved through a post-transcriptional mechanism mediated by the La-like RNA binding protein, Hhq1 [Sm-like RNA binding protein, Hfq]. In terms of biological relevance, the authors suggest that this mechanism provides a way to control the optimum level of RecB expression as both deletion and over-expression are deleterious. In addition, the proposed mechanism provides a new framework for understanding how transcriptional noise can be suppressed at the protein level.

      Strengths:

      Strengths of the manuscript include the rigorous approaches and orthogonal evidence to support the core conclusions, for example, the evidence that altering either Hhq1 [Hfq] or its recognition sequence on the RNA similarly enhance the protein to RNA ratio of RecB. The writing is clear and the experiments are well-controlled. The modeling approaches provide essential context to interpret the data, particularly given the small numbers of molecules per cell. The interpretations are careful and well supported.

      Weaknesses:

      The authors make a compelling case for the biological need to exquisitely control RecB levels, which they suggest is achieved by the pathway they have uncovered and described in this work. However, this conclusion is largely inferred as the authors only investigate the effect on cell survival in response to (high levels of) DNA damage and in response to two perturbations - genetic knock-out or over-expression, both of which are likely more dramatic than the range of expression levels observed in unstimulated and DNA damage conditions.

      In the discussion, we proposed that the post-transcriptional regulation of recB that we have uncovered could be involved in keeping RecB levels within an optimal range. We agree that testing the phenotypic impact of small changes in RecB levels would add additional strength to this suggestion. However, this is experimentally very challenging because of the low copy number of RecB molecules, which makes it difficult to slightly alter RecB levels in a controlled and homogeneous (across cells) manner. Developing the synthetic biology tools necessary for such an experiment is beyond the scope of this article. In the manuscript, we will clarify the limits of our interpretation of the role of the uncovered regulation.

      Reviewer #3 (Public Review):

      Summary:

      The work by Kalita et al. reports regulation of RecB expression by the Hfq protein in E. coli cells. RecBCD is an essential complex for DNA repair and chromosome maintenance. Its expression needs to be kept at a low level under regular growth conditions but upregulated upon DNA damage. Through quantitative imaging, the authors demonstrate that recB mRNAs and proteins are expressed at low levels under regular conditions. While the mRNA copy number shows a high noise level due to stochastic gene expression, the protein level is maintained at a lower noise level than expected. Upon DNA damage, the authors claim that the recB mRNA level is not significantly affected, but RecB protein level increases due to a higher translation efficiency. [Upon DNA damage, the authors claim that the recB mRNA concentration is decreased, however RecB protein level is compensated by higher translation efficiency]. By analyzing CLASH data on Hfq, they identified two Hfq binding sites on the recB polycistronic mRNA, one of which is localized at the ribosome binding site (RBS). By measuring recB mRNA and RecB protein levels in the ∆hfq cell, the authors conclude that binding of Hfq to the RBS region of recB mRNA suppresses translation of recB mRNA. This conclusion is further supported by the same measurement in the presence of the Hfq sequestrator, the sRNA ChiX, and upon deletion of the Hfq binding region on the mRNA.
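      For context on what "expected" protein noise can mean, a common baseline is the standard two-stage model of constitutive gene expression (transcription, translation, first-order mRNA decay and protein loss), in which protein noise is CV²_p = 1/⟨p⟩ + (1/⟨m⟩)·γ_p/(γ_m + γ_p). The sketch below simply evaluates this formula; the parameter values are hypothetical placeholders, not the authors' measurements, and the model used in the study may differ.

```python
import math


def two_stage_protein_cv2(mean_mrna, mean_protein, gamma_m, gamma_p):
    """Protein CV^2 in the two-stage model of constitutive expression:
    CV^2 = 1/<p> + (1/<m>) * gamma_p / (gamma_m + gamma_p)."""
    return 1.0 / mean_protein + (1.0 / mean_mrna) * gamma_p / (gamma_m + gamma_p)


# Hypothetical parameters (not the study's values); rates per minute.
mean_mrna = 2.0                  # a few mRNAs per cell
mean_protein = 10.0              # low-copy protein
gamma_m = 0.5                    # mRNA decay rate
gamma_p = math.log(2) / 30.0     # protein loss by dilution, assuming a 30 min doubling time

cv2_protein = two_stage_protein_cv2(mean_mrna, mean_protein, gamma_m, gamma_p)
cv2_mrna = 1.0 / mean_mrna       # mRNA is Poisson-distributed in this model
```

      With these values the baseline protein CV² is about 0.12, versus 0.5 for the mRNA, because slow protein turnover time-averages mRNA fluctuations; a measured protein CV² clearly below such a baseline would point to additional noise suppression.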

      Strengths:

      (1) The manuscript is well-written and easy to understand.

      (2) While there are reported cases of Hfq regulating translation of bound mRNAs, its effect on reducing translation noise is relatively new.

      (3) The imaging and analysis are carefully performed with necessary controls.

      Weaknesses:

      The major weaknesses include a lack of mechanistic depth, and part of the conclusions are not fully supported by the data.

      (1) Mechanistically, it is still unclear why the translation level of recB mRNA increases upon DNA damage, which makes the story less complete. The authors mention in the Discussion that a moderate (30%) decrease in Hfq protein was observed in a previous study, which may explain the loss of translational repression of recB. However, given that this mRNA exists at a very low copy number (a few per cell) and that the Hfq copy number is on the order of a few hundred to a few thousand, it is unclear how a 30% decrease in the protein level would result in a significant change in its regulation of recB mRNA.

      While Hfq is a highly abundant protein, it has many mRNA and sRNA targets, some of which are also present in large amounts (DOI: 10.1046/j.1365-2958.2003.03734.x). As recently shown, the competition among the targets for Hfq proteins results in unequal (across various targets) outcomes, where the targets with higher Hfq affinity have an advantage over the ones with less efficient binding (DOI: 10.1016/j.celrep.2020.02.016). In line with these findings, we reason that upon DNA damage, a moderate (30%) decrease in Hfq protein abundance can lead to a similar competition among Hfq targets, where high-affinity targets outcompete low-affinity ones as well as low-abundance ones (such as recB mRNAs). Therefore, we hypothesise that the regulation of low-abundance Hfq targets by moderate perturbations of the Hfq protein level is a potential explanation for the change in RecB translation that we have observed. We will expand this part of the discussion to explain our reasoning in a more explicit and coherent way.
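      The competition argument can be illustrated with a toy equilibrium titration model. It assumes simple 1:1 binding of Hfq to each target, a single high-affinity "sink" standing in for Hfq's many abundant targets, and hypothetical copy numbers and dissociation constants (in copies per cell). None of these values come from the study; the sketch only shows qualitatively how a 30% drop in total Hfq can strongly reduce the occupancy of a rare, low-affinity target while the high-affinity pool stays largely bound.

```python
def free_hfq(total_hfq, targets, tol=1e-9):
    """Free Hfq H solving H + sum_i T_i * H / (H + Kd_i) = total_hfq (bisection).

    `targets` maps a name to (total copies, Kd), all in copies per cell.
    Assumes 1:1 binding and Kd > 0 for every target (a toy model).
    """
    def excess(h):
        bound = sum(total * h / (h + kd) for total, kd in targets.values())
        return h + bound - total_hfq  # increasing in h; negative at h = 0

    lo, hi = 0.0, float(total_hfq)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def occupancy(total_hfq, targets):
    """Fraction of each target bound by Hfq at equilibrium."""
    h = free_hfq(total_hfq, targets)
    return {name: h / (h + kd) for name, (_, kd) in targets.items()}


# Hypothetical parameters: an abundant high-affinity sink and a rare, weaker recB site.
targets = {"high_affinity_pool": (900, 1.0), "recB": (2, 200.0)}
before = occupancy(1000, targets)  # baseline total Hfq
after = occupancy(700, targets)    # 30% less total Hfq
```

      With these illustrative parameters, recB occupancy falls from about 0.35 to below 0.02, whereas the high-affinity pool only drops from about 0.99 to about 0.77: the near-saturated sink absorbs most of the remaining Hfq, so free Hfq collapses.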

      (2) Based on the experiment and the model, Hfq regulates translation of the recB gene by binding to the RBS of the upstream ptrA gene through translational coupling. In this case, one would expect ptrA expression and its response to Hfq regulation to be quite similar to those of recB. Performing the same measurement on ptrA expression in the presence and absence of Hfq would strengthen the conclusion and model.

      Indeed, based on our model, we expect PtrA expression to be regulated by Hfq in a similar manner to RecB. However, the product encoded by the ptrA gene, Protease III, (i) has been poorly characterised; (ii) unlike RecB, is located in the periplasm (DOI: 10.1128/jb.149.3.1027-1033.1982); and (iii) is not involved in any DNA repair pathway. Therefore, analysing PtrA expression would take us away from the key questions of our study.

      (3) The authors agree that they cannot exclude the possibility of sRNA being involved in the translation regulation. However, this can be tested by performing the imaging experiments in the presence of Hfq proximal face mutations, which largely disrupt binding of sRNAs.

      (4) The data on construct with a long region of Hfq binding site on recB mRNA deleted is less convincing. There is no control to show that removing this sequence region itself has no effect on translation, and the effect is solely due to the lack of Hfq binding. A better experiment would be using a Hfq distal face mutant that is deficient in binding to the ARN motifs.

      We thank the referee for these suggestions. We have performed the requested experiments, and the quantification of RecB abundance in the presence of Hfq proteins mutated in the proximal and distal face will be added to the revised version of the manuscript.

      (5) Ln 249-251: The authors claim that the stability of recB mRNA is not changed in ∆hfq simply based on the steady-state mRNA level. To claim so, the lifetime needs to be measured in the absence of Hfq.

      We agree that this statement is not fully supported by our data and will address this issue in the revised version.

      (6) What's the labeling efficiency of Halo-tag? If not 100% labeled, is it considered in the protein number quantification? Is the protein copy number quantification through imaging calibrated by an independent method? Does Halo tag affect the protein translation or degradation?

      Our previous study (DOI: 10.1038/s41598-019-44278-0) described a detailed characterisation of the HaloTag labelling technique for quantifying low-copy proteins in single E. coli cells.

      In that study, we used RecB-HaloTag as an example of a low-copy number protein. We showed a complete quantitative agreement of RecB detection between two fully independent methods: HaloTag-based labelling with cell fixation and RecB-sfGFP combined with a microfluidic device that lowers protein diffusion in the bacterial cytoplasm. This second method has previously been validated for protein quantification (DOI: 10.1038/ncomms11641) and provides detection of 80-90% of the labelled protein. Additionally, in our protocol, immediate chemical fixation of cells after the labelling and quick washing steps ensure that new, unlabelled RecB proteins are not produced. We, therefore, conclude that our approach to RecB detection is highly reliable and sufficient for comparing RecB production in different conditions and mutants.
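      The point that a constant, sub-100% detection efficiency does not bias comparisons between conditions can be checked with a small simulation. It assumes that each molecule is detected independently with the same probability in every condition (binomial thinning); the 85% efficiency and the per-cell copy numbers below are illustrative assumptions, not the authors' data.

```python
import random


def detected_counts(true_counts, efficiency, rng):
    """Binomially thin each cell's true count: every molecule is detected
    independently with probability `efficiency`."""
    return [sum(rng.random() < efficiency for _ in range(n)) for n in true_counts]


def mean(values):
    return sum(values) / len(values)


rng = random.Random(0)
efficiency = 0.85            # assumed, within the 80-90% detection range quoted above
condition_a = [10] * 2000    # hypothetical true RecB copies per cell
condition_b = [20] * 2000    # hypothetical condition with twice as much RecB

obs_a = detected_counts(condition_a, efficiency, rng)
obs_b = detected_counts(condition_b, efficiency, rng)

# Observed means are scaled by the efficiency, but the fold change is preserved.
fold_change = mean(obs_b) / mean(obs_a)
```

      Thinning does add a small amount of apparent noise (the CV² of detected counts equals the true CV² plus (1 − p)/(p·⟨n⟩) for detection probability p), so detection efficiency matters more for noise estimates than for fold changes.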

      The RecB-HaloTag construct has been designed for minimal impact on RecB production and function. The HaloTag is translationally fused to RecB in a loop positioned after the serine present at position 47 where it is unlikely to interfere with (i) the formation of RecBCD complex (based on RecBCD structure, DOI: 10.1038/nature02988), (ii) the initiation of translation (as it is far away from the 5’UTR and the beginning of the open reading frame) and (iii) conventional C-terminal-associated mechanisms of protein degradation (DOI: 10.15252/msb.20199208). In our manuscript, we showed that the RecB-HaloTag degradation rate is similar to the dilution rate due to bacterial growth. This is in line with a recent study on unlabelled proteins, which shows that RecB’s lifetime is set by the cellular growth rate (https://doi.org/10.1101/2022.08.01.502339) and indicates that the HaloTag fusion is not affecting RecB stability.

      Furthermore, we have demonstrated (DOI: 10.1038/s41598-019-44278-0) that (i) bacterial growth is not affected by replacing the native RecB with RecB-HaloTag, (ii) RecB-HaloTag is fully functional upon DNA damage, and (iii) no proteolytic processing of the RecB-HaloTag is detected by Western blot.

      These results suggest that RecB expression and functionality are unlikely to be affected by the translational HaloTag insertion at Ser-47 in RecB. In the revised version of the manuscript, we will add information about the construct and discuss the reliability of the quantification.

      (7) The upper panel of Fig S8a is redundant with Fig 5B. Fig S8d does not seem to be described in the text.

      Indeed, the data in the upper panel in Fig S8a was repeated (from Fig 5B) for visual purposes to facilitate comparison with the panel below. We will modify the figure legend to indicate this repetition clearly.

      In Fig S8d, we confirmed the functionality of the Hfq protein expressed from the pQE-Hfq plasmid in our experimental conditions, which was not described in the text. We will include this clarification in the updated manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We would like to first thank the Editor and the three reviewers for their enthusiasm and for conducting another careful evaluation of our manuscript. We appreciate their thoughtful and constructive comments and suggestions. Some concerns regarding experimental design, data analysis, and over-interpretation of our findings remained unresolved after the initial revision. Here we have endeavored to address these remaining concerns through further refinement of our writing and by including these concerns in the discussion section. We hope our response better explains the rationale of our experimental design and data interpretation. In addition, we acknowledge the limitations of our present study, so that it will benefit future investigations into this topic. Our detailed responses are provided below.

      Reviewer #1 (Public Review):

      This study examines whether the human brain uses a hexagonal grid-like representation to navigate in a non-spatial space constructed by competence and trustworthiness. To test this, the authors asked human participants to learn the levels of competence and trustworthiness for six faces by associating them with specific lengths of bar graphs that indicate their levels in each trait. After learning, participants were asked to extrapolate the location from the partially observed morphing bar graphs. Using fMRI, the authors identified brain areas where activity is modulated by the angles of morphing trajectories in six-fold symmetry. The strength of this paper lies in the question it attempts to address. Specifically, the question of whether and how the human brain uses grid-like representations not only for spatial navigation but also for navigating abstract concepts, such as social space, and guiding everyday decision-making. This question is of emerging importance.
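      For readers less familiar with six-fold (hexagonal) modulation analyses, the sketch below illustrates their general logic as used in earlier work such as Constantinescu et al. (2016): regress a signal on cos(6θ) and sin(6θ) of the trajectory angle θ, recover a putative grid orientation φ = atan2(β_sin, β_cos)/6, and then use cos(6(θ − φ)) as a regressor in independent data. It is a simplified, noise-free illustration on per-trial values with a hypothetical amplitude and orientation; a real fMRI analysis would fit these regressors within a GLM on BOLD time series with cross-validation across runs, and this is not the authors' pipeline.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_grid_orientation(angles, signal, fold=6):
    """Least-squares fit of signal ~ b0 + b_cos*cos(fold*t) + b_sin*sin(fold*t).
    Returns (orientation in [0, 2*pi/fold), modulation amplitude)."""
    rows = [(1.0, math.cos(fold * t), math.sin(fold * t)) for t in angles]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, signal)) for i in range(3)]
    _, beta_cos, beta_sin = solve3(ata, aty)
    period = 2 * math.pi / fold
    orientation = (math.atan2(beta_sin, beta_cos) / fold) % period
    return orientation, math.hypot(beta_cos, beta_sin)


def grid_regressor(angles, orientation, fold=6):
    """Parametric regressor for held-out data: cos(fold * (theta - orientation))."""
    return [math.cos(fold * (t - orientation)) for t in angles]


# Hypothetical example: unevenly sampled trajectory angles, true orientation 15 degrees.
true_orientation = math.radians(15)
angles = [math.radians(d) for d in range(0, 360, 7)]
signal = [2.0 + 0.5 * math.cos(6 * (t - true_orientation)) for t in angles]
orientation, amplitude = fit_grid_orientation(angles, signal)
held_out_regressor = grid_regressor(angles, orientation)
```

      Control analyses typically repeat the same procedure with other periodicities (for example 4-, 5- or 7-fold) to check that an effect is specific to six-fold symmetry.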

      I acknowledge the authors' efforts to address the comments received. However, my concerns persist:

      Thank you very much again for the re-evaluation and comments. Please find our revision plans for each comment below.

      (1) The authors contend that shorter reaction times correlated with increased distances between individuals in social space imply that participants construct and utilize two-dimensional representations. This method is adapted from a previous study by Park et al. Yet, there is a fundamental distinction between the two studies. In the prior work, participants learned relationships between adjacent individuals, receiving feedback on their decisions, akin to learning spatial locations during navigation. This setup leads to two different predictions: If participants rely on memory to infer relationships, recalling more pairs would be necessary for distant individuals than for closer ones. Conversely, if participants can directly gauge distances using a cognitive map, they would estimate distances between far individuals as quickly as for closer ones. Consequently, as the authors suggest, reaction times ought to decrease with increasing decision value, which, in this context, corresponds to distances. However, the current study allowed participants to compare all possible pairs without restricting learning experiences, rendering the application of the same methodology for testing two-dimensional representations inappropriate. In this study, the results could be interpreted as participants not forming and utilizing two-dimensional representations.

      We apologize for not being clear enough about our task design; we have made relevant changes in the methodology section of the manuscript to make it clearer. The reviewer’s concern is that participants learned about all the pairs in the comparison task, which would make the distance effect invalid. We would like to clarify that during all the memory test tasks (the comparison task, the collect task and the recall task outside and inside the scanner), participants never received feedback on whether their responses were correct. Therefore, the comparison task in our study is similar to that in the previous study by Park et al. (2021). Participants did not have access to the correct responses for the possible comparison pairs prior to or during this task, so they would need to make inferences based on memory retrieval.
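      A minimal sketch of how a distance effect of this kind can be quantified, assuming faces are placed in a two-dimensional competence × trustworthiness space and "distance" means the Euclidean distance between two faces (the study's exact definition may differ). The coordinates and reaction times below are hypothetical; a negative slope of reaction time on distance is the pattern interpreted as evidence for a map-like representation.

```python
import math
from itertools import combinations


def pairwise_distances(positions):
    """Euclidean distance between every pair of faces in a 2-D trait space."""
    return {
        (i, j): math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(positions)), 2)
    }


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical (competence, trustworthiness) coordinates for six faces.
positions = [(1, 4), (2, 2), (3, 5), (4, 1), (5, 3), (6, 6)]
distances = pairwise_distances(positions)
pairs = sorted(distances)

# Hypothetical mean RT (s) per pair, constructed to shrink linearly with distance.
rts = [1.5 - 0.05 * distances[p] for p in pairs]
slope = ols_slope([distances[p] for p in pairs], rts)
```

      In practice, the slope (or a correlation) would be estimated per participant from their own trials and then tested across participants.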

      (2) The confounding of visual features with the value of social decision-making complicates the interpretation of this study's results. It remains unclear whether the observed grid-like effects are due to visual features or are genuinely indicative of value-based decision-making, as argued by the authors. Contrary to the authors' argument, this issue was not present in the previous study (Constantinescu et al.). In that study, participants associated specific stimuli with the identities of hidden items, but these stimuli were not linked to decision-making values (i.e., no image was considered superior to another). The current study's paradigm is more akin to that of Bao et al., which the authors mention in the context of RSA analysis. Indeed, Bao et al. controlled the length of the bars specifically to address the problem highlighted here. Regrettably, in the current paradigm, this conflation remains inseparable.

      We would like to thank the reviewer for facilitating the discussion on the question of ‘social space’ vs. ‘sensory space’. The task in the scanner did not require value-based decision making. It is akin to both the Bao et al. (2019) study and the Constantinescu et al. (2016) study in the sense that all three tasks ask participants to imagine moving along a trajectory in an abstract, non-physical space, and the trajectory is grounded in a sensory cue. Participants were trained to associate the sensory cue with abstract (social/nonsocial) concepts. We think that the paradigm is a relatively faithful replication of the study by Constantinescu et al. Nonetheless, we agree that a design similar to Bao et al. (2019), which controls for sensory confounds, or a value-based decision-making task in the scanner similar to that of Park et al. (2021), would better address this concern, and we have included this limitation in the discussion section.

      (3) While the authors have responded to comments in the public review, my concerns noted in the Recommendation section remain unaddressed. As indicated in my recommendations, there are aspects of the authors' methodology and results that I find difficult to comprehend. Resolving these issues is imperative to facilitate an appropriate review in subsequent stages.

      Considering that the issues raised in the previous comments remain unresolved, I have retained my earlier comments below for review.

      We apologize for not addressing the recommendations properly; please find our detailed responses and revision plans below.

      I have some comments. I hope that these can help.

      (1) While the explanation of Fig.4A-C is lacking in both the main text and figure legend, I am not sure if I understand this finding correctly. Did the authors find the effects of hexagonal modulation in the medial temporal gyrus and lingual gyrus correlate with the individual differences in the extent to which their reaction times were associated with the distances between faces when choosing a better collaborator? If so, I am not sure what argument the authors try to draw from these findings. Do the authors argue that these brain areas show hexagonal modulation, which was not supported in the previous analysis (Fig.3)? What is the level of correlation between these behavioral measures and the grid consistency effects in the vmPFC and EC, where the authors found actual grid-like activity? How do the authors interpret this finding? More importantly, how does this finding associate with other findings and the argument of the study?

      We apologize for not being clear enough in the manuscript and will improve the clarity in our revision. The exploratory whole-brain analysis reported in Figure 4 examines: 1) whether the strength of the grid-like representation of the social value map correlates with behavioral indicators of a map-like representation; and 2) whether the strength of the grid-like representation of this social value map correlates with participants’ social traits.

      More specifically, for the behavioral indicator, we used the distance effect in reaction times in the comparison task outside the scanner. We interpreted a stronger distance effect as a behavioral index of a better internal map-like representation, and a stronger grid consistency effect as a neural index of a better representation of the 2D social space. We therefore tested whether these behavioral and neural indices of map-like representation were correlated.
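      To make the behavioral index concrete, a minimal sketch follows. It assumes the distance effect is the per-participant least-squares slope of comparison-task reaction time on the Euclidean distance between the two compared avatars in the warmth/competence space. This operationalization (raw RT, Euclidean distance, the hypothetical function name `distance_effect`, and the input format) is an illustrative assumption, not necessarily the exact definition used in the manuscript. Because responses are typically faster for more distant pairs, a more negative slope would indicate a stronger distance effect.

```python
from math import hypot


def distance_effect(rts, pairs, positions):
    """Least-squares slope of reaction time on avatar-pair distance (hypothetical helper).

    rts:       reaction times, one per comparison trial
    pairs:     (avatar_a, avatar_b) ids, one per trial
    positions: dict mapping avatar id -> (warmth, competence) coordinates
    Assumes at least two distinct distances, otherwise the slope is undefined.
    """
    # Euclidean distance between the two avatars compared on each trial
    dists = [
        hypot(positions[a][0] - positions[b][0], positions[a][1] - positions[b][1])
        for a, b in pairs
    ]
    n = len(rts)
    mean_d = sum(dists) / n
    mean_rt = sum(rts) / n
    # Ordinary least-squares slope = cov(distance, RT) / var(distance)
    cov = sum((d - mean_d) * (r - mean_rt) for d, r in zip(dists, rts))
    var = sum((d - mean_d) ** 2 for d in dists)
    return cov / var
```

      One such slope per participant could then serve as the covariate in a second-level model of the kind described in the next paragraph.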

      To this end, behavioral indicators were entered as covariates in the second-level analysis of the GLM testing the grid consistency effect (GLM2). Figure 3 shows results from GLM2 without the covariates. Figure 4 shows clusters whose neural indices of map-like representation covaried with the behavioral indices and survived multiple-comparison correction. In these regions, the grid consistency effect was not significant at the group level (and is therefore not shown in Figure 3). We tried to interpret this finding in our discussion (line 374-289 for the temporal lobe correlation, line 395-404 for the precuneus correlation).

      Finally, we would like to point out that including the covariates in GLM2 did not change the results in Figure 3; the clusters in Figure 3 still survive correction. Meanwhile, these clusters did not show a correlation with behavioral indicators of map-like representation.

      Author response image 1.

      (2) There are no behavioral results provided. How accurately did participants perform each of the tasks? How are the effects of grid consistency associated with the level of accuracy in the map test?

      Why did participants perform the recall task again outside the scanner?

      We will improve signposting of the corresponding figures in the main text. The behavioral statistics are reported in the section “Participants construct social value map after associative learning of avatars and corresponding characteristics” in the main text, and the plots are shown in Figure 1. In particular, Figure 1F shows accuracy in the training tasks and in the recall task in the scanner. We did not find a significant correlation between behavioral accuracy and the grid consistency effect; we will make this clearer in the Results section.

      (3) The methods did not explain how the grid orientation was estimated and what the regressors were in GLM2. I don't think equations 2 and 3 are quite right.

      The grid orientation estimation method is described in detail in Supplementary Methods 2.2.2. We will add links to this section in the main text.

      Equations 2 and 3 describe how the parametric regressors entered into GLM2 were constructed and provide the basis for calculating grid orientations. Equation 2 follows directly from the angle addition and subtraction identities, so it should be correct. We will try to make the rationale clearer in the supplementary text.
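      For readers less familiar with this approach, the identity cos(6(θ − φ)) = cos(6θ)cos(6φ) + sin(6θ)sin(6φ) is what allows a grid orientation φ to be estimated with two linear regressors. The sketch below follows the quadrature procedure of Constantinescu et al. (2016). It regresses a signal on cos(6θ) and sin(6θ) and recovers φ = atan2(β_sin, β_cos)/6, which is defined only modulo 60°. Whether this matches Equations 2 and 3 term by term is an assumption. For brevity, the sketch also omits the intercept, nuisance regressors, and the cross-validation across runs used in a real fMRI analysis. The function name `fit_grid_orientation` is hypothetical.

```python
import math


def fit_grid_orientation(thetas, signal):
    """Estimate grid orientation phi (radians, within (-pi/6, pi/6]).

    Fits signal ~ b1*cos(6*theta) + b2*sin(6*theta) by ordinary least squares
    (no intercept) and returns atan2(b2, b1) / 6, using
    cos(6(theta - phi)) = cos(6 theta) cos(6 phi) + sin(6 theta) sin(6 phi).
    """
    c = [math.cos(6 * t) for t in thetas]
    s = [math.sin(6 * t) for t in thetas]
    # Normal equations: [[cc, cs], [cs, ss]] @ [b1, b2] = [cy, sy]
    cc = sum(x * x for x in c)
    ss = sum(x * x for x in s)
    cs = sum(x * y for x, y in zip(c, s))
    cy = sum(x * y for x, y in zip(c, signal))
    sy = sum(x * y for x, y in zip(s, signal))
    det = cc * ss - cs * cs
    b1 = (ss * cy - cs * sy) / det  # estimate of cos(6*phi), up to amplitude
    b2 = (cc * sy - cs * cy) / det  # estimate of sin(6*phi), up to amplitude
    return math.atan2(b2, b1) / 6
```

      As a usage example, simulating a noiseless signal cos(6(θ − 15°)) over 24 directions spaced 15° apart and passing it to `fit_grid_orientation` recovers φ ≈ 15° (in radians).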

      (4) With the increase in navigation distances, more grid cells would activate. Therefore, in theory, the activity in the entorhinal cortex should increase with the Euclidean distances, which has not been found here. I wonder if there was enough variability in the Euclidean distances that can be captured by neural correlates. This would require including the distributions of Euclidean distances according to their trajectory angles. Regarding how Fig.1E is generated, I don't understand what this heat map indicates. Additionally, it needs to be confirmed if the grid effects remain while controlling for the Euclidean distances of navigation trajectories.

      We did not specifically control for trajectory length; we only ensured that trajectory directions were uniformly distributed. We have included the distribution of Euclidean distances in Figure S9 and the distribution of trajectory directions in Figure S8.

      Author response image 2.

      As for Figure 1E, we aimed to reproduce Figure 1F of Constantinescu et al. (2016), which showed that participants progressively refined the locations of the outcomes through training. We divided the space into 15×15 subregions, computed the amount of time spent in each subregion, and plotted the result in Figure 1E. Brighter colors indicate more time spent in the corresponding subregion. All these timing indices were computed as a percentage of the total time spent in the explore task in a given session. If participants were well acquainted with the space and avatars, they would spend more time at the avatar locations (brighter colors there) in the review session than in the learning session.
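      The occupancy computation described above can be sketched as follows. The sketch assumes that the explore-task log provides the position in the space at each sample, together with how long that sample lasted, and that all positions lie within known bounds of the space. The function name `occupancy_map` and the input format are hypothetical. The output is a 15×15 grid of percentages of total session time, which is what Figure 1E is described as showing.

```python
def occupancy_map(xs, ys, durations, x_range, y_range, n_bins=15):
    """Percentage of total explore-task time spent in each cell of an n_bins x n_bins grid.

    xs, ys:           positions along the two axes of the space, one per sample
    durations:        time spent at each sample (e.g. seconds)
    x_range, y_range: (min, max) bounds of each axis; positions assumed inside them
    Returns grid[row][col], with row indexing y and col indexing x; values sum to 100.
    """
    grid = [[0.0] * n_bins for _ in range(n_bins)]
    (x0, x1), (y0, y1) = x_range, y_range
    for x, y, dt in zip(xs, ys, durations):
        # Map each coordinate to a bin index; a point on the upper edge goes into the last bin
        col = min(int((x - x0) / (x1 - x0) * n_bins), n_bins - 1)
        row = min(int((y - y0) / (y1 - y0) * n_bins), n_bins - 1)
        grid[row][col] += dt
    total = sum(durations)
    # Express each cell as a percentage of the session's total time
    return [[100.0 * t / total for t in row] for row in grid]
```

      Computing one such map for the learning session and one for the review session would allow the two to be compared cell by cell, as in the description above.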

      Regarding the effect of distance on the grid-like representation, we did not include distance as a parametric modulator in the grid consistency GLM (GLM2) because there were too few trials in each bin (6-8 trials). However, indirect evidence may help rule out this confound: in the distance representation analysis, we did not find distance representation in any of the clusters with significant grid-like representation (regions in Figure 2).

      Reviewer #2 (Public Review):

      Summary:

      In this work, Liang et al. investigate whether an abstract social space is neurally represented by a grid-like code. They trained participants to 'navigate' around a two-dimensional space of social agents characterized by the traits warmth and competence, then measured neural activity as participants imagined navigating through this space. The primary neural analysis consisted of three procedures: 1) identifying brain regions exhibiting the hexagonal modulation characteristic of a grid-like code, 2) estimating the orientation of each region's grid, and 3) testing whether the strength of the univariate neural signal increases when a participant is navigating in a direction aligned with the grid, compared to a direction that is misaligned with the grid. From these analyses, the authors find the clearest evidence of a grid-like code in the prefrontal cortex and weaker evidence in the entorhinal cortex.
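      Step 3 of this procedure can be illustrated with a short sketch. It assumes the common convention from Constantinescu et al. (2016): a trajectory is labeled aligned when its direction falls within ±15° of one of the six grid axes (φ + k·60°) and misaligned otherwise, and the mean signal is then contrasted between the two sets. The function names are hypothetical, and the bin width used in this particular study may differ.

```python
import math


def is_aligned(theta, phi):
    """True if direction theta lies within 15 degrees of a grid axis phi + k*60 degrees."""
    period = math.pi / 3                  # 60 degrees: six-fold symmetry
    offset = (theta - phi) % period       # in [0, 60 degrees)
    dist = min(offset, period - offset)   # angular distance to the nearest axis, in [0, 30 degrees]
    return dist < math.pi / 12            # strictly less than 15 degrees


def aligned_minus_misaligned(thetas, signal, phi):
    """Mean signal on aligned trajectories minus mean signal on misaligned ones."""
    aligned = [y for t, y in zip(thetas, signal) if is_aligned(t, phi)]
    misaligned = [y for t, y in zip(thetas, signal) if not is_aligned(t, phi)]
    return sum(aligned) / len(aligned) - sum(misaligned) / len(misaligned)
```

      With a simulated signal cos(6(θ − φ)), the contrast is positive, which is the pattern the grid consistency analysis tests for when φ is estimated on independent data.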

      Strengths:

      The work demonstrates the existence of a grid-like neural code for a socially-relevant task, providing evidence that such coding schemes may be relevant for a variety of two-dimensional task spaces.

      Weaknesses:

      In the revised manuscript, the authors soften their claims about finding a grid code in the entorhinal cortex and provide additional caveats about limitations in their findings. It seems that the authors and reviewers are in agreement about the following weaknesses, which were part of my original review: Claims about a grid code in the entorhinal cortex are not well-supported by the analyses presented. The whole-brain analysis does not suggest that the entorhinal cortex exhibits hexagonal modulation; the strength of the entorhinal BOLD signal does not track the putative alignment of the grid code there; multivariate analyses do not reveal any evidence of a grid-like representational geometry.

      In the authors' response to reviews, they provide additional clarification about their exploratory analyses examining whether behavior (i.e., reaction times) and individual difference measures (i.e., social anxiety and avoidance) can be predicted by the hexagonal modulation strength in some region X, conditional on region X having a similar estimated grid alignment with some other region Y. My guess is that readers would find it useful if some of this language were included in the main text, especially an explanation of the rationale for these exploratory analyses.

      Thank you very much again for your careful re-evaluation and suggestions. We have tried to improve our writing and incorporate the suggestions in the new revision.

      Reviewer #3 (Public Review):

      Liang and colleagues set out to test whether the human brain uses distance and grid-like codes in social knowledge using a design where participants had to navigate in a two-dimensional social space based on competence and warmth during an fMRI scan. They showed that participants were able to navigate the social space and found distance-based codes as well as grid-like codes in various brain regions, and the grid-like code correlated with behavior (reaction times).

      On the whole, the experiment is designed appropriately for testing for distance-based and grid-like codes, and is relatively well powered for this type of study, with a large amount of behavioral training per participant. They revealed that a number of brain regions correlated positively or negatively with distance in the social space, and found grid-like codes in the frontal polar cortex and posterior medial entorhinal cortex, the latter in line with prior findings on grid-like activity in entorhinal cortex. The current paper seems quite similar conceptually and in design to previous work, most notably Park et al., 2021, Nature Neuroscience.

      (1) The authors claim that this study provides evidence that humans use a spatial / grid code for abstract knowledge like social knowledge.

      These data do not add anything specifically new to this argument. As with almost all studies that test for a grid code in a similar "conceptual" space (not only the current study), the problem is that when the space is not a uniform, square or circular, two-dimensional space, there is no reason to expect the code to be perfectly grid-like, i.e., to show six-fold symmetry. In real-world scenarios, social space (like navigational or semantic space) must be higher-dimensional, or at least more than two-dimensional. It is unclear whether this generalizes to larger spaces where not all parts of the space are relevant. Modelling work from Tim Behrens' lab (e.g., Whittington et al., 2020) and Bradley Love's lab (e.g., Mok & Love, 2019) has shown or argued this to be the case. In experimental work, such as the mazes from the Mosers' labs (e.g., Derdikman et al., 2009) or the trapezoid environments from the O'Keefe lab (Krupic et al., 2015), mEC cells show distortions and would not pass as grid cells under the six-fold symmetry criterion.

      The authors briefly discuss the limitations of this at the very end but do not really say how this speaks to the goal of their study and the claim that social space or knowledge is organized as a grid code and if it is in fact used in the brain in their study and beyond. This issue deserves to be discussed in more depth, possibly referring to prior work that addressed this, and raise the issue for future work to address the problem - or if the authors think it is a problem at all.

      Thank you very much again for your careful re-evaluation and comments. We have tried to incorporate some of the suggested papers into our discussion. In summary, we agree that codes other than a six-fold symmetric one may be used to represent a “conceptual space”. We think that the next step toward a stronger claim would be to find representations of more spontaneous, non-spatial maps.

      References

      Bao, X., Gjorgieva, E., Shanahan, L. K., Howard, J. D., Kahnt, T., & Gottfried, J. A. (2019). Grid-like Neural Representations Support Olfactory Navigation of a Two-Dimensional Odor Space. Neuron, 102(5), 1066-1075 e1065. https://doi.org/10.1016/j.neuron.2019.03.034

      Constantinescu, A. O., O'Reilly, J. X., & Behrens, T. E. J. (2016). Organizing conceptual knowledge in humans with a gridlike code. Science, 352(6292), 1464-1468. https://doi.org/10.1126/science.aaf0941

      Park, S. A., Miller, D. S., & Boorman, E. D. (2021). Inferences on a multidimensional social hierarchy use a grid-like code. Nat Neurosci, 24(9), 1292-1301. https://doi.org/10.1038/s41593-021-00916-3

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Although this manuscript contains a potentially interesting piece of work that delineates a mechanism by which IQCH is associated with spermatogenesis, this reviewer feels that a number of issues require clarification and re-evaluation for a better understanding of the role of IQCH in spermatogenesis. Given the gaps in logic and supporting data, the causal relationships among IQCH, CaM, and HNRPAB are still not clear. The most serious problem with this manuscript may be that the authors try to generalize their interpretations using an overly simplified model based on limited data. The way the data and the logic are presented needs to be substantially revised, and several interpretations should be supported by direct evidence.

      Response: We thank the reviewer for this comment. IQCH is a calmodulin-binding protein, and the binding of IQCH to CaM was confirmed by LC-MS/MS analysis and a co-IP assay using sperm lysate. We therefore hypothesized that the interaction between IQCH and CaM might be a prerequisite for IQCH function. To test this hypothesis, we used HNRPAB as an example. Knocking down IQCH in cultured cells decreased HNRPAB expression, and knocking down CaM likewise decreased HNRPAB expression. However, these results do not exclude the possibility that IQCH or CaM alone regulates HNRPAB expression. To test this, we overexpressed IQCH in CaM-knockdown cells and found that HNRPAB expression was not rescued, suggesting that IQCH cannot regulate HNRPAB expression when CaM is reduced. Consistently, overexpressing CaM in IQCH-knockdown cells did not rescue HNRPAB expression, suggesting that CaM cannot regulate HNRPAB expression when IQCH is reduced. Thus, neither IQCH nor CaM alone can regulate HNRPAB expression. Moreover, we deleted the IQ motif of IQCH, which is required for binding to CaM. Co-IP showed that deleting the IQ motif disrupted the interaction between IQCH and CaM and decreased HNRPAB expression. We therefore suggest that the interaction between IQCH and CaM might be required for IQCH to regulate HNRPAB. In future studies, we will further investigate the relationships among IQCH, CaM, and HNRPAB.

      Reviewer #3 (Public Review):

      (1) More background details are needed regarding the proteins involved, in particular IQ proteins and calmodulin. The authors state that IQ proteins are not well-represented in the literature, but do not state how many IQ proteins are encoded in the genome. They also do not provide specifics regarding which calmodulins are involved, since there are at least 5 family members in mice and humans. This information could help provide more granular details about the mechanism to the reader and help place the findings in context.

      Response: We thank the reviewer for this suggestion. We have added background information on IQ-containing protein family members in humans and mice, as well as on other IQ-containing proteins implicated in male fertility, to the Introduction. We have also added background on the association between CaM and male infertility to the Introduction.

      (2) The mouse fertility tests could be improved with more depth and rigor. There were no data on the copulatory plug rate; it was unclear how many WT females were used for the male breeding tests and how many litters were generated; the general methodology used for the breeding tests was not explicitly or clearly described in the Methods section; the sample size of n=3 for the male breeding tests is rather small for that type of assay; and, given that IQCH appears to be expressed in testicular interstitial cells (Fig. S10) and somewhat in other organs (Fig. S2), another important parameter of male fertility that should be addressed is reproductive hormone levels (e.g., LH, FSH, and testosterone). While the normal epididymal size in Fig. S3 suggests that hormone (testosterone) levels are normal, epididymal size and/or weight were not rigorously quantified.

      Response: We thank the reviewer for this comment. We have provided data on the copulatory plug rate and the average number of litters for the breeding tests in revised Figure 3—figure supplement 2. The methodology for the breeding tests is now described in more detail in the revised Methods section. Moreover, we have increased the sample size for the male breeding tests to n=6. We measured serum levels of FSH, LH, and testosterone in WT mice (9.3±1.9 ng/ml, 0.93±0.15 ng/ml, and 0.2±0.03 ng/ml, respectively) and Iqch KO mice (12±2 ng/ml, 1.17±0.2 ng/ml, and 0.2±0.04 ng/ml, respectively). Because there was no significant difference in serum reproductive hormone levels between WT and Iqch KO mice, we did not include these data in the study. Furthermore, we have added quantitative data on epididymal size in revised Figure 3—figure supplement 2.

      (3) The Western blots in Figure 6 should be rigorously quantified from multiple independent experiments so that there is stronger evidence supporting claims based on those assays.

      Response: We appreciate the reviewer's comment. As suggested, we have added quantification of the Western blots in Figure 6 as Figure 6—figure supplement 2.

      (4) Some of the mouse testis images could be improved. For example, the PNA and PLCz images in Figure S7 are difficult to interpret in that the tubules do not appear to be stage-matched, and since the authors claimed that testicular histology is unaffected in knockout testes, it should be feasible to stage-match control and knockout samples. Also, the anti-IQCH and CaM immunofluorescence in Figure S10 would benefit from some cell-type-specific co-stains to more rigorously define their expression patterns, and they should also be stage-matched.

      Response: We thank the reviewer for these suggestions. We have included immunofluorescence images of anti-PLCz, anti-PNA, anti-IQCH, and anti-CaM staining at different stages of spermatogenesis.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) There are multiple grammatical errors and statements drawn beyond the results. The entire manuscript would benefit from professional editing.

      Response: We are sorry for the grammatical errors. We have enlisted professional editing services to refine our manuscript.

      (2) Line 40, "Firstly" is not appropriate here.

      Response: We thank the reviewer for this comment. The word "Firstly" has been removed from the revised manuscript.

      (3) Line 44, "processes".

      Response: We thank the reviewer for this suggestion. We have changed “process” to “processes” on line 45.

      (4) "spermatocytogenesis (mitosis)" is incorrect.

      Response: We thank the reviewer for this comment. We have changed “spermatocytogenesis (mitosis)” to “mitosis” on line 47.

      (5) Ca and Ca2+ are both used in line 67 - 77. Be consistent.

      Response: We appreciate the reviewer's detailed checks. We have made the notation consistent by changing instances of "Ca" to "Ca2+" in the revised manuscript.

      (6) Line 238 to 240, "To elucidate the molecular mechanism by which IQCH regulates male fertility, we performed liquid chromatography tandem mass spectrometry (LC-MS/MS) analysis using mouse sperm lysates and detected 288 interactors of IQCH (Data S1)." It is not clear how LC-MS/MS using mouse sperm lysates could detect "288 interactors of IQCH". A co-IP experiment for IQCH using sperm lysates prior to LC-MS/MS is needed to detect "interactors of IQCH". However, in the Methods section, consistent with the main text, proteomic quantification was conducted on protein extracts from sperm. The figure legend for Fig. 5 did not explain this, either. Thus, Figure 5 cannot be evaluated.

      Response: We sincerely apologize for the oversight. Following the reviewer’s suggestion, we have added the details of the LC-MS/MS experiment to the Methods section of the revised manuscript. Additionally, we performed a co-IP experiment for IQCH using sperm lysates prior to LC-MS/MS but did not include the corresponding figure in the manuscript. The results are as follows:

      Author response image 1.

      The results of a co-IP experiment for IQCH using sperm lysates from WT mice.

      (7) Line 246, "... key proteins that might be activated by IQCH". What does "activated" here refer to? Should it be "upregulated"?

      Response: We apologize for this inexact statement; "upregulated" better conveys the intended meaning. Following the reviewer’s suggestion, we have changed "activated" to "upregulated".

      (8) Line 252 to 254, "the cross-analysis revealed that 76 proteins were shared between the IQCH-bound proteins and the IQCH-activated proteins (Fig. 5E), implicating this subset of genes as direct targets." This is a confusing statement. Is the author trying to say, IQCH-bound proteins have upregulated expression, suggesting that IQCH enhances their expression?

      Response: We appreciate the reviewer's comment regarding the clarity of the statement in Lines 252 to 254 of the manuscript. We have revised this sentence to read: “Importantly, cross-analysis revealed that 76 proteins were shared between the IQCH-bound proteins and the downregulated proteins in Iqch KO mice (Figure 5E), suggesting that IQCH might regulate their expression by the interaction.”

      (9) Line 260 to 261, "SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB ... the loss of which showed the greatest influence on the phenotype of the Iqch KO mice." There is no evidence suggesting that the loss of SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB leads to Iqch KO phenotype.

      Response: We apologize for this inaccurate statement. According to the literature, Fus KO, Ewsr1 KO, and Hnrnpk KO male mice are infertile, showing spermatogenic arrest with an absence of spermatozoa (Kuroda et al. 2000; Tian et al. 2021; Xu et al. 2022). Syncrip is involved in meiosis in Drosophila through its interaction with Doublefault (Sechi et al. 2019). HNRPAB might be associated with mouse spermatogenesis by binding to Protamine 2 and contributing to its translational regulation. In addition, ANXA7 is a calcium-dependent phospholipid-binding protein that negatively regulates mitochondrial apoptosis (Du et al. 2015), and loss of SLC25A4 results in mitochondrial energy metabolism defects in mice (Graham et al. 1997). Moreover, RNA immunoprecipitation on formaldehyde-cross-linked sperm followed by qPCR detected interactions between HNRPAB and Catsper1, Catsper2, Catsper3, Ccdc40, Ccdc39, Ccdc65, Dnah8, Lrrc6, and Dnhd1, which are essential for sperm development (Fukuda et al. 2013). Because our Iqch KO mice showed abnormal sperm count, motility, morphology, and mitochondria, we inferred that IQCH might play a role in spermatogenesis partly by regulating the expression of SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB. We have revised the statement to read: “We focused on SYNCRIP, HNRNPK, FUS, EWSR1, ANXA7, SLC25A4, and HNRPAB, which play important roles in spermatogenesis.”

      (10) Fig. 6C and 6D use different styles of error bars.

      Response: We are sorry for our oversight. In accordance with the reviewer's recommendations, we have modified the representation of error bars in the revised Fig. 6C.

      (11) Line 296 to 297, "As expected, CaM interacted with IQCH, as indicated by LC-MS/MS analysis". It is not clear how LC-MS/MS detects protein interaction.

      Response: As the reviewer suggested, we have added details of the LC-MS/MS experiment to the Methods section of the revised manuscript. The proteins interacting with IQCH in sperm lysates identified by LC-MS/MS analysis are provided as Figure 5—source data 1.

      (12) It is still not clear how the interaction between IQCH, CaM, and HNRPAB is required for the expression of each other.

      Response: We thank the reviewer for this comment. IQCH is a calmodulin-binding protein, and the binding of IQCH to CaM was confirmed by LC-MS/MS analysis and a co-IP assay using sperm lysate. We therefore hypothesized that the interaction between IQCH and CaM might be a prerequisite for IQCH function. To test this hypothesis, we used HNRPAB as an example. Knocking down IQCH in cultured cells decreased HNRPAB expression, and knocking down CaM likewise decreased HNRPAB expression. However, these results do not exclude the possibility that IQCH or CaM alone regulates HNRPAB expression. To test this, we overexpressed IQCH in CaM-knockdown cells and found that HNRPAB expression was not rescued, suggesting that IQCH cannot regulate HNRPAB expression when CaM is reduced. Consistently, overexpressing CaM in IQCH-knockdown cells did not rescue HNRPAB expression, suggesting that CaM cannot regulate HNRPAB expression when IQCH is reduced. Thus, neither IQCH nor CaM alone can regulate HNRPAB expression. Moreover, we deleted the IQ motif of IQCH, which is required for binding to CaM. Co-IP showed that deleting the IQ motif disrupted the interaction between IQCH and CaM and decreased HNRPAB expression. We therefore suggest that the interaction between IQCH and CaM might be required for IQCH to regulate HNRPAB. In future studies, we will further investigate the relationships among IQCH, CaM, and HNRPAB.

      Reviewer #3 (Recommendations For The Authors):

      The authors have addressed my minor concerns. However, they neglected to address any of my more significant concerns in the public review. I assume that they simply overlooked these critiques, despite the fact that eLife explicitly states that "...as a general rule, concerns about a claim not being justified by the data should be explained in the public review." Therefore, the authors should have looked more carefully at the public reviews. As a result, my major concerns about the manuscript remain.

      Response: We apologize for overlooking the concerns raised in the public review. We have improved our study based on that feedback.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study advances our understanding of why diabetes is a risk factor for more severe Covid-19 disease. The authors offer solid evidence that cathepsin L is more active in diabetic individuals, that this higher activity is recapitulated at the cellular level in the presence of high glucose, and that high glucose leads to higher cathepsin L maturation. While not all aspects of the relationship between diabetes and cathepsin L (e.g., effects of metabolic acidosis) have been investigated, the work should be of interest to researchers in diabetes, virology, and immunology.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by He et al. investigates why patients with diabetes are more susceptible to COVID-19. The paper raises the possibility that hyperglycemia-induced cathepsin L maturation could be one of the driving forces in this pathology, suggesting that increased CTSL activity accelerates virus infection through enhanced processing of the SARS-CoV-2 spike protein.

      In a clinical case-control study, the team found that the severity of SARS-CoV-2 infection was higher in diabetic patients and that their CTSL levels correlated well with disease progression. They further showed that CTSL activity increased in both long-term and acute hyperglycemia. SARS-CoV-2 infected cells more efficiently when they were cultured in serum from diabetic patients, and the same was observed in high-glucose medium. No effect was observed in medium with increased insulin concentrations. CTSL knockout abolished the glucose-dependent increase in infection.

      Increased glucose levels did not correlate with increased CTSL transcription. Rather, He et al. showed that high glucose levels led to CTSL translocation from the ER to the lysosome. It was the glucose-dependent processing of the protease into its active form that promoted infection.

      Strengths:

      It is a complete study, starting from a clinical observation and ending with the molecular mechanism. A strength is certainly the wide selection of experiments. The clinical study of the effect of glucose on CTSL concentrations in healthy individuals sets the stage for experiments in cell culture, animal models, and human tissue. The effect of CTSL knockout cell lines on glucose-induced SARS-CoV-2 infection rates is convincing. Finally, the team used a combination of Western blots and confocal microscopy to identify the underlying molecular mechanisms. The authors keep the diabetic condition at the center of their study and therefore extend previous knowledge of glucose-induced CTSL activation and its consequences for COVID-19 infections. By doing so, they create a novel connection between CTSL involvement in SARS-CoV-2 infections and diabetes.

      Weaknesses:

      (1) The authors suggest that hyperglycemia as a symptom of diabetes leads to an increased infection rate in those patients. Throughout their study, the team focuses on two select symptoms of a diabetic condition, hyperglycemia and hyperinsulinemia. The team acknowledges in the discussion that there could be various other reasons. Hyperglycemia can lead to metabolic acidosis and a shift in blood pH. As CTSL activity is highly dependent on pH, it would have been crucial to include this parameter in the study.

      We sincerely appreciate your valuable comment. We agree that hyperglycemia can lead to metabolic acidosis and alter blood pH. However, the normal range for human blood pH is narrow, typically 7.35 to 7.45, and in our study blood pH remained within this range for both diabetic and healthy control samples. To address your concern, we measured CTSL activity across pH values within this physiological range. The updated Fig. 4a presents these findings, showing consistent CTSL activity across this range. Statistical analysis was performed using one-way ANOVA with Tukey’s post hoc test. We have also amended the figure legend and added the corresponding description to the final version of the manuscript (lines 15-18, page 7).

      Author response image 1.

      (2) The study rarely differentiates between cellular and extracellular CTSL activity. A more detailed explanation of the connection between intracellular CTSL and serum CTSL in diabetic individuals, presumably via lysosomal exocytosis, would help the final model give a more complete picture.

      Thank you for your insightful comments. Previous studies have described how lysosomal CTSL is transported in vesicles and released at the plasma membrane through exocytosis (references 1-5). We have incorporated this information into Fig. 6h (page 32 of the final version of the manuscript) to clarify the connection between intracellular and serum CTSL activity in diabetic individuals, particularly through lysosomal exocytosis.

      Author response image 2.

      References:

      (1) Reddy A et al. Plasma membrane repair is mediated by Ca(2+)-regulated exocytosis of lysosomes. Cell. 2001 Jul 27;106(2):157-69. doi: 10.1016/s0092-8674(01)00421-4. PMID: 11511344.

      (2) Hasanagic M et al. Different Pathways to the Lysosome: Sorting out Alternatives. Int Rev Cell Mol Biol. 2015;320:75-101. doi: 10.1016/bs.ircmb.2015.07.008. Epub 2015 Aug 19. PMID: 26614872.

      (3) Reiser J et al. Specialized roles for cysteine cathepsins in health and disease. J Clin Invest. 2010 Oct;120(10):3421-31. doi: 10.1172/JCI42918. Epub 2010 Oct 1. PMID: 20921628; PMCID: PMC2947230.

      (4) Jaiswal JK et al. Membrane proximal lysosomes are the major vesicles responsible for calcium-dependent exocytosis in nonsecretory cells. J Cell Biol. 2002 Nov 25;159(4):625-35. doi: 10.1083/jcb.200208154. Epub 2002 Nov 18. PMID: 12438417; PMCID: PMC2173094.

      (5) Coutinho MF et al. Mannose-6-phosphate pathway: a review on its role in lysosomal function and dysfunction. Mol Genet Metab. 2012 Apr;105(4):542-50. doi: 10.1016/j.ymgme.2011.12.012. Epub 2011 Dec 23. PMID: 22266136.

      (3) In the early Results section, an effect of hyperglycemia on total CTSL concentrations is described, but the data are not very convincing. Over the course of the manuscript, the hypothesis shifts increasingly towards increased protease translocation and processing to the active form rather than a change in total protease amount. The overall importance of CTSL concentrations remains questionable.

      Thank you for your insightful feedback. We have addressed your concerns regarding the impact of hyperglycemia on CTSL concentrations. Fig. 2h-j show the effect of acute hyperglycemia on CTSL concentration and activity in 15 healthy male volunteers over a 160-minute period. During this short timeframe, CTSL concentration remained stable, consistent with the unchanged RNA results from cells exposed to different glucose levels (Supplementary Fig. 1). However, CTSL activity increased significantly, indicating that elevated glucose rapidly triggers CTSL maturation through propeptide cleavage, a process that is faster than CTSL protein synthesis. In summary, acute hyperglycemia specifically elevates CTSL activity, whereas chronic hyperglycemia may affect both CTSL activity and concentration (Fig. 2a-d). In addition, Tournu C et al. (1998) (reference 1) and Shi Q et al. (2022) (reference 2) reported that increased glucose metabolism promotes the maturation and secretion of CTSL and other proteases. These findings agree with our evidence that hyperglycemia drives CTSL maturation, as discussed in lines 10-25, page 12 of the final version of the manuscript.

      References:

      (1) Tournu C et al. Glucose controls cathepsin expression in Ras-transformed fibroblasts. Arch Biochem Biophys. 1998 Dec 1;360(1):15-24. doi: 10.1006/abbi.1998.0916. PMID: 9826424.

      (2) Shi Q et al. Increased glucose metabolism in TAMs fuels O-GlcNAcylation of lysosomal Cathepsin B to promote cancer metastasis and chemoresistance. Cancer Cell. 2022 Oct 10;40(10):1207-1222.e10. doi: 10.1016/j.ccell.2022.08.012. Epub 2022 Sep 8. PMID: 36084651.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors hypothesized that individuals with diabetes have elevated blood CTSL levels, which facilitates SARS-CoV-2 infection. The authors conducted in vitro experiments, revealing that elevated glucose levels promote SARS-CoV-2 infection in wild-type cells. In contrast, CTSL knockout cells show reduced susceptibility to high glucose-promoted effects. Additionally, the authors utilized lung tissue samples obtained from both diabetic and non-diabetic patients, along with db/db diabetic and control mice. Their findings indicate that diabetic conditions lead to an elevation in CTSL activity in both humans and mice.

      Strengths:

      The authors have effectively met their research objectives, and their conclusions are supported by the data presented. Their findings suggest that high glucose levels promote CTSL maturation and translocation from the endoplasmic reticulum to the lysosome, potentially contributing to diabetic comorbidities and complications.

      Weaknesses:

      (1) In Figure 1e, the authors measured plasma levels of COVID-19-related proteins, including ACE2, CTSL, and CTSB, in both diabetic and non-diabetic COVID-19 patients. Notably, only CTSL levels were significantly higher in diabetic than in non-diabetic patients, and these levels varied throughout the course of COVID-19. Given that the diabetes group includes both male and female patients, it is essential to ascertain whether the authors considered the potential impact of gender on CTSL levels. The diabetes group comprised a higher percentage of male patients (61.3%) than the non-diabetes group (38.7%).

      Thank you for your insightful feedback. To address the potential impact of gender on CTSL levels in diabetic and non-diabetic COVID-19 patients, we performed additional analyses. Our initial study involved 62 COVID-19 patients, 31 with diabetes and 31 without, with matching based on gender and age; however, because blood samples from COVID-19 patients were difficult to collect, a balanced gender distribution could not be achieved in both groups. To mitigate potential gender bias arising from this small sample, we conducted a supplementary clinical study of 122 non-COVID-19 volunteers, 61 with diabetes and 61 without. Males made up 50.8% of the diabetes group and 44.3% of the healthy group (P = 0.468), indicating no significant gender bias. We have added this information to the Discussion (lines 4-13, page 11 of the final version of the manuscript).
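      The P value above is consistent with a test on the counts implied by the percentages: 31 of 61 males (50.8%) in the diabetes group and 27 of 61 males (44.3%) in the healthy group. The response does not name the test, so this minimal Python sketch assumes a Pearson chi-square test without continuity correction; the function name pearson_chi2_2x2 is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns (statistic, p value for df = 1)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With df = 1 the statistic is the square of a standard normal,
    # so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: diabetes, healthy; columns: male, female.
# Counts are inferred from 50.8% and 44.3% of 61 (an assumption).
chi2, p = pearson_chi2_2x2(31, 30, 27, 34)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # prints: chi2 = 0.526, p = 0.468
```

A continuity-corrected (Yates) test would give a larger P value, so the reported 0.468 points to the uncorrected version, if a chi-square test was used at all.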

      (2) Lines 145-149: "The results showed that WT Huh7 cell cultured in high glucose medium exhibited a much higher infective rate than those in low glucose medium. However, CTSL KO Huh7 cells maintained a low infective rate of SARS-CoV-2 regardless of glucose or insulin levels (Fig. 3f-h). Therefore, hyperglycemia enhanced SARS-CoV-2 infection dependent on CTSL." However, this evidence may be insufficient to support the claim that hyperglycemia enhances SARS-CoV-2 infection dependent on CTSL. The human hepatoma cell line Huh7 might not be an ideal model to validate the authors' hypothesis regarding high blood glucose promoting SARS-CoV-2 infection through CTSL.

      Thank you for your valuable feedback regarding whether the evidence is sufficient to support the claim that hyperglycemia enhances SARS-CoV-2 infection dependent on CTSL. As suggested, we have revised the sentence to read “Therefore, hyperglycemia enhanced SARS-CoV-2 infection through CTSL.” (line 9, page 7 of the final version of the manuscript). We also acknowledge the potential involvement of other bioactive factors, such as 1,5-anhydro-D-glucitol (1,5-AG), in mediating SARS-CoV-2 infection in patients with diabetes, as outlined in the Discussion (lines 13-21, page 13 of the final version of the manuscript).

      Regarding the choice of the human hepatoma cell line Huh7 as a model for hyperglycemia-induced CTSL maturation and SARS-CoV-2 infection, we recognize the importance of tissue specificity and the liver’s significance as a target organ in COVID-19. Despite potential limitations, such as limited generalizability from liver cells and the lack of tissue specificity in reflecting the impact of SARS-CoV-2, Huh7 cells offer practical advantages as a well-established model for studying SARS-CoV-2 infection, including accessibility, susceptibility to infection, and stable proliferation (references 1-3). We have elaborated on these considerations in the Discussion (lines 19-23, page 11 of the final version of the manuscript) to provide context for our choice of experimental model.

      References:

      (1) Gupta A et al. Extrapulmonary manifestations of COVID-19. Nat Med. 2020 Jul;26(7):1017-1032. doi: 10.1038/s41591-020-0968-3. Epub 2020 Jul 10. PMID: 32651579.

      (2) Nie X et al. Multi-organ proteomic landscape of COVID-19 autopsies. Cell. 2021 Feb 4;184(3):775-791.e14. doi: 10.1016/j.cell.2021.01.004. Epub 2021 Jan 9. PMID: 33503446; PMCID: PMC7794601.

      (3) Ciotti M et al. The COVID-19 pandemic. Crit Rev Clin Lab Sci. 2020 Sep;57(6):365-388. doi: 10.1080/10408363.2020.1783198. Epub 2020 Jul 9. PMID: 32645276.

      (3) The Abstract and Introduction sections lack effective organization.

      Thank you for your valuable comments. We have rewritten the Abstract and Introduction sections and incorporated the updated descriptions in the final edition manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (1) When referring to diabetes, does this exclusively include diabetes type 2?

      Thank you for your inquiry. In our study, the term “diabetes” refers broadly to the hyperglycemic condition rather than specifically to type 1 diabetes (T1DM) or type 2 diabetes (T2DM). This broader definition matches the scope of our research objectives and findings, particularly those from the cell experiments. We have clarified this point in the revised Discussion (lines 6-9, page 12 of the final version of the manuscript).

      (2) The titles of the individual paragraphs are not very strong and descriptive. More precise titles help to structure the paper better for the reader.

      Thank you for your valuable comments. We have rewritten the title of each section to make it more precise for readers and incorporated the updated descriptions in the manuscript.

      (3) In Fig. 3c, adding a 0 nM insulin control would be nice.

      Thank you for your suggestion. We have revised Fig. 3c accordingly. The revised figure is on page 29 of the final version of the manuscript, and the corresponding figure legend has also been revised.

      Author response image 3.

      (4) In Fig. 3e, a non-infection control would be nice.

      Thank you for your suggestion. We have added a non-infection control to Fig. 3e. In the revised figure, SARS-CoV-2 pseudovirus infection was measured as the luminescence recorded by a plate reader. Cells infected by the pseudovirus express firefly luciferase and therefore emit light, whereas non-infected control cells emit none, and the reader recorded a value of zero. The updated figure is on page 29 of the final version of the manuscript, and we have adjusted the corresponding figure legend.

      Author response image 4.

      (5) In Figure 5, the processing of CTSL in cells (b-c) differs strongly from that in tissue (d-e), particularly in the amount of dc-mCTSL. Do you have an explanation for this? Overall, the blots are hard to judge by eye, and it would be nice to include blots with shorter exposure.

      Thank you for your insightful feedback. The differences in CTSL processing between cells (Fig. 5b) and tissues (Fig. 5d-e) may reflect the complexity of tissue samples, which can reduce image clarity. Furthermore, patients in the diabetes group had their blood glucose controlled within or near the normal range before lung surgery. As a result, the blots of human lung tissue may provide less compelling evidence of CTSL maturation. We have addressed this point in the revised Results section (lines 10-13, page 9). We will also consider including blots with shorter exposure to improve visual clarity in future studies.

      (6) Considering Fig. 2b and Figure S1, the evidence for an effect of hyperglycemia or high-glucose medium on total CTSL protein concentration is not very strong. In my opinion, this claim in the Results section for Fig. 2 should be revisited.

      Thank you for your valuable suggestion. We have revisited the section in question and made appropriate revisions. The original sentence has been modified to accurately reflect the findings: "We found that plasma CTSL activity was strongly positively correlated with chronic hyperglycemia indicated by HbA1c and was significantly higher in diabetic patients than in euglycemic individuals (Fig. 2a, c). Additionally, plasma CTSL concentration showed a positive trend with chronic hyperglycemia indicated by HbA1c (Fig. 2b, d)". These changes have been incorporated into the revised results section (lines 12-16, page 5).

      (7) Overall, the data pointing to increased CTSL activity are stronger than those for protein amount. That said, blood pH can be affected in hyperglycemia (metabolic acidosis). As CTSL is more active at low pH, could the increase in activity be caused by a drop in pH? Can you include this aspect in your manuscript? For example, is there a pH difference between the serum of nondiabetic and diabetic patients?

      Thank you for your valuable input. We addressed the potential impact of pH changes on CTSL activity in our response to Weakness No. 1. Although hyperglycemia can lead to metabolic acidosis and changes in blood pH, the pH values observed in our study remained within the normal range (7.35 to 7.45). We therefore examined CTSL activity across this range and found consistent activity levels. This information has been included in our revised manuscript (lines 15-18, page 7).

      Reviewer #2 (Recommendations For The Authors):

      (1) The Abstract and Introduction sections lack effective organization. The manuscript's style resembles that of the journal Cell rather than the customary format of eLife.

      Thank you for your valuable comments. We have reorganized the Abstract and Introduction sections to make them more precise for readers. In addition, we have carefully updated the manuscript's style to match the standard eLife format, especially the key resources table and the Materials and Methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Strengths:

      The study was designed as a 6-month follow-up, with repeated behavioral and EEG measurements through disease development, providing valuable and interesting findings on AD progression and the effect of early-life choline supplementation. Moreover, the behavioral data that suggest an adverse effect of low choline in WT mice are interesting and important beyond the context of AD.

      Thank you for identifying several strengths.

      Weaknesses:

      (1) The multiple headings and subheadings, focusing on the experimental method rather than the narrative, reduce the readability.

      We have reduced the number of headings.

      (2) Quantification of NeuN and FosB in WT littermates is needed to demonstrate rescue of neuronal death and hyperexcitability by high choline supplementation and also to gain further insights into the adverse effect of low choline on the performance of WT mice in the behavioral test.

      We agree and have added WT data for the NeuN and ΔFosB analyses. These data are included in the text and figures. For NeuN, the Figure is Figure 6. For ΔFosB it is Figure 7. In brief, the high choline diet restored NeuN and ΔFosB to the levels of WT mice.

      Below is Figure 6 and its legend to show the revised presentation of data for NeuN. Afterwards is the revised figure showing data for ΔFosB. After that are the sections of the Results that have been revised.

      Author response image 1.

      Choline supplementation improved NeuN immunoreactivity (ir) in hilar cells in Tg2576 animals. A. Representative images of NeuN-ir staining in the anterior DG of Tg2576 animals. (1) A section from a Tg2576 mouse fed the low choline diet. The boxed area is expanded below. Red arrows point to NeuN-ir hilar cells. Mol=molecular layer, GCL=granule cell layer, HIL=hilus. Calibration for the top row, 100 µm; for the bottom row, 50 µm. (2) A section from a Tg2576 mouse fed the intermediate diet. Same calibrations as for 1. (3) A section from a Tg2576 mouse fed the high choline diet. Same calibrations as for 1. B. Quantification methods. Representative images demonstrate the thresholding criteria used to quantify NeuN-ir. (1) A NeuN-stained section. The area surrounded by the white box is expanded in the inset (arrow) to show 3 hilar cells. The 2 NeuN-ir cells above threshold are marked by blue arrows; the 1 NeuN-ir cell below threshold is marked by a green arrow. (2) After the image was converted to grayscale, cells above threshold were designated red. The inset shows that the two cells marked by blue arrows are red, whereas the cell below threshold is not. (3) An example of the ImageJ threshold menu showing how the threshold was set. Sliders (red circles) were used to move the threshold to the left or right along the histogram of intensity values. The slider's final position (red arrow) was at the onset of the steep rise of the histogram. C. NeuN-ir in Tg2576 and WT mice. Tg2576 mice were fed the low, intermediate, or high choline diet in early life. WT mice were fed the standard (intermediate choline) diet. (1) Tg2576 mice fed the high choline diet had significantly more hilar NeuN-ir cells in the anterior DG than Tg2576 mice fed the low choline or intermediate diet. Values for Tg2576 mice that received the high choline diet were not significantly different from WT mice, suggesting that the high choline diet restored NeuN-ir. (2) There was no effect of diet or genotype in the posterior DG, probably because the low choline and intermediate diets did not appear to lower hilar NeuN-ir.

      Author response image 2.

      Choline supplementation reduced ΔFosB expression in dorsal GCs of Tg2576 mice. A. Representative images of ΔFosB staining in the GCL of Tg2576 animals from each treatment group. (1) A section from a low choline-treated mouse shows robust ΔFosB-ir in the GCL. Calibration, 100 µm. Sections from intermediate (2) and high choline (3)-treated mice. Same calibration as 1. B. Quantification methods. Representative images demonstrate the thresholding criteria established to quantify ΔFosB. (1) A ΔFosB-stained section shows strongly stained cells (white arrows). (2) A strict threshold was used so that only the darkest-stained cells were red. C. Use of the strict threshold to quantify ΔFosB-ir. (1) Anterior DG. Tg2576 mice fed the choline-supplemented diet had significantly less ΔFosB-ir than Tg2576 mice fed the low or intermediate diets. Tg2576 mice fed the high choline diet were not significantly different from WT mice, suggesting a rescue of ΔFosB-ir. (2) There were no significant differences in ΔFosB-ir in posterior sections. D. Methods using a less strict threshold. (1) Some of the included stained cells are not as dark as those used for the strict threshold (white arrows). (2) All cells above the less strict threshold are shown in red. E. Use of the less strict threshold to quantify ΔFosB-ir. (1) Anterior DG. Tg2576 mice fed the high choline diet had fewer ΔFosB-ir pixels than mice fed the other diets and did not differ from WT mice, suggesting restoration of ΔFosB-ir by choline enrichment in early life. (2) Posterior DG. There were no significant differences between Tg2576 mice fed the 3 diets or WT mice.

      Results, Section C1, starting on Line 691:

      “To ask whether the improvement in NeuN after MCS in Tg2576 mice restored NeuN to WT levels, we included WT mice. For this analysis we used a one-way ANOVA with 4 groups: Low choline Tg2576, Intermediate Tg2576, High choline Tg2576, and Intermediate WT (Figure 5C). Tukey-Kramer multiple comparisons tests were used as the post hoc tests. The WT mice were fed the intermediate diet because it is the standard mouse chow, and this group was intended to reflect normal mice. The results showed a significant group difference for anterior DG (F(3,25)=9.20; p=0.0003; Figure 5C1) but not posterior DG (F(3,28)=0.867; p=0.450; Figure 5C2). Regarding the anterior DG, there were more NeuN-ir cells in high choline-treated mice than in both low choline (p=0.046) and intermediate choline-treated Tg2576 mice (p=0.003). WT mice had more NeuN-ir cells than Tg2576 mice fed the low (p=0.011) or intermediate diet (p=0.003). Tg2576 mice that were fed the high choline diet were not significantly different from WT (p=0.827).”

      Results, Section C2, starting on Line 722:

      “There was strong expression of ∆FosB in Tg2576 GCs in mice fed the low choline diet (Figure 7A1). The high choline diet and intermediate diet appeared to show less GCL ΔFosB-ir (Figure 7A2-3). A two-way ANOVA was conducted with the experimental group (Tg2576 low choline diet, Tg2576 intermediate choline diet, Tg2576 high choline diet, WT intermediate choline diet) and location (anterior or posterior) as main factors. There was a significant effect of group (F(3,32)=13.80, p<0.0001) and location (F(1,32)=8.69, p=0.006). Tukey-Kramer post-hoc tests showed that Tg2576 mice fed the low choline diet had significantly greater ΔFosB-ir than Tg2576 mice fed the high choline diet (p=0.0005) and WT mice (p=0.0007). Tg2576 mice fed the low and intermediate diets were not significantly different (p=0.275). Tg2576 mice fed the high choline diet were not significantly different from WT (p>0.999). There were no differences between groups for the posterior DG (all p>0.05).”

      “∆FosB quantification was repeated with a lower threshold to define ∆FosB-ir GCs (see Methods), and the results were the same (Figure 7D). Two-way ANOVA showed a significant effect of group (F(3,32)=14.28, p<0.0001) and location (F(1,32)=7.07, p=0.0122) for anterior DG but not posterior DG (Figure 7D). For anterior sections, Tukey-Kramer post hoc tests showed that low choline mice had greater ΔFosB-ir than high choline mice (p=0.0024) and WT mice (p=0.005) but not Tg2576 mice fed the intermediate diet (p=0.275; Figure 7D1). Mice fed the high choline diet were not significantly different from WT (p=0.993; Figure 7D1). These data suggest that high choline in the diet early in life can reduce neuronal activity of GCs in offspring later in life. In addition, low choline has the opposite effect, suggesting that low choline in early life has adverse effects.”

      (3) Quantification of the discrimination ratio of the novel object and novel location tests can facilitate the comparison between the different genotypes and diets.

      We have added the discrimination index for novel object location to the paper. The data are in a new figure, Figure 3. In brief, the discrimination index led to the same conclusions as the original analysis, which was based on the percent of time spent exploring the novel object.

      Below is the new Figure and legend, followed by the new text in the Results.

      Author response image 3.

      Novel object location results based on the discrimination index. A. Results are shown for the 3-month-old WT and Tg2576 mice based on the discrimination index. (1) Mice fed the low choline diet showed object location memory only in WT. (2) Mice fed the intermediate diet showed object location memory only in WT. (3) Both WT and Tg2576 mice fed the high choline diet showed memory. Therefore, the high choline diet improved memory in Tg2576 mice. B. The results for the 6-month-old mice are shown. (1-2) There was no significant memory demonstrated by mice that were fed either the low or intermediate choline diet. (3) Mice fed a diet enriched in choline showed memory whether they were WT or Tg2576 mice. Therefore, choline enrichment improved memory in all mice.

      Results, Section B1, starting on line 536:

      “The discrimination indices are shown in Figure 3, and the results led to the same conclusions as the analyses in Figure 2. For the 3-month-old mice (Figure 3A), neither WT nor Tg2576 mice fed the low choline diet showed the ability to perform the task. Thus, a two-way ANOVA showed no effect of genotype (F(1,74)=0.027, p=0.870) or task phase (F(1,74)=1.41, p=0.239). For the intermediate diet-treated mice, there was no effect of genotype (F(1,50)=3.52, p=0.067) but there was an effect of task phase (F(1,50)=8.33, p=0.006). WT mice showed a greater discrimination index during testing relative to training (p=0.019) but Tg2576 mice did not (p=0.664). Therefore, Tg2576 mice fed the intermediate diet were impaired. In contrast, high choline-treated mice performed well. There was a main effect of task phase (F(1,68)=39.61, p<0.001), with WT (p<0.0001) and Tg2576 mice (p=0.0002) showing preference for the moved object in the test phase. Interestingly, there was a main effect of genotype (F(1,68)=4.50, p=0.038) because the discrimination index for WT training was significantly different from Tg2576 testing (p<0.0001) and Tg2576 training was significantly different from WT testing (p=0.0003).”

      “The discrimination indices of 6-month-old mice led to the same conclusions as the results in Figure 2. There was no evidence of discrimination in low choline-treated mice by two-way ANOVA (no effect of genotype, F(1,42)=3.25, p=0.079; no effect of task phase, F(1,42)=0.278, p=0.601). The same was true of mice fed the intermediate diet (genotype, F(1,12)=1.44, p=0.253; task phase, F(1,12)=2.64, p=0.130). However, both WT and Tg2576 mice performed well after being fed the high choline diet (effect of task phase, F(1,52)=58.75, p=0.0001; no effect of genotype, F(1,52)=1.197, p=0.279). Tukey-Kramer post-hoc tests showed that both WT (p<0.0001) and Tg2576 mice that had received the high choline diet (p=0.0005) had elevated discrimination indices for the test session.”
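      The formula for the discrimination index is not stated here. A common definition is (time on the novel object − time on the familiar object) / (total exploration time), which ranges from −1 to 1. Under that assumption the index is an affine function of the fraction f of time spent on the novel object (DI = 2f − 1), which is one reason the two measures would be expected to point the same way. The minimal Python sketch below assumes that definition; the function names and exploration times are hypothetical, not data from the study.

```python
def percent_novel(t_novel, t_familiar):
    """Percent of total exploration time spent on the novel (moved) object."""
    return 100.0 * t_novel / (t_novel + t_familiar)


def discrimination_index(t_novel, t_familiar):
    """Assumed DI: (novel - familiar) / (novel + familiar), from -1 to 1."""
    return (t_novel - t_familiar) / (t_novel + t_familiar)


# Hypothetical exploration times in seconds, not data from the study
t_novel, t_familiar = 30.0, 20.0
pct = percent_novel(t_novel, t_familiar)
di = discrimination_index(t_novel, t_familiar)

# Under the assumed definition, DI = 2 * (pct / 100) - 1
assert abs(di - (2 * pct / 100 - 1)) < 1e-12
print(pct, di)  # prints: 60.0 0.2
```

Because the mapping is linear under this assumption, a DI of 0 corresponds to 50% (no preference), and any between-group ordering on one scale is preserved on the other.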

      (4) The longitudinal analyses enable multi-level correlations between the discrimination ratio in NOR and NOL, NeuN and Fos levels, multiple EEG parameters, and premature death. Such analyses could identify biomarkers associated with AD progression, which would be of interest under the different choline diets, including the standard diet.

      We agree and added correlations to the paper in a new figure (Figure 9). Below is Figure 9 and its legend. Afterwards is the new Results section.

      Author response image 4.

      Correlations between IIS, Behavior, and hilar NeuN-ir. A. IIS frequency over 24 hrs is plotted against the preference for the novel object in the test phase of NOL. A greater preference is reflected by a greater percentage of time exploring the novel object. (1) The mice fed the high choline diet (red) showed greater preference for the novel object when IIS were low. These data suggest IIS impaired object location memory in the high choline-treated mice. The low choline-treated mice had very weak preference and very few IIS, potentially explaining the lack of correlation in these mice. (2) There were no significant correlations for IIS and NOR. However, there were only 4 mice for the high choline group, which is a limitation. B. IIS frequency over 24 hrs is plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. (1) Hilar NeuN-ir is plotted against the preference for the novel object in the test phase of NOL. There were no significant correlations. (2) Hilar NeuN-ir was greater for mice that had better performance in NOR, both for the low choline (blue) and high choline (red) groups. These data support the idea that hilar cells contribute to object recognition (Kesner et al. 2015; Botterill et al. 2021; GoodSmith et al. 2022).

      Results, Section F, starting on Line 801:

      “F. Correlations between IIS and other measurements

      As shown in Figure 9A, IIS were correlated with behavioral performance in some conditions. For these correlations, only mice fed the low and high choline diets were included, because mice fed the intermediate diet did not have sufficient EEG recordings in the same mice in which behavior was studied. IIS frequency over 24 hrs was plotted against the preference for the novel object in the test phase (Figure 9A). For NOL, IIS were significantly less frequent when behavior was best, but only in the high choline-treated mice (Pearson’s r, p=0.022). In the low choline group, behavioral performance was poor regardless of IIS frequency (Pearson’s r, p=0.933; Figure 9A1). For NOR, there were no significant correlations (low choline, p=0.202; high choline, p=0.680), but few high choline-treated mice were tested (Figure 9A2).

      We also tested whether there were correlations between dorsal hilar NeuN-ir cell numbers and IIS frequency. In Figure 9B, IIS frequency over 24 hrs was plotted against the number of dorsal hilar cells expressing NeuN. The dorsal hilus was used because there was no effect of diet on the posterior hilus. For NOL, there was no significant correlation (low choline, p=0.273; high choline, p=0.159; Figure 9B1). However, for NOR, there were more NeuN-ir hilar cells when the behavioral performance was strongest (low choline, p=0.024; high choline, p=0.016; Figure 9B2). These data support prior studies showing that hilar cells, especially mossy cells (the majority of hilar neurons), contribute to object recognition (Botterill et al. 2021; GoodSmith et al. 2022).”

      We also noted that not all mice could be included, because some died or for other reasons, such as loss of the headset (Results, Section A, Lines 463-464): “Some mice were not possible to include in all assays either because they died before reaching 6 months or for other reasons.”

      Reviewer #2 (Public Review):

      Strengths:

      The strength of the group was the ability to monitor the incidence of interictal spikes (IIS) over the course of 1.2-6 months in the Tg2576 Alzheimer's disease model, combined with meaningful behavioral and histological measures. The authors were able to demonstrate MCS had protective effects in Tg2576 mice, which was particularly convincing in the hippocampal novel object location task.

      We thank the Reviewer for identifying several strengths.

      Weaknesses:

      Although choline deficiency was associated with impaired learning and elevated FosB expression, consistent with increased hyperexcitability, IIS was reduced with both low and high choline diets. Although not necessarily a weakness, it complicates the interpretation and requires further evaluation.

      We agree and we revised the paper to address the evaluations that were suggested.

      Reviewer #1 (Recommendations For The Authors):

      (1) A reference directing to genotyping of Tg2576 mice is missing.

      We apologize for the oversight and added that the mice were genotyped by the New York University Mouse Genotyping core facility.

      Methods, Section A, Lines 210-211: “Genotypes were determined by the New York University Mouse Genotyping Core facility using a protocol to detect APP695.”

      (2) Which software was used to track the mice in the behavioral tests?

      We manually reviewed videos. This has been clarified in the revised manuscript. Methods, Section B4, Lines 268-270: Videos of the training and testing sessions were analyzed manually. A subset of data was analyzed by two independent blinded investigators and they were in agreement.

      (3) Unexpectedly, a low choline diet in AD mice was associated with reduced frequency of interictal spikes yet increased mortality and spontaneous seizures. The authors attribute this to postictal suppression.

      We did not intend to suggest that postictal depression was the only cause. It was one of several potential explanations for how seizures could influence IIS frequency; specifically, we suggested that postictal depression could transiently reduce IIS. We have clarified the text so this is clear (Discussion, starting on Line 960):

      If mice were unhealthy, IIS might have been reduced due to impaired excitatory synaptic function. Another possible reason for reduced IIS is that mice fed the low choline diet had seizures that interrupted REM sleep. Notably, seizures in Tg2576 mice typically started in sleep. Less REM sleep would reduce IIS because IIS occur primarily in REM. Also, seizures in Tg2576 mice were followed by a depression of the EEG (postictal depression; Supplemental Figure 3) that would transiently reduce IIS. A different, radical explanation is that the intermediate diet promoted IIS, rather than the low choline diet reducing IIS. In that case, a constituent of the intermediate diet other than choline may have promoted IIS.

      However, reduced spike frequency is already evident at 5 weeks of age, a time point with a low occurrence of premature death. A more comprehensive analysis of EEG background activity may provide additional information if the epileptic activity is indeed reduced at this age.

      We did not intend to suggest that premature death caused reduced spike frequency. We have clarified the paper accordingly. We agree that a more in-depth EEG analysis would be useful but is beyond the scope of the study.

      (4) Supplementary Fig. 3 depicts far more spikes / 24 h compared to Fig. 7B (at least 100 spikes/24h in Supplementary Fig. 3 and less than 10 spikes/24h in Fig. 7B).

      We would like to clarify that spike frequency is unusually high before and after a seizure. Therefore, Supplementary Fig. 3 shows far more spikes than the other figures.

      We clarified this issue by adding more data to the Supplemental Figure. The additional data are from mice without a seizure and show that their IIS frequency is low.

      All recordings lasted several days. We included the data from mice with a seizure on one of the days and mice without any seizures. For mice with a seizure, we graphed IIS frequency for the day before, the day of the seizure, and the day after. For mice without a seizure, IIS frequency is plotted for 3 consecutive days. When there was a seizure, the day before and after showed high numbers of spikes. When there was no seizure on any of the 3 days, spikes were infrequent on all days.

      The revised figure and legend are shown below. It is Supplemental Figure 4 in the revised submission.

      Author response image 5.

      IIS frequency before and after seizures. A. Representative EEG traces recorded from electrodes implanted in the skull over the left frontal cortex, right occipital cortex, left hippocampus (Hippo) and right hippocampus during a spontaneous seizure in a 5-month-old Tg2576 mouse. Arrows point to the start (green arrow) and end of the seizure (red arrow), and postictal depression (blue arrow). B. IIS frequency was quantified from continuous video-EEG for mice that had a spontaneous seizure during the recording period and mice that did not. IIS frequency is plotted for 3 consecutive days, starting with the day before the seizure (designated as day 1) and ending with the day after the seizure (day 3). A two-way RMANOVA was conducted with day and group (mice with or without a seizure) as main factors. There was a significant effect of day (F(2,4)=46.95, p=0.002) and group (seizure vs no seizure; F(1,2)=46.01, p=0.021) and an interaction of factors (F(2,4)=46.68, p=0.002). Tukey-Kramer post-hoc tests showed that mice with a seizure had significantly greater IIS frequencies than mice without a seizure on every day (day 1, p=0.0005; day 2, p=0.0001; day 3, p=0.0014). For mice with a seizure, IIS frequency was higher on the day of the seizure than the day before (p=0.037) or after (p=0.010). For mice without a seizure, there were no significant differences in IIS frequency for day 1, 2, or 3. These data are similar to prior work showing that from one day to the next mice without seizures have similar IIS frequencies (Kam et al., 2016).

      In the text, the revised section is in the Results, Section C, starting on Line 772:

      “At 5-6 months, IIS frequencies were not significantly different in the mice fed the different diets (all p>0.05), probably because IIS frequency becomes increasingly variable with age (Kam et al. 2016). One source of variability is seizures, because there was a sharp increase in IIS during the day before and after a seizure (Supplemental Figure 4). Another possible reason that the diets failed to show differences is that IIS frequency generally declined at 5-6 months, as can be seen in Figure 8B and Supplemental Figure 6B. These data are consistent with prior studies of Tg2576 mice where IIS increased from 1 to 3 months but then waxed and waned afterwards (Kam et al., 2016).”

      (5) The data indicating the protective effect of high choline supplementation are valuable, yet some of the claims are not completely supported by the data, mainly as the analysis of littermate WT mice is not complete.

      We added WT data to show that the high choline diet restored hilar NeuN-ir cell numbers and ΔFosB expression to WT levels. These data strengthen the argument that the high choline diet was beneficial. See the response to Reviewer #1, Public Review Point #2.

      • Line 591: "The results suggest that choline enrichment protected hilar neurons from NeuN loss in Tg2576 mice." A comparison to NeuN expression in WT mice is needed to make this statement.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      • Line 623: "These data suggest that high choline in the diet early in life can reduce hyperexcitability of GCs in offspring later in life. In addition, low choline has an opposite effect, again suggesting this maternal diet has adverse effects." Also here, FosB quantification in WT mice is needed.

      These data have been added. See the response to Reviewer #1, Public Review Point #2.

      (7) Was the effect of choline associated with reduced tauopathy or Aβ levels?

      The mice have no detectable hyperphosphorylated tau. They do have intracellular Aβ before 6 months of age, especially in hilar neurons; GCs have little (Criscuolo et al., eNeuro, 2023). However, we previously found that antibodies generally do not work well in neurons with reduced NeuN. We think this is because the neurons become pyknotic (Duffy et al., 2015), a condition associated with oxidative stress, which causes antigens like NeuN to change conformation due to phosphorylation. Therefore, we did not compare hilar neurons across the different diets.

      (8) Since the mice were tested at 3 months and 6 months, it would be interesting to see the behavioral difference per mouse and the correlation with EEG recording and immunohistological analyses.

      We agree that would be valuable and this has been added to the paper. Please see response to Reviewer #1, Public Review Point #4.

      Reviewer #2 (Recommendations For The Authors):

      There were several areas that could be further improved, particularly in the areas of data analysis (particularly with images and supplemental figures), figure presentation, and mechanistic speculation.

      Major points:

      (1) It is understandable that, for the sake of labor and expense, WT mice were not implanted with EEG electrodes, particularly since previous work showed that WT mice have no IIS (Kam et al. 2016). However, from a standpoint of full factorial experimental design, there are several flaws - purists would argue are fatal flaws. First, the lack of WT groups creates underpowered and imbalanced groups, constraining statistical comparisons and likely reducing the significance of the results. Also, it is an assumption that diet does not influence IIS in WT mice. Secondly, with a within-subject experimental design (as described in Fig. 1A), 6-month-old mice are not naïve if they have previously been tested at 3 months. Such an experimental design may reduce effect size compared with testing naïve mice. These caveats should be included in the Discussion. It is likely that these caveats reduce effect size and that the actual statistical significance, were the experimental design perfect, would be higher overall.

      We agree and have added these points to the Limitations section of the Discussion. Starting on Line 1050: In addition, groups were not exactly matched. Although WT mice do not have IIS, a WT group for each of the Tg2576 groups would have been useful. Instead, we included WT mice for the behavioral tasks and some of the anatomical assays. Related to this point is that several mice died during the long-term EEG monitoring of IIS.

      (2) Since behavior, EEG, NeuN and FosB experiments seem to be done on every Tg2576 animal, it seems that there are missed opportunities to correlate behavior/EEG and histology on a per-mouse basis. For example, rather than speculate in the discussion, why not (for example) directly examine relationships between IIS/24 hours and FosB expression?

      We addressed this point above in responding to Reviewer #1, Public Review Point #4.

      (3) Methods of image quantification should be improved. Background subtraction should be considered in the analysis workflow (see Fig. 5C and Fig. 6C background). It would be helpful to have a Methods figure illustrating intermediate processing steps for both NeuN and FosB expression.

      We added more information to improve the description of the quantification methods. We did use a background subtraction approach: ImageJ provides a histogram of intensity values, and the threshold is set at the point where staining rises sharply relative to background. We think this procedure minimizes subjectivity.

      We added these methods to the Methods section and expanded the first figure about image quantification, Figure 6B. That figure and legend are shown above in response to Reviewer #1, Point #2.

      This is the revised section of the Methods, Section C3, starting on Line 345:

      “Photomicrographs were acquired using ImagePro Plus V7.0 (Media Cybernetics) and a digital camera (Model RET 2000R-F-CLR-12, Q-Imaging). NeuN and ∆FosB staining were quantified from micrographs using ImageJ (V1.44, National Institutes of Health). All images were first converted to grayscale, and in each section the hilus was traced, defined by zone 4 of Amaral (1978). A threshold was then calculated to identify the NeuN-stained cell bodies but not background, and NeuN-stained cell bodies in the hilus were quantified manually. Note that the threshold was defined in ImageJ using the distribution of intensities in the micrograph. The threshold was set using a slider in the histogram provided by ImageJ. The slider was pushed from the low level of staining (similar to background) to the location where staining intensity made a sharp rise, reflecting stained cells. Cells with labeling that was above threshold were counted.”

      (4) This reviewer is surprised that the authors do not speculate more about ACh-related mechanisms. For example, choline deficiency would likely reduce ACh release, which could have the same effect on IIS as muscarinic antagonism (Kam et al. 2016), and could potentially explain the paradoxical effects of a low choline diet on reducing IIS. Some additional mechanistic speculation would be helpful in the Discussion.

      We thank the Reviewer for noting this so we could add it to the Discussion. We had not done so originally because of concerns about space limitations.

      The Discussion has a new section starting on Line 1009:

      “Choline and cholinergic neurons

      There are many suggestions for the mechanisms that allow MCS to improve health of the offspring. One hypothesis that we are interested in is that MCS improves outcomes by reducing IIS. Reducing IIS would potentially reduce hyperactivity, which is significant because hyperactivity can increase release of Aβ. IIS would also be likely to disrupt sleep, since they represent aberrant synchronous activity over widespread brain regions. The disruption to sleep could impair memory consolidation, which is a notable function of sleep (Graves et al. 2001; Poe et al. 2010). Sleep disruption also has other negative consequences, such as impairing normal clearance of Aβ (Nedergaard and Goldman 2020). In patients, IIS and similar events (interictal epileptiform discharges, IEDs) are correlated with memory impairment (Vossel et al. 2016).

      How would choline supplementation in early life reduce IIS of the offspring? It may do so by making basal forebrain cholinergic neurons (BFCNs) more resilient. That is significant because BFCN abnormalities appear to cause IIS. For example, the muscarinic antagonist atropine reduced IIS in vivo in Tg2576 mice, and selective silencing of BFCNs also reduced IIS. Atropine also reduced elevated synaptic activity of GCs in young Tg2576 mice in vitro. These studies are consistent with the idea that early in AD there is elevated cholinergic activity (DeKosky et al. 2002; Ikonomovic et al. 2003; Kelley et al. 2014; Mufson et al. 2015; Kelley et al. 2016), while later in life there is degeneration. Indeed, the chronic overactivity could cause the degeneration.

      Why would MCS make BFCNs resilient? Several possibilities have been explored, based on genes upregulated by MCS. One attractive hypothesis is that neurotrophic support for BFCNs is retained after MCS, whereas in aging and AD it declines (Gautier et al. 2023). The neurotrophins, notably nerve growth factor (NGF) and brain-derived neurotrophic factor (BDNF), support the health of BFCNs (Mufson et al. 2003; Niewiadomska et al. 2011).”

      Minor points:

      (1) The vendor is Dyets Inc., not Dyets.

      Thank you. This correction has been made.

      (2) Anesthesia chamber not specified (make, model, company).

      We have added this information to the Methods, Section D1, starting on Line 375: The animals were anesthetized by isoflurane inhalation (3% isoflurane, 2% oxygen for induction) in a rectangular transparent Plexiglas chamber (18 cm long x 10 cm wide x 8 cm high) made in-house.

      (3) It is not clear whether software was used for the detection of behavior. Was position tracking software used or did blind observers individually score metrics?

      We have added the information to the paper. Please see the response to Reviewer #1, Recommendations for Authors, Point #2.

      (4) It is not clear why rat cages and not a true Open Field Maze were used for NOL and NOR.

      We used mouse cages because, in our experience, they are ideal for detecting impairments in Tg2576 mice at young ages. We think this is why we have been so successful in identifying NOL impairments in young mice; before our work, most investigators thought behavior only became impaired later. We would also add that, in our experience, an Open Field Maze is not the most commonly used arena.

      (5) Figure 1A is not mentioned.

      It had been mentioned in the Introduction. Figure 1B-D was the first figure mentioned in the Results, which may be why it was missed. We have now added a reference to Figure 1A in the first section of the Results (Line 457), so it is easier to find.

      (6) Although Fig 7 results are somewhat complicated compared to Fig. 5 and 6 results, EEG comes chronologically earlier than NeuN and FosB expression experiments.

      We have kept the order as is because, as the Reviewer noted, the EEG results are complex. For readability, we present them last.

      (7) Though the statistical analysis involved parametric and nonparametric tests, it is not clear which normality tests were used.

      We have added the names of the normality tests to the Methods, Section E, Line 443: Tests for normality (Shapiro-Wilk) and homogeneity of variance (Bartlett’s test) were used to determine if parametric statistics could be used. We also added the following clarification after this sentence: When data were not normal, non-parametric statistics were used. When there was significant heteroscedasticity, data were log transformed. If log transformation did not resolve the heteroscedasticity, non-parametric statistics were used. Because we added correlations and analysis of survival curves, we also added the following (starting on Line 451): For correlations, Pearson’s r was calculated. To compare survival curves, a log-rank (Mantel-Cox) test was performed.
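      The test-selection workflow described in this Methods excerpt can be summarized as a small decision function. This is a hypothetical sketch, not the authors' code: the function name, the 0.05 significance level, and the choice to take precomputed p-values as inputs are assumptions not stated in the manuscript. In practice the p-values could come from Shapiro-Wilk and Bartlett's tests (for example, scipy.stats.shapiro and scipy.stats.bartlett).

```python
# Hypothetical sketch of the test-selection workflow described in Methods, Section E.
# ALPHA is an assumed significance level; the manuscript excerpt does not state it.
ALPHA = 0.05


def choose_analysis(shapiro_p, bartlett_p, bartlett_p_log=None, alpha=ALPHA):
    """Return the family of statistics to use, given precomputed p-values.

    shapiro_p      -- Shapiro-Wilk normality test p-value (raw data)
    bartlett_p     -- Bartlett's test p-value for equal variances (raw data)
    bartlett_p_log -- Bartlett's test p-value after log transformation;
                      required only when the raw data are heteroscedastic
    """
    # Non-normal data go straight to non-parametric statistics.
    if shapiro_p < alpha:
        return "non-parametric"
    # Normal data with homogeneous variances: parametric statistics.
    if bartlett_p >= alpha:
        return "parametric"
    # Significant heteroscedasticity: try a log transformation first.
    if bartlett_p_log is None:
        raise ValueError("heteroscedastic data: supply bartlett_p_log")
    if bartlett_p_log >= alpha:
        return "parametric (log-transformed)"
    # Log transformation did not resolve the heteroscedasticity.
    return "non-parametric"


if __name__ == "__main__":
    print(choose_analysis(0.30, 0.01, 0.20))  # parametric (log-transformed)
```

      The sketch does not re-test normality after the log transformation, because the excerpt does not say whether that step was performed.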

      Figures:

      (1) In Fig. 1A, Anatomy should be placed above the line.

      We changed the figure so that the word “Anatomy” is now aligned, and the arrow that was angled is no longer needed.

      In Fig. 1C and 1D, the objects seem to be moved into the cage, not the mice. This schematic does not accurately reflect the Fig. 1C and 1D figure legend text.

      Thank you for the excellent point. The figure has been revised. We also updated it to show the objects more accurately.

      Please correct the punctuation in the Fig. 1D legend.

      Thank you for mentioning the errors. We corrected the legend.

      For ease of understanding, Fig. 1C and 1D should have training and testing labeled in the figure.

      Thank you for the suggestion. We have revised the figure as suggested.

      Author response image 6.

      (2) In Figure 2, error bars for population stats (bar graphs) are not obvious or missing. Same for Figure 3.

      We added two supplemental figures to show error bars, because adding the error bars to the existing figures made the symbols, colors, connecting lines and error bars hard to distinguish. For novel object location (Fig. 2), the error bars are shown in Supplemental Fig. 2. For novel object recognition, the error bars are shown in Supplemental Fig. 3.

      (3) The authors should consider a Methods figure for quantification of NeuN and deltaFOSB (expansions of Fig. 5C and Fig. 6C).

      Please see Reviewer #1, Public Review Point #2.

      (4) In Figure 5, A should be omitted and mentioned in the Methods/figure legend. B should be enlarged. C should be inset, zoomed-in images of the hilus, with an accompanying analysis image showing a clear reduction in NeuN intensity in low choline conditions compared to intermediate and high choline conditions. In D, X axes could delineate conditions (figure legend and color unnecessary). Figure 5C should be moved to a Methods figure.

      We thank the Reviewer for the excellent suggestions. We removed A as suggested. We expanded B and included insets. We used different images to show a more obvious reduction of cells for the low choline group. We expanded the Methods schematics. The revised figure is Figure 6 and shown above in response to Reviewer 1, Public Review Point #2.

      (5) In Figure 6, A should be eliminated and mentioned in the Methods/figure legend. B should be greatly expanded with higher and lower thresholds shown on subsequent panels (3x3 design).

      We removed A as suggested. We expanded B as suggested. The higher and lower thresholds are shown in C. The revised figure is Figure 7 and shown above in response to Reviewer 1, Public Review Point #2.

      (6) In Figure 7, A2 should be expanded vertically. A3 should be expanded both vertically and horizontally. B 1 and 2 should be increased, particularly B1 where it is difficult to see symbols. Perhaps colored symbols offset/staggered per group so that the spread per group is clearer.

      We added a panel (A4) to show an expansion of A2 and A3. However, we did not see that a vertical expansion would add information so we opted not to add that. We expanded B1 as suggested but opted not to expand B2 because we did not think it would enhance clarity. The revised figure is below.

      Author response image 7.

      (7) Supplemental Figure 1 could possibly be combined with Figure 1 (use rounded corner rat cage schematic for continuity).

      We opted not to combine figures because it would make one extremely large figure. As a result, the parts of the figure would be small and difficult to see.

      (8) Supplemental Figure 2 - there does not seem to be any statistical analysis associated with A mentioned in the Results text.

      We added the statistical information. It is now Supplemental Figure 5:

      Author response image 8.

      Mortality was high in mice treated with the low choline diet. A. Survival curves are shown for mice fed the low choline diet and mice fed the high choline diet. The mice fed the high choline diet had significantly better survival. B. Left: A photo of a mouse after sudden unexplained death. The mouse was found in a posture consistent with death during a convulsive seizure. The area surrounded by the red box is expanded below to show the outstretched hindlimb (red arrow). Right: A photo of a mouse that did not die suddenly. The area surrounded by the box is expanded below to show that the hindlimb is not outstretched.

      The revised text is in the Results, Section E, starting on Line 793:

      “Low choline-treated mice appeared to have died during a seizure because they were found in their cage in a specific posture that occurs when a severe seizure leads to death (Supplemental Figure 5): a prone posture with extended, rigid limbs. Regardless of how the mice died, there was greater mortality in the low choline group than in mice fed the high choline diet (Log-rank (Mantel-Cox) test, Chi square 5.36, df 1, p=0.021; Supplemental Figure 5A).”

      Also, why isn't intermediate choline also shown?

      We do not have the data from the animals. Records of death were not kept, regrettably.

      Perhaps labeling of male/female could also be done as part of this graph.

      We agree this would be very interesting, but we do not have sex information for all mice.

      B is not very convincing, though it is understandable once one reads about posture.

      We have clarified the text and figure, as well as the legend. They are above.

      Are there additional animals that were seen to be in a specific posture?

      There are many examples, and we added several to make the point more convincing.

      We also added the posture of WT mice that died, to show how different it is.

      Is there any relationship between seizures detected via EEG, as shown in Supplemental Figure 3, and death?

      Several mice died during a convulsive seizure, which is the type of seizure that is shown in the Supplemental Figure.

      (9) Supplemental Figure 3 seems to display an isolated case in which EEG-detected seizures correlate with increased IIEs. It is not clear whether there are additional documented cases of seizures that could be assembled into a meaningful population graph. If this data does not exist or is too much work to include in this manuscript, perhaps it can be saved for a future paper.

      We have added other cases and revised the graph. This is now Supplemental Figure 4 and is shown above in response to Reviewer #1, Recommendation for Authors Point #4.

      Frontal is misspelled.

      We checked and our copy is not showing a misspelling. However, we are very grateful to the Reviewer for catching many errors and reading the manuscript carefully.

      (10) Supplemental Figure 4 seems incomplete in that it does not include EEG data from months 4, 5, and 6 (see Fig. 7B).

      We have added data for these ages to the Supplemental Figure (currently Supplemental Figure 6) as part B. In part A, which had been the original figure, only 1.2-, 2-, and 3-month-old mice were shown because there were insufficient numbers of each sex at other ages. However, by pooling 1.2 and 2 months (Supplemental Figure 6B1), 3 and 4 months (B2) and 5 and 6 months (B3), we could analyze sex. The results are the same: we detected no sex differences.

      Author response image 9.

      IIS frequency was similar for each sex. A. IIS frequency was compared for females and males at 1.2 months (1), 2 months (2), and 3 months (3). Two-way ANOVA was used to analyze the effects of sex and diet. Female and male Tg2576 mice were not significantly different. B. Mice were pooled at 1.2 and 2 months (1), 3 and 4 months (2) and 5 and 6 months (3). Two-way ANOVA analyzed the effects of sex and diet. There were significant effects of diet for (1) and (2) but not (3). There were no effects of sex at any age.

      (1) There were significant effects of diet (F(2,47)=46.21, p<0.0001) but not sex (F(1,47)=0.106, p=0.746). Female and male mice fed the low choline diet or high choline diet were significantly different from female and male mice fed the intermediate diet (all p<0.05, asterisk).

      (2) There were significant effects of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Both female and male mice of the low choline group were significantly different from male mice fed the intermediate diet (both p<0.05, asterisk) but no other pairwise comparisons were significant.

      (3) There were no significant differences (diet, F(2,23)=1.21, p=0.317; sex, F(1,23)=0.844, p=0.368).

      The data are discussed in the Results, Section G, starting on Line 843:

      In Supplemental Figure 6B we grouped mice at 1-2 months, 3-4 months and 5-6 months so that there were sufficient females and males to compare each diet. A two-way ANOVA with diet and sex as factors showed a significant effect of diet (F(2,47)=46.21; p<0.0001) at 1-2 months of age, but not sex (F(1,47)=0.11, p=0.758). Post-hoc comparisons showed that the low choline group had fewer IIS than the intermediate group, and the same was true for the high choline-treated mice. Thus, female mice fed the low choline diet differed from the females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Male mice that had received the low choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet. Female mice fed the high choline diet differed from females (p=0.002) and males (p<0.0001) fed the intermediate diet, and males fed the high choline diet differed from females (p<0.0001) and males (p<0.0001) fed the intermediate diet.

      For the 3-4 months-old mice there was also a significant effect of diet (F(2,32)=10.82, p=0.0003) but not sex (F(1,32)=1.05, p=0.313). Post-hoc tests showed that low choline females were different from males fed the intermediate diet (p=0.007), and low choline males were also significantly different from males that had received the intermediate diet (p=0.006). There were no significant effects of diet (F(2,23)=1.21, p=0.317) or sex (F(1,23)=0.84, p=0.368) at 5-6 months of age.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We are pleased that Reviewer 3 has deemed our revisions satisfactory; below, we provide responses to the remaining Recommendations for the Authors from Reviewer 2.

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections:

      • Line 91: GWT should be GNWT

      Fixed, thank you.

      • Figure 2: fix the label "Participationcoefficient rank" (no space between Participation and coefficient)

      Fixed, thank you for spotting.

      • Line 317: Figure 2 should be Figure 3

      Fixed, thank you.

      • Line 360: Figure 4D, right?

      Fixed, thank you. We also confirm that Figure 4 and its caption are correct. Under anaesthesia, many regions have more Integrated Information than during Recovery (red regions), but the only changes that are consistently observed across all three contrasts are the decreases.

      • Line 375: Should be Figure 5A

      Fixed, thank you.

      • The recovery period of the anesthesia data is not described in Methods.

      We have now added the missing information:

      “Propofol was discontinued following the deep anaesthesia scan, and participants reached level 2 of the Ramsey scale approximately 11 minutes afterwards, as indicated by clear and rapid responses to verbal commands. This corresponds to the “recovery” period 176.”

      We have also expanded our discussion on the interaction between information decomposition and measures of directionality:

      “Indeed, transfer entropy can itself be decomposed into information-dynamic atoms through Partial Information Decomposition and Integrated Information Decomposition 33,34,49,151; ΦID can further decompose the Normalised Directed Transfer Entropy measure used by Deco et al. 5, as recently demonstrated 152. We look forward to a more refined conceptualization of the synergistic workspace architecture that takes into account both information types and the directionality of information flow – especially in datasets with higher temporal resolution.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, Yu et al. describe the chemotactic gradient formation for CCL5 bound to - i.e. released from - glycosaminoglycans. The authors provide evidence for phase separation as the driving mechanism behind chemotactic gradient formation. A conclusion towards a general principle behind the finding cannot be drawn since the work focuses on one chemokine only, which is particularly prone to glycan-induced oligomerisation.

      Strengths:

      The principle of phase separation as a driving force behind and thus as an analytical tool for investigating protein interactions with strongly charged biomolecules was originally introduced for protein-nucleic acid interactions. Yu et al. have applied this in their work for the first time for chemokine-heparan sulfate interactions. This opens a novel way to investigate chemokine-glycosaminoglycan interactions in general.

      Response: We thank the reviewer for the encouragement.

      Weaknesses:

      As mentioned above, one of the weaknesses of the current work is the exemplification of the phase separation principle by applying it only to CCL5-heparan sulfate interactions. CCL5 is known to form higher oligomers/aggregates in the presence of glycosaminoglycans, much more than other chemokines. It would therefore have been very interesting to see whether similar results in vitro, in situ, and in vivo could be obtained with other chemokines of the same class (e.g. CCL2) or another class (like CXCL8).

      Response: We share the reviewer’s opinion that investigating more molecules/cytokines that interact with heparan sulfate in this system would be of interest. We expect that researchers in the field will adapt the concept to study additional molecules. Nevertheless, our earlier study demonstrated that bFGF was enriched at its receptor and triggered signal transduction through phase separation with heparan sulfate (PMID: 35236856; doi: 10.1038/s41467-022-28765-z), which supports the concept that phase separation with heparan sulfate on the cell surface may be a common mechanism for heparan sulfate-binding proteins. Regarding the reviewer’s comment that phase separation is related to oligomerization, Figure 1—figure supplement 2C and D show that the more easily aggregated mutant, A22K-CCL5, does not undergo phase separation.

      In addition, the authors have used variously labelled CCL5 (like with the organic dye Cy3 or with EGFP) for various reasons (detection and immobilisation). In the view of this reviewer, it would have been necessary to show that all the labelled chemokines yield identical/similar molecular characteristics as the unlabelled wildtype chemokine (such as heparan sulfate binding and chemotaxis). It is well known that labelling proteins either by chemical tags or by fusion to GFPs can lead to manifestly different molecular and functional characteristics.

      Response: We agree with the reviewer that labeling may alter the properties of a protein, so we compared the chemotactic activity of CCL5 and CCL5-EGFP (Figure 2—figure supplement 1). To verify this further, we performed an additional experiment comparing the chemotactic activity of CCL5 and Cy3-CCL5 (see Author response image 1). For the convenience of readers, we have combined the original Figure 2—figure supplement 1 with the new data (Author response image 1), which replaces the original Figure 2—figure supplement 1.

      Author response image 1.

      Chemotactic function of CCL5-EGFP and CCL5-Cy3. Cy3-labeled CCL5 has activity similar to that of CCL5. 50 nM CCL5 or CCL5-Cy3 was added to the lower chamber of the Transwell, and THP-1 cells were added to the upper chambers. Data are mean ± s.d., n=3. P values were determined by unpaired two-tailed t-tests. NS, not significant.

      Reviewer #2 (Public Review):

      Although the study by Xiaolin Yu et al is largely limited to in vitro data, the results of this study convincingly improve our current understanding of leukocyte migration.

      (1) The conclusions of the paper are mostly supported by the data although some clarification is warranted concerning the exact CCL5 forms (without or with a fluorescent label or His-tag) and amounts/concentrations that were used in the individual experiments. This is important since it is known that modification of CCL5 at the N-terminus affects the interactions of CCL5 with the GPCRs CCR1, CCR3, and CCR5 and random labeling using monosuccinimidyl esters (as done by the authors with Cy-3) is targeting lysines. Since lysines are important for the GAG-binding properties of CCL5, knowledge of the number and location of the Cy-3 labels on CCL5 is important information for the interpretation of the experimental results with the fluorescently labeled CCL5. Was the His-tag attached to the N- or C-terminus of CCL5? Indicate this for each individual experiment and consider/discuss also potential effects of the modifications on CCL5 in the results and discussion sections.

      Response: We agree with the reviewer that labeling may alter the properties of a protein, so we compared the chemotactic activity of CCL5 and CCL5-EGFP (Figure 2—figure supplement 1). To verify this further, we performed an additional experiment comparing the chemotactic activity of CCL5 and Cy3-CCL5 (see Author response image 1). For the convenience of readers, we have combined the original Figure 2—figure supplement 1 with the new data (Author response image 1), which replaces the original Figure 2—figure supplement 1.

      The His-tag is attached to the C-terminus of CCL5 so that it does not interfere with the N-terminus.

      (2) In general, the authors appear to use high concentrations of CCL5 in their experiments. The reason for this is not clear. Is it because of the effects of the labels on the activity of the protein? In most biological tests (e.g. chemotaxis assays), unmodified CCL5 is active already at low nM concentrations.

      Response: We agree with the reviewer that the CCL5 concentrations used in our experiments were higher than those used in published chemotaxis assays and also higher than physiological levels in normal human plasma. We did perform experiments with lower concentrations of CCL5. In those experiments the effect of LLPS was not seen, although the chemotactic activity of the cytokine was detected. LLPS-associated chemotactic activity may therefore represent an acute inflammatory scenario, in which inflammatory cytokine levels can increase substantially.

      (3) For the statistical analyses of the results, the authors use t-tests. Was it confirmed that the data follow a normal distribution prior to using the t-test? If not, a non-parametric test should be used, and this may affect the conclusions of some experiments.

      Response: We thank the reviewer for pointing out this issue. As shown in Author response table 1, the Shapiro-Wilk normality test showed that only two control groups in Figure 3 (CCL5 and 44AANA47-CCL5+CHO K1) did not conform to a normal distribution. The deviation arose because the cells were counted in a microculture that contained very few cells. For these two groups, we re-counted the cells in 100 μL of culture medium. The recounted data were consistent with a normal distribution and remained significantly different from the experimental group (Author response image 3). The original counts of cells attracted by 500 nM CCL5 were revised from 0, 247, 247 to 247, 123, 370. The counts for 500 nM 44AANA47-CCL5 + CHO-K1 were revised from 1111, 1111, 98 to 740, 494, 617. The revised data do not affect the conclusion.
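      The workflow above, a normality check followed by a choice of test, can be sketched in Python with SciPy. This is a minimal illustration and not the authors' analysis code. The helper names `looks_normal` and `compare_groups`, the 0.05 threshold and the fallback to a Mann-Whitney U test are assumptions. The counts are those quoted above for the 500 nM CCL5 control group. With n = 3 per group the Shapiro-Wilk test has very little power, so a non-significant result is only weak evidence of normality.

```python
# Hypothetical helpers illustrating a normality check before choosing a test;
# not the authors' code. Requires SciPy.
from scipy import stats

ALPHA = 0.05  # assumed significance threshold


def looks_normal(values, alpha=ALPHA):
    """True if the Shapiro-Wilk test does not reject normality at `alpha`."""
    _, p = stats.shapiro(values)
    return bool(p > alpha)


def compare_groups(a, b, alpha=ALPHA):
    """Unpaired two-sided comparison of two groups.

    Uses Student's t-test when both groups pass the normality check,
    otherwise falls back to the rank-based Mann-Whitney U test.
    """
    if looks_normal(a, alpha) and looks_normal(b, alpha):
        _, p = stats.ttest_ind(a, b)
        return "t-test", float(p)
    _, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "mann-whitney", float(p)


# Cell counts quoted above for the 500 nM CCL5 control group.
original_counts = [0, 247, 247]
recounted = [247, 123, 370]
print(looks_normal(original_counts), looks_normal(recounted))  # False True
```

      With three values, two identical counts give the minimum possible Shapiro-Wilk statistic, so the original counts are flagged as non-normal. The recounted values are not flagged.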

      Author response table 1.

      Table R1 Shapiro-Wilk test results of statistical data in the manuscript

      Author response image 3.

      Quantification of THP-1 cells collected from the lower chamber. Data are mean ± s.d., n=3. P values were determined by unpaired two-tailed t-tests.

      Recommendations for the authors:

      Reviewer #1:

      See the weaknesses section of the Public Review. In addition, the authors should discuss the X-ray structure of CCL5 in complex with a heparin disaccharide in comparison with their docked structure of CCL5 and a heparin tetrasaccharide.

      Response: Our study was in fact strongly influenced by the report (Shaw, Johnson et al., 2004) on the interaction of a heparin disaccharide with CCL5, which is highlighted in the text (page 5, lines 100-102).

      Reviewer #2:

      (1) Clearly indicate in the results section and figure legends (also for the supplementary figures) which form and concentration of CCL5 is used.

      Response: The missing information is now indicated throughout the manuscript.

      (2) Clearly indicate which GAG was used. Was it heparin or heparan sulfate and what was the length (e.g. average molecular mass if known) or source (company?)?

      Response: Relevant information has been added to the “Materials and Methods” section.

      (3) Line 181: What do you mean exactly with "tiny amounts"?

      Response: “Tiny amounts” refers to 400 transfected cells, as described in the Materials and Methods section. This is now also indicated in the text and in the figure legend.

      (4) Lines 216-217: This is a very general statement without a link to the presented data. No combination of chemokines is used, in vivo testing is limited (and I agree very difficult). You may consider deleting this sentence (certainly as an opening sentence for the Discussion).

      Response: We greatly appreciate the reviewer’s thoughtful suggestion. This sentence has been deleted in the revised manuscript.

      (5) Why was 5h used for the in vitro chemotaxis assay? This is extremely long for an assay with THP-1 cells.

      Response: We apologize for the unclear description. The 5 h includes a 1 h pre-incubation of CCL5 with the cells to allow phase separation to form. After the cells were transferred into the upper chamber, the actual chemotaxis assay lasted 4 h. This is now clarified in the Materials and Methods section and in the legend to each figure.

      (6) Define "Sec" in Sec-CCL5-EGFP and "Dil" in the legend of Figure 4.

      Response: “Sec-CCL5-EGFP” should read “CCL5-EGFP”, which has now been corrected. DiI is a red fluorescent cell membrane probe, and it is now defined in the legend.

      (7) Why are different cell concentrations used in the experiment described in Figure 5?

      Response: The samples came from three volunteers whose blood contained substantially different concentrations of cells. The experiment was designed using the same volume of blood, so we did not normalize the number of cells used. Regardless of the difference in cell numbers, all three samples showed the same trend.

      (8) Check the text for some typos: examples are on line 83 "ratio of CCL5"; line 142 "established cell lines"; line 196 "peripheral blood mononuclear cells"; line 224 "to mediate"; line 226 "bind"; line 247 "to form a gradient"; line 248 "of the glycocalyx"; line 343 and 346 "tetrasaccharide"; line 409-410 "wild-type"; line 543 "on the surface of CHO-K1 and CHO-677"; line 568 "white".

      Response: We thank the reviewer for the careful reading. The typos have been corrected, and the manuscript has been carefully proofread by colleagues.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Pg. 3 - lines 51-53: "Once established, the canonical RdDM pathway takes over, whereby small RNAs are generated by the plant-specific polymerase IV (Pol IV). In both cases, a second plant-specific polymerase, Pol V, is an essential downstream component." The authors' intro omits an important aspect of Pol V's function in RdDM, which is quite relevant to their study. Pol V transcribes DNA to synthesize noncoding RNA scaffolds, to which AGO4-bound 24 nt siRNAs are thought to base pair, leading to DRM2 recruitment for cytosine methylation near to these nascent Pol V transcripts (Wierzbicki et al 2008 Cell; Wierzbicki et al. 2009 Nat Genet). I recommend that the authors cite these key studies.

      These citations have now been added (see line 57).

      The authors provide compelling evidence that Pol V redistributes to ectopic heterochromatin regions in h1 mutants (e.g., Fig1a browser shot). Presumably, this would allow Pol V to transcribe these regions in h1 mutants, whereas it could not transcribe them in WT plants. Have the authors detected and/or quantified Pol V transcripts in the h1 mutant compared to WT plants at the sites of Pol V redistribution (detected via NRPE1 ChIP)?

      Robust detection of Pol V transcripts can be experimentally challenging. Instead, we detect and quantify NRPE1-dependent methylation at these regions (Fig 5), which occurs downstream of Pol V transcript production. However, we note the detection of Pol V transcripts as a potential future direction in the discussion (see line 263).

      Pg. 5 - lines 101-102: Figure 1e - "The preferential enrichment of NRPE1 in h1 was more pronounced at TEs that overlapped with heterochromatin associated mark, H3K9me2 (Fig. 1e)." Was a statistical test performed to determine that the overall differences are significant only at TE sites with H3K9me2? Can the sites without H3K9me2 also be differentiated statistically?

      Yes, there is a statistically significant difference between WT and h1 at both the H3K9me2 marked and unmarked TEs (Wilcoxon rank sum tests, see updated Fig 1e). The size of the effect is larger for the H3K9me2 marked TEs (median difference of 0.41 vs 0.16). Median values have now been added to the boxplots so that this is directly viewable to the reader (Fig 1e). This reflects the general increase in NRPE1 occupancy in h1 mutants through the genome, with the effect consistently stronger in heterochromatin. In our initial version of the manuscript, we summarise the effect as follows “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions” (previous version line 83, current version line 95). Although important exceptions exist (see Fig 5, NRPE1 and DNA methylation loss in h1), we now make this point even more explicit, and have updated the manuscript at several locations (abstract line 26, results line 245, discussion line 265).
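      A Wilcoxon rank-sum comparison reported together with a median difference, as for Fig 1e, can be sketched as follows. The occupancy values are simulated with NumPy only to exercise the code. The class labels, sample sizes and effect sizes are hypothetical and do not reproduce the study's data. `scipy.stats.ranksums` uses the normal approximation to the rank-sum statistic. `scipy.stats.mannwhitneyu` is the equivalent test and also offers exact small-sample options.

```python
# Sketch: per-class WT vs h1 comparison using a two-sided Wilcoxon rank-sum
# test and the difference in medians. Simulated data, hypothetical labels.
import numpy as np
from scipy import stats


def rank_sum_summary(wt, mutant):
    """Return (median(mutant) - median(wt), two-sided rank-sum p-value)."""
    wt = np.asarray(wt, dtype=float)
    mutant = np.asarray(mutant, dtype=float)
    median_diff = float(np.median(mutant) - np.median(wt))
    p_value = float(stats.ranksums(mutant, wt).pvalue)
    return median_diff, p_value


rng = np.random.default_rng(0)
simulated = {
    # class label: (WT values, mutant values); shifts chosen arbitrarily
    "marked": (rng.normal(0.2, 0.3, 500), rng.normal(0.7, 0.3, 500)),
    "unmarked": (rng.normal(0.2, 0.3, 500), rng.normal(0.4, 0.3, 500)),
}

results = {label: rank_sum_summary(wt, mut) for label, (wt, mut) in simulated.items()}
for label, (diff, p) in results.items():
    print(f"{label}: median difference {diff:.2f}, p = {p:.1e}")
```

      Reporting the median difference next to the p-value separates effect size from significance. With large region sets, even small shifts reach significance.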

      Pg. 5 - lines 108-110: The authors state, "Importantly, we found no evidence for increased NRPE1 expression at the mRNA or protein level in the h1 mutant (Suppl. Fig. 2)." But the authors did observe reduced NRPE1 transcript levels in h1 mutants, in their re-analysis of RNA-seq data and reduced NRPE1 protein signals via western blot in (Suppl. Fig. 2), which should be reported here in the results.

      As described further below, we reanalysed h1 RNA-seq from scratch and found no evidence for significant differential gene expression of NRPE1. This table and analysis are now provided in Supplementary Table 1.

      More importantly, the above logic about NRPE1 expression in h1 mutants assumes that NRPE1 is the stoichiometrically limiting subunit for Pol V assembly and function in vivo, but this is not known to be the case:

      (1) While NRPE1's expression is somewhat reduced (and not increased) in h1 mutant plants, we cannot be certain that other genes influencing Pol V stability or recruitment are unaffected by h1 mutants. I thus recommend that the authors perform RT-qPCR directly on the WT and h1 mutant materials used in their current study, quantifying NRPE1, NRPE2, NRPE5, DRD1, DMS3, RDM1, SUVH2 and SUVH9 transcript levels.

      (2) Normalizations used to compare samples should be included with RT-qPCR and western assays. An appropriate house-keeping gene like Actin2 or Ubiquitin could be used to normalize the RT-qPCR. Protein sample loading in Suppl. Fig. 2 could be checked by Coomassie staining and/or an antibody detection of a house-keeping protein.

      We have now included a full re-analysis of h1 RNA-seq (data from Choi et al 2020) focusing on transcriptional changes of DNA methylation machinery genes in the h1 mutant. Of the 61 genes analysed, only AGO6 and AGO9 were found to be differentially expressed (2-3 fold upregulation). This analysis is now included as a table (Supplementary Table 1). The western blot has been moved to Supplementary Fig 3 to now illustrate antibody specificity and H1 loss in the h1 mutant lines, so NRPE1 itself serves as a loading control (Supplementary Fig 3a).

      Pg. 6 - lines 129-131: The authors state that "over NRPE1 defined peaks (where NRPE1 occupancy is strongest in WT) we observed no change in H1 occupancy in nrpe1 (Fig 2b). The results indicate that H1 does not invade RdDM regions in the nrpe1 mutant background." This conclusion assumes that the authors' H1 ChIP is successfully detecting H1 occupancy. However, in Fig 2d there does not appear to be H1 enrichment or peaks as visualized across the 10766 ZF-DMS3 off-target loci, or even at the selected 451 ZF-DMS3 off-target hyper DMRs, where the putative signal for H1 enrichment on the metaplot center is extremely weak/non-existent.

      As a reference for H1 enrichment in chromatin (e.g., looking where H2A.W antagonizes H1 occupancy) one can compare analyses in Bourguet et al (2021) Nat Commun, involving co-authors of the current study. Bourguet et al (2021) Fig 5b show a metaplot of H1 levels centered on H2A.W peaks with H1 ChIP signal clearly tapering away from the metaplot center point peak. To my eye, the H1 ChIP metaplots for ZF-DMS3 off-target loci in the current manuscript (Fig 2d) resemble "shuffled peaks" controls like those in Fig 5b of Bourguet et al (2021).

      Can one definitively interpret Fig 2d as showing RdDM "not reciprocally affecting H1 localization" without first showing the specificity of the ChIP-seq results in a genotype where H1 occupancy changes? Alternatively, could this dataset be displayed with Deeptools heatmaps to strengthen the evidence that the authors are detecting H1 occupancy/enrichment genome-wide, before diving into WT/nrpe1 mutant analysis at ZF-DMS3 off-target loci?

      This is an excellent suggestion from the reviewer. We have now included several analyses that assess and demonstrate the quality of our H1 ChIP-seq profiles. First, as suggested by the reviewer, we show that our H1 profiles peak over H2A.W-enriched euchromatic TEs as defined by Bourguet et al., mirroring these published findings. Next, we investigated whether our H1 profiles match the pattern over genes recently described by Teano et al., and confirmed a similar pattern with 3’ enrichment of H1 over H3K27me3-unmarked genes. Furthermore, we show that the H1 peaks defined here are similarly enriched with GFP-tagged H1.2 from the Teano et al. 2023 study. These analyses validate the quality of our H1 ChIP-seq datasets and bolster the conclusion that NRPE1 redistribution does not affect H1 occupancy. These new analyses are now presented in Supplementary Figure 3 (see line 153).

      Pg. 8 - lines 228-230: The authors state that, "As with NRPE1, SUVH1 increased in the h1 background significantly more in heterochromatin, with preferential enrichment over long TEs, cmt2 dependent hypo CHH DMRs, and heterochromatic TEs (Fig. 6b)."

      Contrary to the above statement, the violin plots in Fig. 6c show SUVH1 occupancy increasing at euchromatic TEs in the h1 mutant. What statistical test allowed the authors to determine that the increase in h1 occurs "significantly more in heterochromatin"? The authors should critically interpret Fig. 6c and 6d, which are not currently referenced in the results section. More support is needed for the claim that SUVH1 specifically encroaches into heterochromatin in the h1 mutant, rather than just TEs generally (euchromatic and heterochromatic alike).

      Similar to what we see for NRPE1, statistical tests that we have now performed show that SUVH1 is significantly enriched in h1 in all classes. Importantly however, the effect size is larger in all of the heterochromatin associated classes. We display these statistical tests and the median values on the plots so that effects are immediately viewable (see updated Fig 6).

      In addition, the authors should verify that SUVH1-3xFLAG transgenes (in the WT and h1 mutant backgrounds, respectively) and endogenous Arabidopsis genes encoding the transcriptional activator complex (SUVH1-SUVH3-DNAJ1-DNAJ2) are not overexpressed in the h1 mutant vs. WT. Higher expression of SUVH1 or limiting factors in the larger complex could explain the observation of increased SUVH1 occupancy in the h1 background.

      We do not see a difference in SUVH1/3/DNAJ1/2 complex gene expression in the h1 background (see Supplementary Table 1). However, we cannot rule out that our SUVH1-FLAG line in h1 is more highly expressed than the corresponding SUVH1-FLAG line in WT. We now note this point in line 248.

      Pg. 8 - lines 231-232: Here the authors make a sweeping conclusion about H1 demarcating, "the boundary between euchromatic and heterochromatic methylation pathways, likely through promoting nucleosome compaction and restricting heterochromatin access." I do not see how a H1 boundary between euchromatic and heterochromatic methylation pathways is revealed based on the SUVH1-3xFLAG occupancy data, which shows increased enrichment at every category interrogated in the h1 mutant (Fig 6b,c,d) and all along the baseline too in the h1 mutant browser tracks (Fig 6a). Can the authors provide more examples of this phenomenon (similar to Fig 6a) and better explain why their SUVH1-3xFLAG ChIP supports this demarcation model?

      The general conclusion from SUVH1 about H1’s agnostic role in preventing heterochromatin access is now further supported from our findings with H3K27me3 (see Figure 6e and description from line 250). However, we agree that the demarcation model as initially presented was overly simplistic. This point was also raised by reviewer 2. We have removed the line highlighted by the reviewer in the revised version of the manuscript. In the revised version we clarify that H1 impedes RdDM and associated machinery throughout the genome (consistent with H1’s established broad occupancy across the genome) but this effect is most pronounced in heterochromatin, corresponding to maximal H1 occupancy (abstract line 26, results line 245, discussion line 265). 

      Corrections:

      Pg. 8 - lines 226-227: "We therefore wondered whether complex's occupancy might also be affected by H1." The sentence contains a typo, where I assume the authors mean to refer to occupancy by the SUVH1-SUVH3-DNAJ1-DNAJ2 transcriptional activator complex. This needs to be specified more clearly.

      The paragraph has been updated (see from line 237).

      Pg. 13 - lines 393-405: There are minor errors in the capitalization of titles and author initials in the References. I recommend that the authors proofread all the references to eliminate these issues:

      Thank you, these have been corrected.

      Choi J, Lyons DB, Zilberman D. 2021. Histone H1 prevents non-cg methylation-mediated small RNA biogenesis in arabidopsis heterochromatin. Elife 10:1-24. doi:10.7554/eLife.72676 (...)

      Du J, Johnson LM, Groth M, Feng S, Hale CJ, Li S, Vashisht A a., Gallego-Bartolome J, Wohlschlegel J a., Patel DJ, Jacobsen SE. 2014. Mechanism of DNA methylation-directed histone methylation by KRYPTONITE. Mol Cell 55:495-504. doi:10.1016/j.molcel.2014.06.009 (...)

      Du J, Zhong X, Bernatavichute Y V, Stroud H, Feng S, Caro E, Vashisht A a, Terragni J, Chin HG, Tu A, Hetzel J, Wohlschlegel J a, Pradhan S, Patel DJ, Jacobsen SE. 2012. Dual binding of chromomethylase domains to H3K9me2-containing nucleosomes directs DNA methylation in plants. Cell 151:167-80. doi:10.1016/j.cell.2012.07.034

      Reviewer #2 (Recommendations For The Authors):

      As for a normal review, here are our major and minor points.

      Major:

      (1) Lines 38 to 45 of the introduction are important for the subsequent definition of heterochromatic and non-heterochromatic transposons, but the definition is ambiguous. Is heterochromatin defined by surrounding context such as pericentromeric position or is this an autonomous definition? Can a TE within the chromosomal arms be considered heterochromatic provided that it is long enough and recruits the right machinery? These cases should be more explicitly introduced. Ideally, a supplemental dataset should provide a key to the categories, genomic locations and overlapping TEs as they were used in this analysis, even if some of the categories were taken from another study.

      We have now added all the regions used for analysis in this study to Supplementary Table 3.

      (2) Line 80: This would be the first chance to cite Teano et al. and the "encroachment" of PcG complexes to TEs in H1 mutants

      Done - “H1 also plays a key role in shaping nuclear architecture and preventing ectopic polycomb-mediated H3K27me3 deposition in telomeres (Teano et al., 2023).” See line 83

      (3) It is "only" a supplemental figure (S2), but it should still follow the rules: indicate the number of biological replicates for the RNA-seq data and perform a statistical test. For the WB data, provide a loading control.

      We are now using the western blot to illustrate antibody specificity and H1 loss in the h1 mutant lines, so NRPE1 itself serves as a loading control (Supplementary Fig 3a). For NRPE1 mRNA expression, we have now replaced this with a more comprehensive transcriptome analysis of methylation machinery in h1 (see Supplementary Table 1). 

      (4) Lines 115 to 124 and corresponding data: Here, the goal is to exclude other changes to heterochromatin structure other than "increased access" in H1 mutants; however, only one feature, H3K9me2, is tested. Testing this one mark does not necessarily prove that the nature of the chromatin does not change, e.g. H2A.W could be differently redistributed, DDM1 may change, VIM protein, and others. Either more comprehensive testing for heterochromatin markers should be performed, or the conclusions moderated.

      We have moderated the text accordingly (see line 135).

      (5) Lines 166ff and Figure 1, a bit out of order also Figure 5: The general hypothesis is that NRPE1 redistributes to heterochromatic regions in h1 mutants (as do other chromatin modifiers), but the data seem to only support a higher occurrence at target sites.

      a. The way the NRPE1 data is displayed makes it seem like there is much more NRPE1 in the h1 samples, even at peaks that should not be recruiting more as they do not represent "long" TEs. It would be good to present more gbrowse shots of all peak classes.

      We now clarify that h1 does result in a general increase of NRPE1 throughout the genome, but the effect is strongest at heterochromatin. In our initial version of the manuscript, we summarise the effect as follows “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions” (previous version line 83, current version line 95). We have modified the language at several locations throughout the manuscript to make this point more clearly (abstract line 26, results line 245, discussion line 265). We include several browser shots in Supp Fig. 8.

      b. The data are "normalized" how exactly?

      c. One argument of observing "gaining" and "losing" peaks is that there is redistribution of NRPE1 from euchromatic to heterochromatic sites. There should be an analysis and figure to corroborate the point (e.g. by comparing FRIP values). Figure 1b shows lower NRPE1 signals at the TE flanking regions. This could reflect a redistribution or a flawed normalization procedure.

      The data are normalised using a standardised pipeline by log2 fold change over input, after scaling each sample by mapped read depth using the bamCompare function in deepTools. This is now described in detail in the Materials and Methods line 365, with full code and pipelines available from GitHub (https://github.com/Zhenhuiz/H1-restrictseuchromatin-associated-methylation-pathways-from-heterochromatic-encroachment).
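      For readers unfamiliar with this kind of normalisation, the per-bin computation can be sketched in NumPy. This is a simplified illustration under stated assumptions. It is not deepTools' implementation or the authors' pipeline. It assumes that binned read counts are already available. It scales the deeper library down to the depth of the shallower one, which is our reading of bamCompare's read-count scaling, and it adds a pseudocount of 1 before taking the log2 ratio. The function name is hypothetical, and the exact defaults should be checked against the deepTools documentation.

```python
# Hypothetical sketch of depth-scaled log2(ChIP / input) per genomic bin.
import numpy as np


def log2_over_input(chip_counts, input_counts, chip_total, input_total, pseudocount=1.0):
    """log2((scaled ChIP + pc) / (scaled input + pc)) for each bin.

    Both libraries are scaled to the smaller total mapped-read depth so
    that a more deeply sequenced library does not inflate its signal.
    """
    chip = np.asarray(chip_counts, dtype=float)
    ctrl = np.asarray(input_counts, dtype=float)
    target_depth = min(chip_total, input_total)
    chip_scaled = chip * (target_depth / chip_total)
    ctrl_scaled = ctrl * (target_depth / input_total)
    return np.log2((chip_scaled + pseudocount) / (ctrl_scaled + pseudocount))


# Toy example: the ChIP library was sequenced twice as deeply as the input.
ratios = log2_over_input([10, 40, 0], [10, 10, 0], chip_total=2_000_000, input_total=1_000_000)
print(np.round(ratios, 3))
```

      In the first bin, equal raw counts give a negative ratio once depth is accounted for. Bins with no reads in either library come out at exactly 0 because of the pseudocount.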

      d. Figure 1d and f show similar profiles comparing "long" and "short" TEs or "CMT2 dependent hypo-CHH" and "DRM2 dependent CHH". How do these categories relate to each other, how many fragments are redundant?

      The short vs long TEs were defined in Liu et al 2018 (doi: 10.1038/s41477-017-0100-y) and the DMRs were defined in Zhang et al. 2018 (DOI: 10.1073/pnas.1716300115). There is likely to be some degree of overlap between the categories, but numbers are very different (short TEs (n=820), long TEs (n=155), drm2 DMRs (n=5534), CMT (n=21784)) indicating that the different categories are informative. We have now listed all the regions used for analysis in this study in Supplementary Table 3.

      e. The purpose of the data presented in Figure 1 b is to compare changes of NRPE1 association in H3K9me3 non-overlapping and overlapping TEs between wild-type and background, yet the figure splits the categories in two subpanels and does neither provide a fold-change number nor a statistical test of the comparison. As before, the figure does not really support the idea that NPRE1 somehow redistribute from its "normal" sites towards heterochromatin as both TE classes seem to show higher NRPE1 binding in h1 mutants.

      There is a statistically significant difference between WT and h1 at both the H3K9me2 marked and unmarked TEs, however, the size of the effect is larger for the H3K9me2 marked TEs (median difference of 0.41 vs 0.16). Median values have now been added to the boxplots so that this is directly viewable to the reader (Fig 1e). Although important exceptions exist (see Fig 5 – regions that lose NRPE1 and DNA methylation), this reflects the general increase in NRPE1 occupancy in h1 mutants throughout the genome, with a consistently stronger effect in heterochromatin. As noted above, we have updated the manuscript to make this point more clearly (abstract line 26, results line 245, discussion line 265).

      f. Panel g is the only attempt to corroborate the redistribution towards heterochromatic regions, but at this scale, the apparent reduction of binding in the chromosome arms may be driven by off-peak differences and normalization problems between different ChIP samples with different signal-to-noise-ratio.

      We describe our normalisation and informatic pipeline in more detail in the Materials and Methods (line 365). It is also important to note that the reduction is observed not only at the chromosomal level but also at specific sites. We called differential peaks between WT and the h1 mutant: "Regions that gain NRPE1 in h1" are more enriched in heterochromatic regions, while "Regions that lose NRPE1 in h1" are more enriched outside heterochromatic regions.

      g. Figure 5: how many regions gain vs lose NRPE1 in h1 mutants? If the "redistribution causes loss" scenario applies, the numbers should overall be balanced but that does not seem the case. The loss case appears to be rather exceptional judging from the zigzagging meta-plot. Are these sites related to the sites taken over by PcG-mediated repression in h1 mutants?

      As described in line 222 (previous version of the manuscript line 206), there are 15,075 sites that gain and 1,859 sites that lose NRPE1 in h1. Comparing these sites to H3K27me3 in the Teano et al. study was an excellent suggestion. We compared sites that gain NRPE1 to sites that gain H3K27me3 in h1, finding a statistically significant overlap (2.4 fold enrichment over expected, hypergeometric test p-value 2.1e-71). Reciprocally, sites that lose NRPE1 were significantly enriched for overlap with H3K27me3 loss regions (1.6 fold over expected, hypergeometric test p-value 1.4e-4). This indicates that RdDM and H3K27me3 patterning are similarly modulated by H1. To directly test this, we reanalysed the H3K27me3 ChIP-seq data from Teano et al., finding coincident gain and loss of H3K27me3 at sites that gain and lose NRPE1 in h1. These results are described from line 250 and in Fig 6e, which supports a general role for H1 in preventing heterochromatin encroachment.
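      The fold enrichment and hypergeometric p-value used for such overlap comparisons can be sketched with SciPy. The inputs here are hypothetical. The result depends heavily on the size of the background universe, for example the total number of regions that could have overlapped, and that size is not specified here. The counts below are illustrative and are not the study's.

```python
# Sketch of an overlap-enrichment test between two sets of regions.
from scipy.stats import hypergeom


def overlap_enrichment(universe, n_a, n_b, n_overlap):
    """Fold enrichment over expectation and one-sided hypergeometric p-value.

    universe  -- total number of regions that could be drawn (assumed)
    n_a, n_b  -- sizes of the two region sets (e.g. NRPE1-gain, H3K27me3-gain)
    n_overlap -- number of regions found in both sets
    """
    expected = n_a * n_b / universe
    fold = n_overlap / expected
    # P(X >= n_overlap) when n_b regions are drawn at random from the universe,
    # n_a of which belong to set A.
    p_value = hypergeom.sf(n_overlap - 1, universe, n_a, n_b)
    return fold, float(p_value)


fold, p = overlap_enrichment(universe=10_000, n_a=1_000, n_b=500, n_overlap=100)
print(f"fold enrichment {fold:.1f}, p = {p:.1e}")
```

      `hypergeom.sf(k - 1, ...)` gives P(X ≥ k), the upper tail that includes the observed overlap. Passing `k` directly would exclude it.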

      (6) Lines 166ff and Figure 3: The data walk towards the scenario of pathway redistribution but actually find that RdDM plays a minor role overall as a substantial increase in heterochromatin regions occurs in all contexts and is largely independent of RdDM.

      a. How exactly are DNA-methylation data converted across regions to reach a fraction score from 0 to 1? There is no explanation in the legend for the methods that allow to recapitulate.

      We now explain our methods in full in the Materials and Methods, and all the code for generating these analyses has been deposited on GitHub (https://github.com/Zhenhuiz/H1restricts-euchromatin-associated-methylation-pathways-from-heterochromaticencroachment). Briefly, BSMAP is used to count the methylated and unmethylated reads at each cytosine across the genome. Next, the DNA methylation fraction in each region is calculated by summing the per-cytosine methylation fractions in a given window and dividing by the total number of cytosines in that same window (i.e. mC/(unmC+mC)), so that the value is a fraction ranging from 0 to 1.

      “0” indicates this region is not methylated, and “1” indicates this region is fully methylated (every cytosine is 100% methylated).  
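      The region-level calculation described above can be sketched as follows. This is a minimal illustration that assumes per-cytosine methylated and unmethylated read counts have already been extracted (for example from BSMAP output) as hypothetical (position, methylated, unmethylated) tuples; the tuple layout, the fixed window size, and the skipping of uncovered cytosines are assumptions, and the authors' actual pipeline is the one in their GitHub repository.

```python
from collections import defaultdict


def window_methylation(cytosines, window_size):
    """Mean per-cytosine methylation fraction in fixed-size windows.

    cytosines: iterable of (position, methylated_reads, unmethylated_reads)
    Returns {window_index: fraction between 0 and 1}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, meth, unmeth in cytosines:
        depth = meth + unmeth
        if depth == 0:
            continue  # no reads cover this cytosine; skipped (assumption)
        window = pos // window_size
        sums[window] += meth / depth  # per-cytosine fraction mC / (mC + unmC)
        counts[window] += 1
    return {w: sums[w] / counts[w] for w in sorted(sums)}


calls = [(0, 3, 1), (10, 0, 4), (150, 2, 2), (160, 0, 0)]
print(window_methylation(calls, window_size=100))  # {0: 0.375, 1: 0.5}
```

      A window scores 0 when no covered cytosine is methylated in any read and 1 when every covered cytosine is methylated in all reads, matching the interpretation given above.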

      b. Kernel plots? These are slang for experts and should be better described. In addition, nothing is really concluded from these plots in the text, although they may be quite informative.

      Kernel density plots show the proportion of TEs that gain or lose methylation in a particular mutant, rather than the overall average as depicted in the methylation metaplots above. We now describe the kernel density plots in more detail in the Figure 3 legend. 
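      As an illustration of what a kernel density plot summarizes, the sketch below estimates the density of hypothetical per-TE methylation changes (mutant minus wild type) with a Gaussian kernel. The input values and the fixed bandwidth are assumptions; plotting libraries usually choose the bandwidth automatically.

```python
import math


def gaussian_kde(grid, samples, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each point of grid."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical per-TE methylation differences (mutant - WT), each between -1 and 1
deltas = [-0.2, 0.05, 0.1, 0.3, 0.35]
grid = [i / 100 for i in range(-100, 101)]
density = gaussian_kde(grid, deltas, bandwidth=0.1)
peak = grid[density.index(max(density))]
print(f"density peaks at a change of about {peak:+.2f}")
```

      Density to the right of zero corresponds to TEs that gain methylation and density to the left to TEs that lose it, which is the per-TE information a metaplot average collapses into a single curve.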

      (7) Figure 4: This could be a very interesting analysis if the reader could actually understand it.

      a. The legend is minimal. What is the meaning of hypo and hyper regions indicated to the right of Figure 4c?

      b. The color scale represents observed/expected values. What exactly does this mean? Mutant vs WT?

      c. Some comparisons in 4a are cryptic, e.g. h1 nrpe1 nrpe1 vs CHH?

      d. Figure 4d focuses on a correlation square of relevance, but why? Interestingly the square does not correspond to any "hypo" or "hyper" label?

      Thank you; we have revised Figure 4 and its legend based on these suggestions to clarify all of the above.

      (8) Lines 226 and Figure 6B. De novo (or increased) targeting of SUVH1 to heterochromatic sites in h1 mutants, similar to NRPE1, is used to support the argument that more access allows other chromatin modifiers to encroach. SUVH1 strongly depends on RdDM for its in vivo binding and may be the least conclusive factor to argue for a "general" encroachment mechanism.

      We appreciate the reviewer’s point here. A mark that is entirely independent of RdDM and follows the same pattern would provide stronger evidence in favour of general encroachment. Excitingly, this is exactly what we provide evidence for when investigating the interrelationship with H3K27me3, and we appreciate the reviewer’s suggestion to check this! These data are now described in Figure 6e and line 250.

      Minor:

      (1) Line 23: "Loss of H1 resulted in heterochromatic TE enrichment by NRPE1." This does not seem right: it is NRPE1 that is enriched at TEs.

      Modified (line 26), thank you.

      (2) Lines 73-74: The idea that DDM1 displaces H1 in heterochromatic TEs is somewhat counterintuitive given the model that heterochromatic TEs are unavailable for RdDM because of the presence of H1. Is this displacement non-permanent and directly linked to interaction with CMT2/3 and MET1?

      This is a very good question, and we agree with the reviewer that the effect of DDM1 may be only transient or insufficient to allow full RdDM assembly, or indeed there may be a direct interaction between DDM1 and CMTs/MET1. During preparation of these revisions, a structure of Arabidopsis nucleosome-bound DDM1 was published, which provides some insight by showing that DDM1 promotes DNA sliding. This is at least consistent with the idea of DDM1 causing transient, non-permanent displacement of H1 that would be insufficient for RdDM establishment. We now discuss these ideas at line 80.

      (3) Line 85: A bit more background on the reader-activator complex should be given. In fact, the reader may not really care that it was discovered more recently (not really recent, by the way), but what does it actually do?

      We have extensively reworked this paragraph to take into account our new findings on H3K27me3, so that there is less emphasis on the reader-activator complex. The sentence now reads as follows:

      “We found that H1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions. This effect was not limited to RdDM, similarly impacting both the methylation reader complex component SUVH1 (Harris et al., 2018) and polycomb-mediated H3K27me3 (Teano et al., 2023).” (line 95).

      Also, when describing the experiment in the Results section (line 241), we now provide more background on SUVH1’s function.

      (4) Lines 80-81: Since it has already been shown that RdDM-associated small RNAs are more enriched at heterochromatin in h1, please clarify precisely the added value of studying NRPE1 enrichment at these sites.

      Good point. We have added the following line: ‘...small RNAs are not a direct readout of functional RdDM activity and Pol IV-dependent small RNAs are abundant in regions of the genome that do not require RdDM for methylation maintenance and that do not contain Pol V (Stroud et al., 2014).’ (line 90)

      (5) Line 99: This seems to be the only place where the connection between long TEs and heterochromatic regions is mentioned, but no source is cited.

      We have added the appropriate citations (Bourguet et al., 2021; Zemach et al., 2013) at line 110.

      (6) Line 100: "DMRs" is used here for the first time without being spelled out. The abbreviation is only introduced later in the text (Line 187).

      Thank you; we now define DMRs (differentially methylated regions) at first use (line 112).

      (7) Figure 2: Panels 2c and 2d should show metaplots for WT and transgenes in one panel. There is something seriously wrong with the normalization in d, or the scales of the left and right panels are not the same. Neither the legend nor the methods describe how normalization was performed.

      Thank you for pointing this out; the figure has been corrected. We have updated the Materials and Methods (line 365) and have added code and pipelines to GitHub to explain the normalisation procedure in more detail (https://github.com/Zhenhuiz/H1restricts-euchromatin-associated-methylation-pathways-from-heterochromaticencroachment).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their constructive comments. Here is a summary of the main changes we made from the previous manuscript version, based on the reviewers’ comments:

      (1) Introduction of a new model, based on a Markov chain, capturing within-trial evolution in search strategy.

      (2) Addition of a new figure investigating inter-animal variations in search strategy.

      (3) Measurement of model fit consistency across 10 simulation repetitions, to prevent the risk of model overfitting.

      (4) Several clarifications have been made in the main text (Results, Discussion, Methods) and figure legends.

      (5) We now provide processed data and code for analyses and models in a GitHub repository (https://github.com/sebiroyerlab/Vestibule_sequences).

      (6) Simplification of the previous modeling. We realized that the first two models in the previous manuscript version were simply special cases of the third model. Therefore, we retained only the third model, which has been renamed the ‘mixture model’.

      (7) Modification (or creation) of Figures 4-6 and Supplementary Figures 7-8 to reflect the aforementioned changes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors design an automated 24-well Barnes maze with 2 orienting cues inside the maze, then model what strategies the mice use to reach the goal location across multiple days of learning. They consider a set of models and conclude that one of these models, a combined strategy model, best explains the experimental data.

      This study is concisely written and the results are concisely presented. The best fit model is reasonably simple and fits the experimental data well (at least the summary measures of the data that were presented).

      Major points:

      (1) One combined strategy (once the goal location is learned) that might seem reasonable would be that the animal knows roughly where the goal is, but not exactly where, so it first uses a spatial strategy just to get to the first vestibule, then switches to a serial strategy until it reaches the correct vestibule. How well would such a strategy explain the data for the later sessions? The best combined model presented in the manuscript is one in which the animal starts with a roughly 50-50 chance of a serial (or spatial) strategy from the start vestibule (i.e. by the last session before the reversal the serial and spatial strategies are at ~50-50 in Fig. 5d). Is it the case that even after 15 days of training the animal starts with a serial strategy from its starting point approximately half of the time? The broader point is whether additional examination of the choices made by the animal, combined with consideration of a larger range of possible models, would be able to provide additional insight into the learning and strategies the animal uses.

      Our analysis focused on the evolution of navigation strategies across days and trials. The reviewer raises the interesting possibility that navigation strategy might evolve in a specific manner within each trial, especially on the later days once the environment is learned. To address this possibility, we first examined how some of the statistical distributions, previously analyzed across days, evolved within trials. Consistent with the reviewer’s intuition, the statistical distributions changed within trials, suggesting a specific strategy evolution within trials. Second, we developed a new model, where strategies are represented as nodes of a Markov chain. This model allows potential strategy changes after each vestibule visit, according to a specific set of transition probabilities. Vestibules are chosen based on the same stochastic processes as in the previous model. This new model could be fitted to the experimental distributions and captured both the within-trial evolution and the global distributions. Interestingly, the trials were mostly initiated in the random strategy (~67% chance) and to a lesser extent in the spatial strategy (~25% chance), but rarely in the serial strategy (~8% chance). This new model is presented in Figure 6.
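      A minimal sketch of such a Markov-chain model is given below to make the structure concrete. Only the initial strategy probabilities (~67%, ~25%, ~8%) come from the text above; the transition matrix, the vestibule-choice rule for each strategy (uniform draw for random, at most one vestibule of error around the goal for spatial, a step to a neighboring vestibule for serial), the start position, and all names are hypothetical placeholders, not the fitted model in the authors' repository.

```python
import random

N_VESTIBULES = 24  # vestibules numbered 0..23 around the maze; 0 is the goal

# Initial strategy probabilities reported in the response
INITIAL = {"random": 0.67, "spatial": 0.25, "serial": 0.08}

# Hypothetical transition probabilities applied after each visit (rows sum to 1)
TRANSITIONS = {
    "random": {"random": 0.80, "spatial": 0.15, "serial": 0.05},
    "spatial": {"random": 0.10, "spatial": 0.70, "serial": 0.20},
    "serial": {"random": 0.10, "spatial": 0.10, "serial": 0.80},
}


def draw(probs, rng):
    """Sample a key from a {state: probability} mapping."""
    return rng.choices(list(probs), weights=list(probs.values()))[0]


def next_vestibule(strategy, current, rng):
    """Hypothetical vestibule-choice rule for each strategy."""
    if strategy == "random":
        return rng.randrange(N_VESTIBULES)
    if strategy == "spatial":
        # aim for the goal with an error of at most one vestibule
        return rng.choice((-1, 0, 0, 1)) % N_VESTIBULES
    # serial: check a vestibule adjacent to the current one
    return (current + rng.choice((-1, 1))) % N_VESTIBULES


def simulate_trial(start, rng, max_visits=1000):
    """Return the vestibule sequence of one trial, ending at the goal (0)."""
    strategy = draw(INITIAL, rng)
    current, visits = start, []
    while len(visits) < max_visits:
        current = next_vestibule(strategy, current, rng)
        visits.append(current)
        if current == 0:
            break
        strategy = draw(TRANSITIONS[strategy], rng)  # possible switch after each visit
    return visits


rng = random.Random(42)
trial = simulate_trial(start=12, rng=rng)
print(len(trial), trial[-1])
```

      Fitting such a model would mean adjusting the initial and transition probabilities until the simulated vestibule distributions match the experimental ones; that fitting step is not shown here.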

      (2) To clarify, in the Fig. 4 simulations, is the "last" vestibule visit of each trial, which is by definition 0, not counted in the plots of Fig. 4b? Otherwise, I would expect that vestibule 0 is overrepresented because a trial always ends with Vi = 0.

      The last vestibule visit (vestibule 0 by definition) is counted in the plots of Fig. 4b. We initially shared the same concern as the reviewer. However, upon further consideration, we arrived at the following explanation: a factor that might lead to an overrepresentation of vestibule 0 is that, unlike other vestibules, it has to be contained in each trial, as trials terminated upon the selection of vestibule 0. Conversely, a factor that might contribute to an underrepresentation of vestibule 0 is that, unlike other vestibules, it cannot be counted more than once per trial. These two factors appear to counterbalance each other, resulting in no discernible overrepresentation or underrepresentation of vestibule 0 in the random process.
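      Under the simplifying assumption that the random process draws each of the 24 vestibules uniformly and independently (including the one just visited), the balance is exact. Each trial contains exactly one visit to vestibule 0. The number of non-goal visits before the goal is geometric with mean 23, and these visits are spread evenly over the 23 other vestibules, so every vestibule, including 0, is expected to appear once per trial. The short simulation below checks this numerically; the exact balance depends on this sampling assumption, which may differ from the random process actually used in the model.

```python
import random
from collections import Counter

N_VESTIBULES = 24  # vestibule 0 is the goal


def random_trial(rng):
    """Visit uniformly drawn vestibules until the goal (0) is chosen, keeping the final visit."""
    visits = []
    while True:
        v = rng.randrange(N_VESTIBULES)
        visits.append(v)
        if v == 0:
            return visits


def mean_counts_per_trial(n_trials, seed=0):
    """Average number of visits to each vestibule per trial."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trials):
        counts.update(random_trial(rng))
    return [counts[v] / n_trials for v in range(N_VESTIBULES)]


means = mean_counts_per_trial(20000)
print(f"goal: {means[0]:.3f}, others: {min(means[1:]):.3f}-{max(means[1:]):.3f}")
```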

      Reviewer #2 (Public Review):

      This paper uses a novel maze design to explore mouse navigation behaviour in an automated analogue of the Barnes maze. Overall I find the work to be solid, with the cleverly designed maze/protocol being its major strength; however, there are some issues that I believe should be addressed and clarified.

      (1) Whilst I'm generally a fan of the experimental protocol, the design means that internal odor cues on the maze change from trial to trial, along with cues external to the maze such as the sounds and visual features of the recording room, ultimately making it hard for the mice to use a completely allocentric spatial 'place' strategy to navigate. I do not think there is a way to control for these conflicts between reference frames in the statistical modelling, but I do think these issues should be addressed in the discussion.

      It should be pointed out that all cues on the maze (visual, tactile, odorant) remained unchanged across trials, since the maze was rotated together with the goal and guiding cues. Furthermore, the maze was equipped with an opaque cover to prevent mice from seeing the surrounding room (mouse trajectories were imaged using infrared light and a camera). It is, however, possible that other cues, such as room sounds and odors, were perceived and interfered to some extent with the sensory cues provided inside the maze. We have now mentioned this possibility in the discussion.

      (2) Somewhat related: I could not find how the internal maze cues (i.e. the luminous cues) are moved for each trial to demarcate the new goal. This should be clarified in the methods.

      The luminous cues were fixed to the floor of the arena. Consequently, they rotated along with the arena as a single unit, as depicted in Figure 1. We have added clarifications to the Figure 1 legend and the Methods.

      (3) It appears some data is being withheld from Figures 2&3? E.g. Days 3/4 from Fig 2b-f and Days 1-5 on for Fig 3. Similarly, Trials 2-7 are excluded from Fig 3. If this is the case, why? It should be clarified in the main text and Figure captions, preferably with equivalent plots presenting all the data in the supplement.

      The statistical distributions for all single days/trials are shown in the color-coded panels of Figures 2 and 3. In the line plots of Figures 2 and 3, we show only an overlay of 2-3 lines for the sake of clarity. The days/trials represented were chosen to capture the dynamic range of variability within the distributions. We have added this information to the figure legends.

      (4) I strongly believe the data and code should be made freely available rather than "upon reasonable request".

      Matrices of processed data and code for simulations and analyses are now available at https://github.com/sebiroyerlab/Vestibule_sequences.

      Reviewer #3 (Public Review):

      Royer et al. present a fully automated variant of the Barnes maze to reduce experimenter interference and ensure consistency across trials and subjects. They train mice in this maze over several days and analyze the progression of mouse search strategies during the course of the training. By fitting models involving stochastic processes, they demonstrate that a model combining the random, spatial, and serial processes can best account for the observed changes in mice's search patterns. Their findings suggest that across training days the spatial strategy (using local landmarks) was progressively employed, mostly at the expense of the random strategy, while the serial strategy (checking consecutive nearby vestibules) is reinforced from the early stages of training. Finally, they discuss potential mechanistic underpinnings within brain systems that could explain such behavioral adaptation and flexibility.

      Strength:

      The development of an automated Barnes maze allows for more naturalistic and uninterrupted behavior, facilitating the study of spatial learning and memory, as well as the analysis of the brain's neural networks during behavior when combined with neurophysiological techniques. The system's design has been thoughtfully considered, encompassing numerous intricate details. These details include the incorporation of flexible options for selecting start, goal, and proximal landmark positions, the inclusion of a rotating platform to prevent the accumulation of olfactory cues, and careful attention given to automation, taking into account specific considerations such as the rotation of the maze without causing wire shortage or breakage. When combined with neurophysiological manipulations or recordings, the system provides a powerful tool for studying the spatial navigation system.

      The behavioral experiment protocols, along with the analysis of animal behavior, are conducted with care, and the development of behavioral modeling to capture the animal's search strategy is thoughtfully executed. It is intriguing to observe how the integration of these innovative stochastic models can elucidate the evolution of mice's search strategy within a variant of the Barnes maze.

      Weakness:

      (1) The development of the well-thought-out automated Barnes maze may attract the interest of researchers exploring spatial learning and memory. However, this aspect of the paper lacks significance due to insufficient coverage of the materials and methods required for readers to replicate the behavioral methodology for their own research inquiries.

      Moreover, as discussed by the authors, the methodology favors specialists who utilize wired recordings or manipulations (e.g. optogenetics) in awake, behaving rodents. However, it remains unclear how the current maze design, which involves trapping mice in start and goal positions and incorporating angled vestibules resulting in the addition of numerous corners, can be effectively adapted for animals with wired implants.

      The reviewer is correct in pointing out that the current maze design is not suitable for experiments with wired implants, particularly due to the maze’s enclosed structure and the access to the start/goal boxes through side holes. Instead, pharmacogenetic approaches or wireless optogenetics and electrophysiology would need to be used. We have now mentioned this limitation in the discussion.

      (2) Novelty: In its current format, the main axis of the paper falls on the analysis of animal behavior and the development of behavioral modeling. In this respect, while it is interesting to see how thoughtfully designed models can explain the evolution of mice search strategy in a maze, the conclusions offer limited novel findings that align with the existing body of research and prior predictions.

      We agree with the reviewer that our study is weakly connected to previous research on the hippocampus and spatial navigation, as it consists mainly of animal behavior analysis and modeling and addresses a relatively unexplored topic. We hope that combining our behavioral approach with optogenetics and electrophysiology will, in the future, yield new insights that are in line with the existing body of research.

      (3) Scalability and accessibility: While the approach may be intriguing to experts who have an interest in or are familiar with the Barnes maze, its presentation seems to primarily target this specific audience. Therefore, there is a lack of clarity and discussion regarding the scalability of behavioral modeling to experiments involving other search strategies (such as sequence or episodic learning), other animal models, or the potential for translational applications. The scalability of the method would greatly benefit a broader scientific community. In line with this view, the paper's conclusions heavily rely on the development of new models using custom-made codes. Therefore, it would be advantageous to make these codes readily available, and if possible, provide access to the processed data as well. This could enhance comprehension and enable a larger audience to benefit from the methodology.

      The current approach might indeed extend to other species in equivalent environments and might also constitute a general proof of principle regarding the characterization of animal behaviors by the mixing of stochastic processes. We have now mentioned these points in the discussion.

      As suggested by the reviewer, we now provide model/simulation code and processed data to replicate the figures at https://github.com/sebiroyerlab/Vestibule_sequences

      (4) Cross-validation of models: The authors have not implemented any measures to mitigate the risk of overfitting in their modeling. It would have been beneficial to include at least some form of cross-validation with stochastic models to address this concern. Additionally, the paper lacks the presence of analytics or measures that assess and compare the performance of the models.

      To avoid the risk of model overfitting, the most appropriate solution appeared to be repeating the simulations several times and examining the consistency of the obtained parameters across repetitions. For the mixture model, we now show in Supplementary figure 7 the probabilities obtained from 10 repetitions of the simulation. Similarly, for the Markov chain model, the probabilities obtained from 10 repetitions of the simulation are shown in Figure 6.

      Regarding model comparison, we have simplified our mixture model into a single model, as we realized the two other models in the previous manuscript version were simply special cases of the third model. Nevertheless, a comparison was still needed to estimate the best value of N (the number of consecutive segments that a strategy lasts) in the mixture model. We now show the comparison of mean square errors obtained for different values of N, using t-tests across 10 repetitions of the simulations (Figure 5c).
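      The repetition-based consistency check and the comparison across values of N can be sketched with the standard library. This is an illustration only: the per-bin mean square error, the use of Welch's unequal-variance t statistic (the response does not specify which t-test variant was used), and the example numbers are all assumptions.

```python
import math
import statistics


def mse(simulated, observed):
    """Mean square error between a simulated and an observed distribution."""
    return sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)


def parameter_consistency(fits):
    """Mean and standard deviation of each fitted parameter across repetitions."""
    summary = {}
    for name in fits[0]:
        values = [fit[name] for fit in fits]
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


def welch_t(a, b):
    """Welch's t statistic comparing two sets of per-repetition errors."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


# Hypothetical per-repetition errors for two candidate values of N
errors_n2 = [0.011, 0.012, 0.010, 0.013, 0.011]
errors_n3 = [0.015, 0.016, 0.014, 0.017, 0.015]
print(f"t = {welch_t(errors_n2, errors_n3):.2f}")
```

      With SciPy installed, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value.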

      (5) Quantification of inter-animal variations in strategy development: It is important to investigate, and address the argument concerning the possibility that not all animals recruit and develop the three processes (random, spatial, and serial) in a similar manner over days of training. It would be valuable to quantify the transition in strategy across days for each individual mouse and analyze how the population average, reflecting data from individual mice, corresponds to these findings. Currently, there is a lack of such quantification and analysis in the paper.

      We have added a figure (Supplementary figure 8) showing the mixture model matching analyses for individual animals. A lot of variability is indeed observed across animals, with some animals displaying strong preferences for certain strategies compared to others. The average across the mouse population showed a similar trend to the result obtained with the pooled data.

      Recommendations for the authors:

      Summary of Reviewer Comments:

      (1) In its present form, the manuscript lacks sufficient coverage of the materials and methods necessary for readers to replicate the behavioral methodology in their own research inquiries. For instance, it would be beneficial to clarify how the cues are rotated relative to the goal.

      (2) The models may be over-fitted, leading to spurious conclusions, and cross-validation is necessary to rule out this possibility.

      (3) The specific choice of the three strategies used to fit behavior in this model should be better justified, as other strategies may account for the observed behavior.

      (4) The study would benefit from an analysis of behavior on an animal-by-animal basis, potentially revealing individual differences in strategies.

      (5) Spatial behavior is not necessarily fully allocentric in this task, as only the two cues in the arena can be used for spatial orientation, unlike odor cues on the floor and sound cues in the room. This should be discussed.

      (6) Making the data and code fully open source would greatly strengthen the impact of this study.

      In addition, each reviewer has raised both major and minor concerns which should be addressed if possible.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) Change "tainted" to "tinted" in Fig. 1a

      (2) Should note explicitly in Fig. 2d that the goal is at vestibule 0, and also in the legend

      (3) Fig. 3 legend should say "c-e)", not "c-f)"

      (4) Supplementary Fig. 8 legend repeats "d)" twice

      Reviewer #2 (Recommendations For The Authors):

      Packard & McGaugh 1996 is cited twice as refs 5 and 14

      Reviewer #3 (Recommendations For The Authors):

      - Figure 3: Please correct the labels referenced as "c-f)" in the figure's legend.

      - Rounding numbers issue on page 4: 82.62% + 17.37% equals 99.99%, not 100%.

      We fixed all minor points. We are very thankful to the reviewers for their constructive comments.
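      The response does not state how the percentages on page 4 were corrected. One standard way to make rounded percentages sum to exactly 100 is the largest-remainder method, sketched below with hypothetical counts; it is shown only as an option, not as the authors' fix.

```python
import math
from fractions import Fraction


def round_percentages(counts, decimals=2):
    """Round percentages so they still sum to exactly 100 (largest-remainder method)."""
    total = sum(counts)
    scale = 10 ** decimals
    # exact shares in units of 10**-decimals percent
    exact = [Fraction(c * 100 * scale, total) for c in counts]
    floored = [math.floor(x) for x in exact]
    shortfall = 100 * scale - sum(floored)
    # give the remaining units to the entries with the largest fractional parts
    order = sorted(range(len(counts)), key=lambda i: exact[i] - floored[i], reverse=True)
    for i in order[:shortfall]:
        floored[i] += 1
    return [f / scale for f in floored]


print(round_percentages([2, 1]))  # [66.67, 33.33]
```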